The probability function, mean and variance are given in the adjacent table.
An alternative expression of the distribution has both the number of balls taken of each color and the number of balls not taken as random variables, whereby the expression for the probability becomes symmetric.
Better approximations to the mean and variance are given by Levin (1984, 1990), McCullagh and Nelder (1989), Liao (1992), and Eisinga and Pelzer (2011). The saddlepoint methods for approximating the mean and the variance suggested by Eisinga and Pelzer (2011) offer extremely accurate results.
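For moderate population sizes the exact mean and variance can be obtained by summing directly over the support, which gives a baseline against which the approximations cited above can be compared. The following is a minimal sketch, assuming the usual urn parametrization (m1 and m2 balls of the two colors, n balls taken, odds ratio omega) and a probability function proportional to C(m1, x) C(m2, n - x) omega^x. The function names are illustrative and not taken from any particular library.

```python
from math import comb


def fisher_nchg_pmf(m1, m2, n, omega):
    """Return (support, probabilities) by direct summation.

    Assumed parametrization: m1, m2 = balls of each color, n = balls taken,
    omega = odds ratio. Suitable for small to moderate sizes only, since
    omega ** x can overflow a float for large x or extreme omega.
    """
    lo, hi = max(0, n - m2), min(n, m1)
    support = list(range(lo, hi + 1))
    # Unnormalized weights; their sum is the normalizing constant.
    weights = [comb(m1, x) * comb(m2, n - x) * omega ** x for x in support]
    total = sum(weights)
    return support, [w / total for w in weights]


def fisher_nchg_moments(m1, m2, n, omega):
    """Exact mean and variance computed from the probability function."""
    support, probs = fisher_nchg_pmf(m1, m2, n, omega)
    mean = sum(x * p for x, p in zip(support, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(support, probs))
    return mean, var
```

With omega = 1 the result reduces to the moments of the ordinary (central) hypergeometric distribution, which is a convenient sanity check.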
Derivation
The univariate noncentral hypergeometric distribution may be derived alternatively as a conditional distribution in the context of two binomially distributed random variables, for example when considering the response to a particular treatment in two different groups of patients participating in a clinical trial. An important application of the noncentral hypergeometric distribution in this context is the computation of exact confidence intervals for the odds ratio comparing treatment response between the two groups.
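One common way to obtain such an exact interval is to invert the two conditional tail probabilities: the lower limit is the odds ratio at which Pr(X ≥ x) equals α/2, and the upper limit is the odds ratio at which Pr(X ≤ x) equals α/2. The sketch below assumes an observed count x of responders in the first group, group sizes m_x and m_y, and n responders in total. The function names and the bisection bracket on the log odds ratio are illustrative assumptions, not a reference implementation.

```python
from math import exp, inf, lgamma, log


def _log_comb(n, k):
    # Log of the binomial coefficient, to avoid overflow.
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def _pmf(m_x, m_y, n, log_omega):
    """Lower end of the support and the probabilities at odds ratio exp(log_omega)."""
    lo, hi = max(0, n - m_y), min(n, m_x)
    logw = [_log_comb(m_x, u) + _log_comb(m_y, n - u) + u * log_omega
            for u in range(lo, hi + 1)]
    peak = max(logw)
    w = [exp(v - peak) for v in logw]  # rescaled so the largest weight is 1
    total = sum(w)
    return lo, [v / total for v in w]


def upper_tail(x, m_x, m_y, n, log_omega):
    lo, probs = _pmf(m_x, m_y, n, log_omega)
    return sum(probs[x - lo:])  # Pr(X >= x)


def lower_tail(x, m_x, m_y, n, log_omega):
    lo, probs = _pmf(m_x, m_y, n, log_omega)
    return sum(probs[:x - lo + 1])  # Pr(X <= x)


def _bisect(f, target, a=-40.0, b=40.0, iters=200):
    """Solve f(t) = target for an increasing f, assuming the root lies in [a, b]."""
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if f(mid) < target:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def exact_odds_ratio_ci(x, m_x, m_y, n, alpha=0.05):
    """Equal-tailed conditional interval for the odds ratio (a sketch)."""
    lo, hi = max(0, n - m_y), min(n, m_x)
    if not lo <= x <= hi:
        raise ValueError("x is outside the conditional support")
    # Pr(X >= x) increases with omega; at the lower limit it equals alpha / 2.
    lower = 0.0 if x == lo else exp(
        _bisect(lambda t: upper_tail(x, m_x, m_y, n, t), alpha / 2))
    # Pr(X <= x) decreases with omega, so its negative is increasing.
    upper = inf if x == hi else exp(
        _bisect(lambda t: -lower_tail(x, m_x, m_y, n, t), -alpha / 2))
    return lower, upper
```

For example, `exact_odds_ratio_ci(7, 10, 10, 10)` corresponds to 7 of 10 responders in one group and 3 of 10 in the other. When x lies on the boundary of the support, the corresponding limit is 0 or infinity.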
Suppose $X$ and $Y$ are binomially distributed random variables counting the number of responders in two corresponding groups of size $m_X$ and $m_Y$ respectively,
$$X \sim \operatorname{Bin}(m_X, \pi_X), \qquad Y \sim \operatorname{Bin}(m_Y, \pi_Y).$$
Their odds ratio is given as
$$\omega = \frac{\omega_X}{\omega_Y} = \frac{\pi_X/(1-\pi_X)}{\pi_Y/(1-\pi_Y)}.$$
The responder prevalence $\pi_i$ is fully defined in terms of the odds $\omega_i$, $i \in \{X, Y\}$, which correspond to the sampling bias in the urn scheme above, i.e.
$$\pi_i = \frac{\omega_i}{1+\omega_i}.$$
The trial can be summarized and analyzed in terms of the following contingency table.
Treatment Group | responder | non-responder | Total |
X | x | . | mX |
Y | y | . | mY |
Total | n | . | N |
In the table, $n = x + y$ corresponds to the total number of responders across groups, and $N$ to the total number of patients recruited into the trial. The dots denote corresponding frequency counts of no further relevance.
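The sampling distribution of responders in group X conditional upon the trial outcome and prevalences,
$\Pr(X = x \mid X + Y = n, m_X, m_Y, \omega_X, \omega_Y)$,
is noncentral hypergeometric with odds ratio $\omega = \omega_X/\omega_Y$:
$$
\begin{aligned}
\Pr(X = x \mid X + Y = n, m_X, m_Y, \omega_X, \omega_Y)
&= \frac{\Pr(X = x, X + Y = n \mid m_X, m_Y, \omega_X, \omega_Y)}{\Pr(X + Y = n \mid m_X, m_Y, \omega_X, \omega_Y)} \\
&= \frac{\binom{m_X}{x}\pi_X^{x}(1-\pi_X)^{m_X-x}\,\binom{m_Y}{n-x}\pi_Y^{n-x}(1-\pi_Y)^{m_Y-(n-x)}}{\sum_{u=\max(0,\,n-m_Y)}^{\min(m_X,\,n)} \binom{m_X}{u}\pi_X^{u}(1-\pi_X)^{m_X-u}\,\binom{m_Y}{n-u}\pi_Y^{n-u}(1-\pi_Y)^{m_Y-(n-u)}} \\
&= \frac{\binom{m_X}{x}\binom{m_Y}{n-x}\,\omega^{x}}{\sum_{u=\max(0,\,n-m_Y)}^{\min(m_X,\,n)} \binom{m_X}{u}\binom{m_Y}{n-u}\,\omega^{u}}.
\end{aligned}
$$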
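Note that the denominator is essentially just the numerator, summed over all events of the joint sample space $(X, Y)$ for which it holds that $X + Y = n$. Terms independent of $X$ can be factored out of the sum and cancel out with the numerator. Specifically, writing $\pi_i^{k}(1-\pi_i)^{m_i-k} = (1-\pi_i)^{m_i}\,\omega_i^{k}$, the factor $(1-\pi_X)^{m_X}(1-\pi_Y)^{m_Y}\,\omega_Y^{n}$ is common to every term, leaving only powers of $\omega = \omega_X/\omega_Y$.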
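The cancellation can be checked numerically. The sketch below computes the conditional probability directly from the two binomial distributions and compares it with the noncentral hypergeometric form, which depends on the prevalences only through their odds ratio. The function names and the example values are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, m, p):
    """Binomial probability of k successes in m trials."""
    return comb(m, k) * p ** k * (1 - p) ** (m - k)


def odds_ratio(pi_x, pi_y):
    """Odds ratio of the two responder prevalences."""
    return (pi_x / (1 - pi_x)) / (pi_y / (1 - pi_y))


def conditional_pmf(x, n, m_x, m_y, pi_x, pi_y):
    """Pr(X = x | X + Y = n), computed directly from the two binomials."""
    lo, hi = max(0, n - m_y), min(n, m_x)
    if not lo <= x <= hi:
        return 0.0
    num = binom_pmf(x, m_x, pi_x) * binom_pmf(n - x, m_y, pi_y)
    den = sum(binom_pmf(u, m_x, pi_x) * binom_pmf(n - u, m_y, pi_y)
              for u in range(lo, hi + 1))
    return num / den


def noncentral_pmf(x, n, m_x, m_y, omega):
    """The same probability in noncentral hypergeometric form."""
    lo, hi = max(0, n - m_y), min(n, m_x)
    if not lo <= x <= hi:
        return 0.0
    num = comb(m_x, x) * comb(m_y, n - x) * omega ** x
    den = sum(comb(m_x, u) * comb(m_y, n - u) * omega ** u
              for u in range(lo, hi + 1))
    return num / den


# Example values (arbitrary): the two forms agree for every x.
m_x, m_y, n, pi_x, pi_y = 12, 9, 7, 0.6, 0.35
omega = odds_ratio(pi_x, pi_y)
max_gap = max(abs(conditional_pmf(x, n, m_x, m_y, pi_x, pi_y)
                  - noncentral_pmf(x, n, m_x, m_y, omega))
              for x in range(0, n + 1))
```

Here `max_gap` should differ from zero only by floating-point rounding, for any choice of prevalences strictly between 0 and 1.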