Hypergeometric Distribution: A Distribution of Dependent Events (#175, Data Science and A.I. Lecture Series)

 

 


By Bindeshwar Singh Kushwaha

PostNetwork Academy

Introduction
  • In the previous sections, we studied distributions such as the binomial distribution.
  • The binomial distribution assumes that each trial is independent and the probability of success remains constant.
  • However, in many real-life problems, selections are made without replacement.
  • In such cases, the trials are not independent, and the hypergeometric distribution is used.
Real-life Scenario
  • Suppose we have 10 tickets numbered 1 to 10.
  • We draw 3 tickets without replacement.
  • Let “success” be the event of drawing an odd-numbered ticket.
  • Total odd numbers = 5 (1, 3, 5, 7, 9).
  • If we replace the ticket each time, the trials are independent and we use the Binomial Distribution.
  • If we do not replace the tickets, the trials are dependent and we use the Hypergeometric Distribution.
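The difference between the two sampling schemes can be checked empirically. The following is a minimal simulation sketch for the ticket scenario; the function name `simulate`, the number of trials, and the seed are illustrative choices, not part of the lecture.

```python
import random

def simulate(trials=100_000, seed=0):
    """Estimate P(exactly 2 odd tickets in 3 draws), without and with replacement."""
    rng = random.Random(seed)
    tickets = list(range(1, 11))
    hits_without = 0
    hits_with = 0
    for _ in range(trials):
        # Without replacement: three distinct tickets, so draws are dependent.
        drawn = rng.sample(tickets, 3)
        if sum(t % 2 for t in drawn) == 2:
            hits_without += 1
        # With replacement: every draw sees all ten tickets, so draws are independent.
        drawn = [rng.choice(tickets) for _ in range(3)]
        if sum(t % 2 for t in drawn) == 2:
            hits_with += 1
    return hits_without / trials, hits_with / trials

p_without, p_with = simulate()
print(f"without replacement: {p_without:.3f}")
print(f"with replacement:    {p_with:.3f}")
```

The estimates should land close to the exact values 5/12 ≈ 0.417 (hypergeometric) and 3/8 = 0.375 (binomial with \(p = 0.5\)).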
Basic Definition
  • The hypergeometric distribution gives the probability of obtaining exactly \(x\) successes in \(n\) draws from a finite population of size \(N\) containing \(M\) successes and \((N - M)\) failures, when sampling is done without replacement:
\[
P(X = x) = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}
\]
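The formula translates directly into code. Below is a small sketch using `math.comb` from the Python standard library (available from Python 3.8); the function name `hypergeom_pmf` is a hypothetical helper chosen for this illustration.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x): x successes in n draws without replacement
    from a population of N items, M of which are successes."""
    # Outside the feasible range the probability is zero.
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Ticket scenario: 10 tickets, 5 odd, 3 drawn, exactly 2 odd.
print(round(hypergeom_pmf(2, N=10, M=5, n=3), 4))  # 0.4167
```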
Parameters of Hypergeometric Distribution
  • \(N\) = Total number of items in the population
  • \(M\) = Number of success items in the population
  • \(n\) = Number of draws (sample size)
  • \(x\) = Number of observed successes in the sample
Mean and Variance
  • The mean or expected value is given by
\[
E(X) = n \frac{M}{N}
\]
  • The variance is
\[
\operatorname{Var}(X) = n \frac{M}{N} \frac{N - M}{N} \frac{N - n}{N - 1}
\]

The term \(\frac{N - n}{N - 1}\) is called the finite population correction factor.
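As a sanity check, the closed-form mean and variance can be compared with values computed directly from the probabilities. This sketch uses the ticket scenario (\(N = 10\), \(M = 5\), \(n = 3\)) and exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, M, n = 10, 5, 3

# Exact probabilities for every feasible number of successes.
pmf = {x: Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))
       for x in range(max(0, n - (N - M)), min(n, M) + 1)}

# Mean and variance computed from the definition.
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Mean and variance from the closed-form formulas.
mean_formula = Fraction(n * M, N)
var_formula = mean_formula * Fraction(N - M, N) * Fraction(N - n, N - 1)

print(mean, mean_formula)  # 3/2 3/2
print(var, var_formula)    # 7/12 7/12
```

Without the correction factor \(\frac{N - n}{N - 1} = \frac{7}{9}\), the variance would be the binomial value \(\frac{3}{4}\); sampling without replacement reduces it to \(\frac{7}{12}\).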

Example 1: Ticket Problem
  • 10 tickets are numbered from 1 to 10.
  • 5 tickets have odd numbers (successes) and 5 have even numbers (failures).
  • 3 tickets are drawn without replacement.
  • Find the probability that exactly 2 tickets have odd numbers.
\[
P(X=2) = \frac{\binom{5}{2}\binom{5}{1}}{\binom{10}{3}} = \frac{10 \times 5}{120} = \frac{5}{12} \approx 0.4167
\]
Example 2: Jury Selection Problem
  • A jury of 12 members is chosen from a pool of 20 people.
  • 8 are men (successes) and 12 are women (failures).
  • Find the probability that the jury contains exactly 5 men.
\[
P(X=5) = \frac{\binom{8}{5}\binom{12}{7}}{\binom{20}{12}}
\]
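The expression can be evaluated numerically. A short sketch with `math.comb`; the variable names are illustrative.

```python
from math import comb

men_ways = comb(8, 5)      # ways to pick 5 of the 8 men
women_ways = comb(12, 7)   # ways to pick the remaining 7 from the 12 women
all_juries = comb(20, 12)  # all possible 12-member juries

p = men_ways * women_ways / all_juries
print(men_ways, women_ways, all_juries)  # 56 792 125970
print(round(p, 4))                       # 0.3521
```

So the probability of exactly 5 men on the jury is about 0.3521.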
Example 3: Fish Tank Problem
  • A tank contains 200 fish. 60 are tagged and 140 are untagged.
  • A sample of 10 fish is drawn without replacement.
  • Find the probability that exactly 4 tagged fish are drawn.
\[
P(X=4) = \frac{\binom{60}{4}\binom{140}{6}}{\binom{200}{10}}
\]
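The binomial coefficients here are large, so it is convenient to let code evaluate them. Python integers are exact at any size, so the direct formula works; the sketch below also shows a log-gamma version, a common alternative in languages with fixed-width numbers. The helper `log_comb` is a hypothetical name introduced for this illustration.

```python
from math import comb, exp, lgamma

def log_comb(n, k):
    """Natural log of C(n, k) via the log-gamma function."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

N, M, n, x = 200, 60, 10, 4

# Direct evaluation with exact integer binomial coefficients.
p_exact = comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Same quantity computed in log space, then exponentiated.
p_log = exp(log_comb(M, x) + log_comb(N - M, n - x) - log_comb(N, n))

print(f"exact integers: {p_exact:.6f}")
print(f"log-gamma:      {p_log:.6f}")
```

Both lines should agree to the printed precision.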
Example 4: Probability of Acceptance Without Further Inspection
  • A lot of 25 units contains 10 defective units, and therefore 15 non-defective units.
  • An engineer inspects 2 randomly selected units from the lot.
  • The lot is accepted if both selected units are non-defective.
  • Find the probability that the lot is accepted without further inspection.
\[
P(\text{no defective units in sample}) = \frac{\binom{15}{2}}{\binom{25}{2}} = \frac{15 \times 14}{25 \times 24} = \frac{210}{600} = 0.35
\]

Answer: The probability that the lot is accepted without further inspection is \(\boxed{0.35}\).
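For context, the full distribution of the number of defective units in the sample can be tabulated. This is a sketch with illustrative variable names; it assumes that any sample containing a defective unit leads to further inspection, which the problem statement implies but does not spell out.

```python
from fractions import Fraction
from math import comb

N, defective, n = 25, 10, 2
good = N - defective

# Exact distribution of the number of defective units among the 2 inspected.
dist = {x: Fraction(comb(defective, x) * comb(good, n - x), comb(N, n))
        for x in range(0, n + 1)}

p_accept = dist[0]         # no defective unit in the sample
p_further = 1 - p_accept   # assumption: anything else triggers further inspection

for x, p in dist.items():
    print(x, p, float(p))
print(float(p_accept), float(p_further))  # 0.35 0.65
```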

Important Properties
  • Trials are dependent since sampling is without replacement.
  • The total population \(N\) is finite.
  • The random variable \(X\) can take integer values within
\[
\max(0, n - (N - M)) \le X \le \min(n, M)
\]
  • The sum of all probabilities is 1:
\[
\sum_{x} P(X=x) = 1
\]
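Both properties can be verified on a small case. The values \(N = 10\), \(M = 7\), \(n = 5\) below are illustrative and chosen so that the lower bound is not zero: with only 3 failures in the population, a sample of 5 must contain at least 2 successes. The helper names `support` and `pmf` are hypothetical.

```python
from fractions import Fraction
from math import comb

def support(N, M, n):
    """Feasible values of X for population N, successes M, sample size n."""
    lo = max(0, n - (N - M))
    hi = min(n, M)
    return range(lo, hi + 1)

def pmf(x, N, M, n):
    """Exact hypergeometric probability as a fraction."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

N, M, n = 10, 7, 5  # illustrative values
xs = list(support(N, M, n))
total = sum(pmf(x, N, M, n) for x in xs)

print(xs)     # [2, 3, 4, 5]
print(total)  # 1
```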
Comparison with Binomial Distribution
  • The Binomial Distribution assumes independent trials with constant probability \(p\).
  • The Hypergeometric Distribution assumes dependent trials without replacement.
  • When \(N\) is large relative to \(n\), removing an item barely changes the proportion of successes, so the hypergeometric distribution is well approximated by the binomial distribution with \(p = M/N\).
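The quality of the approximation can be inspected numerically. The sketch below reuses the fish tank numbers (\(N = 200\), \(M = 60\), \(n = 10\)), where the sample is 5% of the population, and compares the two distributions value by value; the function names are illustrative.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """Exact probability of x successes when sampling without replacement."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """Probability of x successes in n independent trials with success chance p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

N, M, n = 200, 60, 10
p = M / N  # success proportion used for the binomial approximation

diffs = []
for x in range(n + 1):
    h = hypergeom_pmf(x, N, M, n)
    b = binom_pmf(x, n, p)
    diffs.append(abs(h - b))
    print(f"x={x:2d}  hypergeometric={h:.4f}  binomial={b:.4f}")

print(f"largest absolute difference: {max(diffs):.4f}")
```

Increasing `n` toward `N` in this sketch makes the gap grow, since each draw then changes the remaining population more noticeably.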


Summary
  • Used when sampling is done without replacement.
  • Suitable for finite populations.
  • Depends on \(N\), \(M\), and \(n\).
  • Probability formula:
\[
P(X = x) = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}
\]
  • The mean \(n \frac{M}{N}\) and the variance \(n \frac{M}{N} \frac{N - M}{N} \frac{N - n}{N - 1}\) summarize the distribution's center and spread.

Reach PostNetwork Academy
  • Website: www.postnetwork.co
  • YouTube Channel: www.youtube.com/@postnetworkacademy
  • Facebook Page: www.facebook.com/postnetworkacademy
  • LinkedIn Page: www.linkedin.com/company/postnetworkacademy
  • GitHub Repositories: www.github.com/postnetworkacademy

Thank You!

 

©Postnetwork-All rights reserved.