Fitting of Poisson Distribution
Bindeshwar Singh Kushwaha — PostNetwork Academy
Introduction
Master the technique of fitting the Poisson distribution to real-world frequency data.
This tutorial shows a step-by-step method to calculate theoretical frequencies for observed datasets.
Key Concepts & Techniques
- Introduction to Fitting: Fit a theoretical Poisson distribution to experimental data to derive expected frequencies.
- The Recurrence Advantage: Use the recurrence relation for Poisson probabilities to compute successive probabilities easily.
- Calculating Expected Frequencies: Determine theoretical frequency \( f(x) \) from the total number of observations \( N \).
- The Fitting Procedure:
  - Calculate the mean \( \lambda \) from observed data.
  - Find the initial probability \( p(0) \).
  - Apply the recurrence formula to get \( p(1), p(2), \dots \).
  - Multiply probabilities by \( N \) to obtain theoretical frequencies.
Recurrence Formula for the Poisson Probabilities
For a Poisson distribution with parameter \( \lambda \):
\( p(x) = \dfrac{e^{-\lambda}\,\lambda^x}{x!} \) … (1)
If we change \( x \) to \( x+1 \):
\( p(x+1) = \dfrac{e^{-\lambda}\,\lambda^{x+1}}{(x+1)!} \) … (2)
Divide (2) by (1):
\( \dfrac{p(x+1)}{p(x)} = \dfrac{\lambda}{x+1} \)
So the recurrence relation is:
\( p(x+1) = \dfrac{\lambda}{x+1}\,p(x) \) … (3)
Using the Recurrence Relation
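- Relation (3) holds for every \( x = 0, 1, 2, \dots \), so each probability follows from the previous one by a single multiplication, with no powers or factorials to evaluate.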
- Start with \( p(0) = e^{-\lambda} \), then compute \( p(1), p(2), \dots \) successively using (3).
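The recurrence can be sketched in a few lines of Python. This is a minimal illustration, not part of the original tutorial: the function name `poisson_probabilities` and the sample value \( \lambda = 0.5 \) are assumptions chosen for the example.

```python
import math

def poisson_probabilities(lam, max_x):
    """Return [p(0), p(1), ..., p(max_x)] for a Poisson distribution.

    Hypothetical helper: starts from p(0) = e^{-lam} and applies
    relation (3), p(x+1) = lam / (x+1) * p(x), repeatedly.
    """
    probs = [math.exp(-lam)]                      # p(0)
    for x in range(max_x):
        probs.append(probs[-1] * lam / (x + 1))   # p(x+1) from p(x)
    return probs

# Illustrative value of lambda (an assumption for this sketch).
probs = poisson_probabilities(0.5, 4)
for x, p in enumerate(probs):
    print(f"p({x}) = {p:.6f}")
```

Each step reuses the previous probability, so the loop performs one multiplication and one division per term instead of recomputing \( \lambda^x \) and \( x! \).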
Poisson Frequency Distribution
If an experiment follows Poisson assumptions and is repeated \( N \) times, the expected frequency of observing \( x \) occurrences is
\( f(x) = N \cdot P(X=x) = N \cdot \dfrac{e^{-\lambda}\,\lambda^x}{x!}, \quad x = 0,1,2,\dots \)
Example 1 — Defective Bottles
A manufacturer knows that 0.1% of the bottles produced are defective. The bottles are packed in boxes of 500, and a buyer purchases 100 boxes.
Find how many boxes are expected to contain at least two defective bottles.
Step 1: Parameters
- Probability that a bottle is defective: \( p = \dfrac{0.1}{100} = 0.001 \).
- Box size \( n = 500 \Rightarrow \lambda = n p = 500 \times 0.001 = 0.5 \).
- Number of boxes \( N = 100 \).
- Poisson PMF: \( P(X=x) = \dfrac{e^{-0.5}(0.5)^x}{x!} \).
Step 2: Probability \( X \ge 2 \)
\( P(X \ge 2) = 1 - [P(X=0) + P(X=1)] \)
\( = 1 - \left[ \dfrac{e^{-0.5}(0.5)^0}{0!} + \dfrac{e^{-0.5}(0.5)^1}{1!} \right] \)
\( = 1 - e^{-0.5}(1 + 0.5) \)
Numerical: \( e^{-0.5} \approx 0.60653 \Rightarrow P(X \ge 2) \approx 1 - 0.60653\times1.5 = 0.090205 \)
Step 3: Expected Number of Boxes
Expected = \( N \times P(X\ge2) = 100 \times 0.090205 \approx 9.02 \)
So about 9 boxes are expected to contain at least 2 defective bottles.
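The three steps of this example can be checked numerically. The sketch below uses only the figures given in the problem; the variable names are illustrative. Because it uses the unrounded value of \( e^{-0.5} \), the last digits may differ slightly from the hand calculation above.

```python
import math

p_defective = 0.001       # 0.1% of bottles are defective
bottles_per_box = 500     # n
boxes = 100               # N

lam = bottles_per_box * p_defective       # lambda = n * p
p0 = math.exp(-lam)                       # P(X = 0)
p1 = lam * p0                             # P(X = 1), by the recurrence
p_at_least_two = 1.0 - (p0 + p1)          # P(X >= 2)
expected_boxes = boxes * p_at_least_two   # N * P(X >= 2)

print(f"lambda         = {lam}")
print(f"P(X >= 2)      = {p_at_least_two:.5f}")
print(f"expected boxes = {expected_boxes:.2f}")
```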
Process of Fitting a Poisson Distribution (Summary)
- Compute mean \( \bar{x} = \dfrac{\sum f x}{\sum f} \) and use it as \( \lambda \).
- Compute \( p(0) = e^{-\lambda} \).
- Use recurrence \( p(x+1) = \dfrac{\lambda}{x+1} p(x) \) to find additional probabilities.
- Compute theoretical frequencies \( f(x) = N \cdot p(x) \).
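The four steps can be combined into one small function. This is a sketch under stated assumptions: the name `fit_poisson` and its list-based interface (position \( x \) in the list holds the observed frequency of \( x \)) are hypothetical choices, and the data passed in are the accident counts used in Example 2.

```python
import math

def fit_poisson(observed):
    """Fit a Poisson distribution to a frequency table.

    `observed[x]` is the observed frequency of the value x (x = 0, 1, 2, ...).
    Returns (lam, probs, freqs) with freqs[x] = N * p(x).
    Hypothetical helper written for this illustration.
    """
    n_total = sum(observed)                                      # N = sum of f
    lam = sum(x * f for x, f in enumerate(observed)) / n_total   # mean = sum(f x) / sum(f)
    probs = [math.exp(-lam)]                                     # p(0) = e^{-lam}
    for x in range(len(observed) - 1):
        probs.append(probs[-1] * lam / (x + 1))                  # recurrence
    freqs = [n_total * p for p in probs]                         # theoretical frequencies
    return lam, probs, freqs

# Accident counts for 0..5 accidents, taken from Example 2.
lam, probs, freqs = fit_poisson([1970, 422, 71, 13, 3, 1])
print("lambda =", lam)
print("theoretical frequencies:", [round(f) for f in freqs])
```

The function keeps full floating-point precision internally and rounds only when printing, so its output can differ in the last digit from a hand calculation that rounds at each step.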
Example 2 — Aircraft Accidents (Fitting Example)
Data for 2480 pilots (number of accidents):
| Number of Accidents (X) | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Observed frequency (f) | 1970 | 422 | 71 | 13 | 3 | 1 |
Step 1: Mean \( \lambda \)
\( N = \sum f = 1970 + 422 + 71 + 13 + 3 + 1 = 2480 \).
\( \sum fX = 0\cdot1970 + 1\cdot422 + 2\cdot71 + 3\cdot13 + 4\cdot3 + 5\cdot1 = 620 \).
\( \lambda = \dfrac{620}{2480} = 0.25 \).
Step 2: \( p(0) \)
\( p(0) = e^{-0.25} \approx 0.7788008 \) (rounded to 0.7788)
Step 3: Probabilities by recurrence (λ = 0.25)
- \( p(1) = p(0) \times \dfrac{0.25}{1} \approx 0.7788 \times 0.25 = 0.1947 \)
- \( p(2) = p(1) \times \dfrac{0.25}{2} \approx 0.1947 \times 0.125 = 0.02434 \)
- \( p(3) \approx 0.02434 \times \dfrac{0.25}{3} \approx 0.00203 \)
- \( p(4) \approx 0.00203 \times \dfrac{0.25}{4} \approx 0.000127 \)
- \( p(5) \approx 0.000127 \times \dfrac{0.25}{5} \approx 6.35\times10^{-6} \)
Step 4: Theoretical frequencies \( f(x) = 2480 \times p(x) \)
- \( f(0) \approx 2480 \times 0.7788 \approx 1931 \)
- \( f(1) \approx 2480 \times 0.1947 \approx 483 \)
- \( f(2) \approx 2480 \times 0.02434 \approx 60 \)
- \( f(3) \approx 2480 \times 0.00203 \approx 5 \)
- \( f(4) \approx 2480 \times 0.000127 \approx 0 \)
- \( f(5) \approx 2480 \times 6.35\times10^{-6} \approx 0 \)
Comparison Table
| Accidents (X) | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Observed (f) | 1970 | 422 | 71 | 13 | 3 | 1 |
| Theoretical \( f(x) \) | 1931 | 483 | 60 | 5 | 0 | 0 |
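Conclusion: The Poisson distribution with \( \lambda = 0.25 \) reproduces the overall shape of the observed accident data. Agreement is closest at \( X = 0 \) (1931 theoretical against 1970 observed); the fitted model overestimates the count at \( X = 1 \) and underestimates the counts for \( X \ge 2 \), most visibly in the tail.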
Example 3 — Fountain Pens (Poisson Approximation)
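Scenario: The probability that a fountain pen is defective is \( p = \dfrac{1}{500} \). Pens are sold in packets of \( n = 10 \), and there are \( N = 20000 \) packets in total. Find the expected number of packets containing exactly one and exactly two defective pens.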
Step 1: Mean
\( \lambda = n p = 10 \times \dfrac{1}{500} = 0.02 \).
Step 2: Poisson formula and \( e^{-\lambda} \)
\( P[X=x] = \dfrac{e^{-\lambda}\,\lambda^x}{x!} \), and \( e^{-0.02} \approx 0.9801987 \) (rounded 0.9802).
Step 3: Packets with exactly one defective \( X=1 \)
\( P[X=1] = e^{-0.02}\cdot 0.02 \approx 0.9802 \times 0.02 = 0.019604 \).
\( f(1) = 20000 \times 0.019604 \approx 392.08 \Rightarrow \textbf{392 packets} \).
Step 4: Packets with exactly two defectives \( X=2 \)
\( P[X=2] = e^{-0.02}\dfrac{(0.02)^2}{2!} \approx \dfrac{0.9802 \times 0.0004}{2} = 0.00019604 \).
\( f(2) = 20000 \times 0.00019604 \approx 3.9208 \Rightarrow \textbf{4 packets} \).
Summary: \( \lambda = 0.02 \), \( f(1)\approx392 \), \( f(2)\approx4 \).
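Since the Poisson distribution is used here as an approximation to the binomial distribution with \( n = 10 \) and \( p = 1/500 \), it can be instructive to compute both side by side. The sketch below is an addition to the worked example, not part of it; the helper names `poisson_pmf` and `binomial_pmf` are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math

p = 1 / 500     # probability that a pen is defective
n = 10          # pens per packet
N = 20000       # number of packets
lam = n * p     # Poisson mean, 0.02

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials, success probability p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

for x in (1, 2):
    f_poisson = N * poisson_pmf(x, lam)
    f_binomial = N * binomial_pmf(x, n, p)
    print(f"x = {x}: Poisson {f_poisson:.2f} packets, binomial {f_binomial:.2f} packets")
```

For these parameters the two expected frequencies differ by less than one packet for both \( x = 1 \) and \( x = 2 \).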
Example 4 — Typist Mistakes (100 pages)
| Mistakes per page (X) | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Frequency (f) | 42 | 33 | 14 | 6 | 4 | 1 |
Step 1: Mean \( \lambda \)
\( N = 100 \).
\( \sum fX = 0\cdot42 + 1\cdot33 + 2\cdot14 + 3\cdot6 + 4\cdot4 + 5\cdot1 = 100 \).
\( \lambda = \dfrac{100}{100} = 1 \).
Step 2: Initial probability \( p(0) \)
\( p(0) = e^{-1} \approx 0.367879 \) (rounded to 0.3679).
Step 3: Probabilities using recurrence (λ = 1)
- \( p(1) = p(0) \times \dfrac{1}{1} = 0.3679 \)
- \( p(2) = p(1) \times \tfrac{1}{2} \approx 0.1840 \)
- \( p(3) \approx 0.0613 \)
- \( p(4) \approx 0.0153 \)
- \( p(5) \approx 0.0031 \)
Step 4: Theoretical frequencies (N = 100)
- \( f(0) \approx 37 \)
- \( f(1) \approx 37 \)
- \( f(2) \approx 18 \)
- \( f(3) \approx 6 \)
- \( f(4) \approx 2 \)
- \( f(5) \approx 0 \)
Comparison Table
| Mistakes (X) | 0 | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|---|
| Observed (f) | 42 | 33 | 14 | 6 | 4 | 1 | 100 |
| Theoretical \( f(x) \) | 37 | 37 | 18 | 6 | 2 | 0 | 100 |
Conclusion: The Poisson distribution with \( \lambda = 1 \) fits the typist data well; each theoretical frequency is within 5 pages of the observed count.
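The theoretical frequencies in the table are rounded to whole pages. The following sketch, with illustrative variable names, recomputes them from the observed counts and also reports the probability left over for \( X \ge 6 \), which the table does not list. It suggests that the rounded column adds up to exactly 100 only because of rounding, since the unrounded frequencies for \( X = 0, \dots, 5 \) sum to slightly less than \( N \).

```python
import math

observed = [42, 33, 14, 6, 4, 1]     # pages with 0..5 mistakes (from the table)
N = sum(observed)
lam = sum(x * f for x, f in enumerate(observed)) / N   # mean mistakes per page

# Probabilities p(0)..p(5) by the recurrence p(x+1) = lam / (x+1) * p(x).
probs = [math.exp(-lam)]
for x in range(len(observed) - 1):
    probs.append(probs[-1] * lam / (x + 1))

freqs = [N * p for p in probs]        # unrounded theoretical frequencies
rounded = [round(f) for f in freqs]   # whole-page values as in the table
tail = 1.0 - sum(probs)               # P(X >= 6), not shown in the table

print("unrounded:", [f"{f:.2f}" for f in freqs])
print("rounded:  ", rounded, "sum =", sum(rounded))
print(f"P(X >= 6) = {tail:.6f}")
```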
