Introduction
Given a population with an unknown number of items,
estimate the size of the population
from a random sample using the algorithm below.
Accuracy can be improved by
taking several samples then taking the average
of the estimated sizes.
Plotting the size estimates as a histogram
may also give you a feel for the data.
Watch the video in the links below.
Project #1
Create an interactive program to estimate
the size of a population from a random sample.
- create a test population of integers (1 to N)
- loop
  - ask the user for the sample size
  - use the algorithm below to estimate
    the size of the population
  - display the results
    - actual size
    - estimated size
    - size difference
    - accuracy percentage
    - other stats
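A minimal, non-interactive sketch of one pass through the loop above. The function names (estimate_population_size, run_trial) and the N = 2000 test population are illustrative assumptions, and "accuracy percentage" is assumed here to mean the percent difference between the estimated and actual sizes.

```python
import random


def estimate_population_size(sample):
    """Return MAX + AVERAGE_GAP for distinct integers sampled from 1..N."""
    s = sorted(sample)
    k = len(s)
    # Gap below the first element, then the gaps between neighbours.
    gaps = [s[0] - 1] + [b - a - 1 for a, b in zip(s, s[1:])]
    return s[-1] + sum(gaps) / k


def run_trial(population, sample_size, rng):
    """Draw one random sample (without replacement) and summarise the estimate."""
    sample = rng.sample(population, sample_size)
    actual = len(population)
    estimated = estimate_population_size(sample)
    difference = estimated - actual
    return {
        "actual": actual,
        "estimated": estimated,
        "difference": difference,
        # Assumption: "accuracy" is reported as percent difference.
        "percent_difference": abs(difference) / actual * 100,
    }


population = list(range(1, 2001))  # test population 1..N with N = 2000
result = run_trial(population, 20, random.Random())
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

For the interactive version, the fixed sample size of 20 could be replaced by int(input(...)) inside a loop.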
Project #2
Create an interactive program to estimate
the size of a population
from several random samples.
- create a test population of integers (1 to N)
- loop
  - ask the user for the sample size
  - ask the user how many times (X) to sample the population
  - loop X times
    - use the algorithm below to estimate
      the size of the population
  - average the estimated population sizes
  - plot the estimated sizes
  - display the results
    - actual size
    - estimated size
    - accuracy percentage
    - size difference
    - other stats
matplotlib.pyplot
(documentation and examples)
To see more random data generation
and pyplot examples click
HERE
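One possible sketch of the repeated-sampling step for Project #2. The helper names (repeated_estimates, plot_estimates), the fixed seed, and the choice of 20 histogram bins are assumptions for illustration. The plotting function imports matplotlib only when it is called, so the rest of the sketch runs without it.

```python
import random
import statistics


def estimate_population_size(sample):
    """Return MAX + AVERAGE_GAP for distinct integers sampled from 1..N."""
    s = sorted(sample)
    k = len(s)
    gaps = [s[0] - 1] + [b - a - 1 for a, b in zip(s, s[1:])]
    return s[-1] + sum(gaps) / k


def repeated_estimates(population, sample_size, repeats, rng):
    """Estimate the population size once per fresh random sample."""
    return [
        estimate_population_size(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]


def plot_estimates(estimates, actual):
    """Histogram of the estimates with the actual size marked."""
    import matplotlib.pyplot as plt  # imported lazily; requires matplotlib

    plt.hist(estimates, bins=20)
    plt.axvline(actual, color="red", label="actual size")
    plt.xlabel("estimated population size")
    plt.ylabel("number of samples")
    plt.legend()
    plt.show()


rng = random.Random(42)            # fixed seed only to make runs repeatable
population = list(range(1, 2001))  # test population 1..N with N = 2000
estimates = repeated_estimates(population, 20, 100, rng)
average = statistics.mean(estimates)
print(f"average estimate: {average:.1f}")
```

Calling plot_estimates(estimates, len(population)) afterwards would show how the individual estimates spread around the actual size.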
Project #3
Estimate the population size using
several sample sizes. For each sample size,
estimate the size of the population repeatedly
(for example, 100 times).
Output the data to a file for further
analysis.
Using the data in the file:
- Plot the sample sizes vs the accuracy percentage.
- Plot the sample sizes vs the differences between
  the actual and estimated sizes.
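A rough sketch of the data-collection half of Project #3, writing one CSV row per sample size. The function name write_results, the column names, and the CSV format itself are assumptions; the project only asks for "a file". Each row holds the mean of the repeated estimates, its difference from the actual size, and the percent difference.

```python
import csv
import random
import statistics


def estimate_population_size(sample):
    """Return MAX + AVERAGE_GAP for distinct integers sampled from 1..N."""
    s = sorted(sample)
    k = len(s)
    gaps = [s[0] - 1] + [b - a - 1 for a, b in zip(s, s[1:])]
    return s[-1] + sum(gaps) / k


def write_results(path, population, sample_sizes, repeats, rng):
    """Write the mean estimate and its error for each sample size to a CSV file."""
    actual = len(population)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(
            ["sample_size", "mean_estimate", "difference", "percent_difference"]
        )
        for k in sample_sizes:
            estimates = [
                estimate_population_size(rng.sample(population, k))
                for _ in range(repeats)
            ]
            mean_estimate = statistics.mean(estimates)
            difference = mean_estimate - actual
            writer.writerow(
                [k, mean_estimate, difference, abs(difference) / actual * 100]
            )
```

A call such as write_results("estimates.csv", list(range(1, 2001)), [5, 10, 20, 50, 100], 100, random.Random()) would produce the file; the rows can then be read back with csv.reader and the columns passed to pyplot for the two plots.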
Algorithm (from the video)
For example, assuming
- a population "P" of integers (1 to N)
  with no gaps
  - the population is a Python list or tuple (iterable)
- a random sample "S" from the population "P"
  - a sample size of "k" elements
  - the sample sorted into ascending order
  - the sample is a Python list or tuple (iterable)
- Find the largest/maximum sample value
    MAX = max(S) or MAX = S[-1]
- Calculate the sum of the gaps between the elements in the sample
    SUM_OF_GAPS = (S[0] - 1) +
                  (S[1] - S[0] - 1) +
                  (S[2] - S[1] - 1) +
                  (S[3] - S[2] - 1) +
                  ... +
                  (S[k-1] - S[k-2] - 1)
  Don't forget there is a gap from the first integer (1)
  to the first element in the sample (S[0]).
- Calculate the average gap size
    AVERAGE_GAP = SUM_OF_GAPS / k
- Estimate the size of the population P
    SIZE = MAX + AVERAGE_GAP
  The maximum sample value plus the average gap size.
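The steps above can be sketched in Python as follows. The function name estimate_population_size is an illustrative choice, and the sample is the one shown in the output display example further down. Note that the gap terms cancel in pairs, so SUM_OF_GAPS always equals MAX - k; the sketch still builds the gaps explicitly to mirror the steps.

```python
def estimate_population_size(sample):
    """Estimate N from a sample of distinct integers drawn from 1..N."""
    s = sorted(sample)  # ascending order
    k = len(s)
    if k == 0:
        raise ValueError("sample must not be empty")
    largest = s[-1]  # MAX
    # Gap below the first element, then the gaps between neighbours.
    gaps = [s[0] - 1] + [s[i] - s[i - 1] - 1 for i in range(1, k)]
    sum_of_gaps = sum(gaps)  # always equal to largest - k
    average_gap = sum_of_gaps / k
    return largest + average_gap


sample = [40, 44, 84, 320, 362, 423, 591, 628, 742, 907, 909, 1048,
          1154, 1261, 1293, 1329, 1497, 1777, 1962, 1970]
print(estimate_population_size(sample))  # 2067.5
```

Here MAX is 1970, the gaps sum to 1950, and the average gap is 1950 / 20 = 97.5, which gives 1970 + 97.5 = 2067.5.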
Links
The Clever Way to Count Tanks
(YouTube)
Output Display Example
Your output does not need to look like this.
sample size: 20
population size: 2000
gap between 0 to 40 is 39
gap between 40 to 44 is 3
gap between 44 to 84 is 39
gap between 84 to 320 is 235
gap between 320 to 362 is 41
gap between 362 to 423 is 60
gap between 423 to 591 is 167
gap between 591 to 628 is 36
gap between 628 to 742 is 113
gap between 742 to 907 is 164
...
number of gaps : 20
sum of gaps : 1950
average gap size: 97.5
sample size : 20
sample sorted : [ 40, 44, 84, 320, 362, 423, 591, 628,
742, 907, 909, 1048, 1154, 1261, 1293,
1329, 1497, 1777, 1962, 1970 ]
actual population size is 2000
estimated population size is 2067.5
difference is 67.5
percent difference is 3.4%