Estimate the Size of a Population from a Random Sample

Introduction

Given a population of an unknown number of items, estimate the size of the population from a random sample using the algorithm below.

Accuracy can be improved by taking several samples and averaging the estimated sizes.
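A minimal sketch of the averaging idea, assuming the population is numbered 1 to P and each sample is drawn without replacement. The names estimate_size and average_estimate are hypothetical, as are the seed and the sizes used in the demo call. The estimate is the maximum plus the average gap, written in a compact form because the gaps always add up to MAX - k.

```python
import random
from statistics import mean


def estimate_size(sample):
    """Maximum plus average gap for a sample of distinct integers from 1..P."""
    largest, k = max(sample), len(sample)
    # The gaps between sample values sum to (largest - k).
    return largest + (largest - k) / k


def average_estimate(population_size, sample_size, num_samples, rng=random):
    """Average the estimates from several independent random samples."""
    population = range(1, population_size + 1)
    estimates = [
        estimate_size(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]
    return mean(estimates)


rng = random.Random(42)  # fixed seed (an assumption) so the run is repeatable
print(average_estimate(2000, 20, 100, rng))
```

A single estimate from 20 items can be off by a few hundred; the average of 100 such estimates usually lands much closer to the actual size.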

Plotting the size estimates as a histogram may also give you a feel for the data.
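One possible way to draw such a histogram with matplotlib.pyplot is sketched below. The function names, the bin count, and the red marker line for the actual size are illustrative assumptions, not part of the assignment.

```python
import random


def estimate_size(sample):
    """Maximum plus average gap for a sample of distinct integers from 1..P."""
    largest, k = max(sample), len(sample)
    return largest + (largest - k) / k


def collect_estimates(population_size, sample_size, num_samples, seed=None):
    """Return one size estimate per random sample."""
    rng = random.Random(seed)
    population = range(1, population_size + 1)
    return [estimate_size(rng.sample(population, sample_size))
            for _ in range(num_samples)]


def plot_histogram(estimates, actual_size):
    """Show the distribution of estimates with the actual size marked."""
    # Imported here so the data functions above run without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.hist(estimates, bins=30, edgecolor="black")
    plt.axvline(actual_size, color="red", linestyle="--", label="actual size")
    plt.xlabel("estimated population size")
    plt.ylabel("number of samples")
    plt.legend()
    plt.show()
```

For example, plot_histogram(collect_estimates(2000, 20, 1000, seed=1), actual_size=2000) would display the spread of 1000 estimates around the actual size of 2000.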

Watch the video in the links below.

Project #1

Create an interactive program to estimate the size of a population from a random sample.

Project #2

Create an interactive program to estimate the size of a population from several random samples.

matplotlib.pyplot (documentation and examples)
To see more random data generation and pyplot examples click HERE

Project #3

Estimate the population size using several sample sizes. For each sample size, estimate the size of the population repeatedly (for example, 100 times). Output the data to a file for further analysis.

Using the data in the file

  1. Plot the sample sizes vs the accuracy percentage.

  2. Plot the sample sizes vs the differences between the actual and estimated population sizes.
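A rough sketch of how the data file for this project could be produced, assuming a CSV layout. The file name estimates.csv, the column names, the chosen sample sizes, and the function run_trials are all hypothetical; the percent column follows the "percent difference" shown in the output example.

```python
import csv
import random


def estimate_size(sample):
    """Maximum plus average gap for a sample of distinct integers from 1..P."""
    largest, k = max(sample), len(sample)
    return largest + (largest - k) / k


def run_trials(population_size, sample_sizes, trials, path, seed=None):
    """Write one CSV row per estimate, for every sample size."""
    rng = random.Random(seed)
    population = range(1, population_size + 1)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample_size", "actual", "estimated",
                         "difference", "percent_difference"])
        for k in sample_sizes:
            for _ in range(trials):
                estimated = estimate_size(rng.sample(population, k))
                difference = estimated - population_size
                percent = 100 * abs(difference) / population_size
                writer.writerow([k, population_size, estimated,
                                 difference, round(percent, 1)])


# Hypothetical settings: 100 estimates for each of five sample sizes.
run_trials(2000, [5, 10, 20, 50, 100], trials=100,
           path="estimates.csv", seed=1)
```

The rows can then be read back with csv.reader (or any spreadsheet tool) and grouped by sample_size for the two plots.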

Algorithm (from the video)

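For example, assuming S is a sorted random sample of k distinct integers (S[0], S[1], ..., S[k-1]) taken from a population numbered 1 to P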

  1. Find the largest/maximum sample value

    MAX = max(S)   or   MAX = S[-1]   (when S is sorted in ascending order)

  2. Calculate the sum of the gaps between the elements in the sample

    SUM_OF_GAPS = (S[0] - 1) + (S[1] - S[0] - 1) + (S[2] - S[1] - 1) + (S[3] - S[2] - 1) + ... + (S[k-1] - S[k-2] - 1)

    Don't forget there is a gap from the first integer (1) to the first element in the sample (S[0]).

  3. Calculate the average gap size

    AVERAGE_GAP = SUM_OF_GAPS / k

  4. Estimate the size of the population P

    SIZE = MAX + AVERAGE_GAP

    The maximum sample value plus the average gap size.
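The four steps above can be sketched in Python as follows. This is one possible reading of the algorithm, assuming the sample holds distinct integers drawn from a population numbered 1 to P; the function name estimate_population_size is hypothetical. The demo uses the sorted sample from the Output Display Example.

```python
def estimate_population_size(sample):
    """Estimate the population size as the maximum plus the average gap."""
    s = sorted(sample)
    k = len(s)
    if k == 0:
        raise ValueError("sample must not be empty")

    # Step 1: the largest sample value.
    largest = s[-1]

    # Step 2: sum the gaps, starting with the gap below the first element.
    sum_of_gaps = s[0] - 1
    for previous, current in zip(s, s[1:]):
        sum_of_gaps += current - previous - 1

    # Step 3: the average gap size (there are k gaps in total).
    average_gap = sum_of_gaps / k

    # Step 4: the estimate is the maximum plus the average gap.
    return largest + average_gap


# The sorted sample shown in the Output Display Example.
sample = [40, 44, 84, 320, 362, 423, 591, 628, 742, 907,
          909, 1048, 1154, 1261, 1293, 1329, 1497, 1777, 1962, 1970]
print(estimate_population_size(sample))  # 2067.5
```

Because the gaps telescope, SUM_OF_GAPS always equals MAX - k (here 1970 - 20 = 1950), which is a handy check on the loop.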

Links

The Clever Way to Count Tanks (YouTube)

Output Display Example

Your output does not need to look like this.

    sample size: 20
    population size: 2000

    gap between 0 to 40 is 39
    gap between 40 to 44 is 3
    gap between 44 to 84 is 39
    gap between 84 to 320 is 235
    gap between 320 to 362 is 41
    gap between 362 to 423 is 60
    gap between 423 to 591 is 167
    gap between 591 to 628 is 36
    gap between 628 to 742 is 113
    gap between 742 to 907 is 164
    ...

    number of gaps  : 20
    sum of gaps     : 1950
    average gap size: 97.5
    sample size     : 20
    sample sorted   : [ 40, 44, 84, 320, 362, 423, 591, 628, 742, 907, 909, 1048, 1154, 1261, 1293, 1329, 1497, 1777, 1962, 1970 ]

    actual population size is 2000
    estimated population size is 2067.5
    difference is 67.5
    percent difference is 3.4%