Hi Benjamin, yes, it's pretty weird, but as far as I know that is how Cochran's formula works.

The first formula calculates the sample size n_0 for big datasets, i.e. it assumes an effectively infinite population (I don't know of an exact boundary for 'big', and I'm not sure there is one, but as you've shown, anything above 100K makes little difference, so it is likely around there).

The second formula essentially adjusts the value we found from the first to suit smaller datasets: n = n_0 / (1 + (n_0 - 1) / N), where N is the population size. Once we get towards that 'big dataset' boundary, the denominator (1 + (n_0 - 1) / N) gets very close to 1, so the adjustment has almost no effect on n_0.
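
To make the two steps concrete, here is a minimal Python sketch. The function names are my own, and the defaults (z = 1.96 for 95% confidence, p = 0.5, 5% margin of error) are assumptions chosen to match the calculator settings linked below, so treat it as an illustration rather than a reference implementation.

```python
import math


def cochran_n0(z=1.96, p=0.5, e=0.05):
    """First formula: sample size for an effectively infinite population."""
    return (z ** 2) * p * (1 - p) / (e ** 2)


def cochran_adjusted(n0, population):
    """Second formula: adjust n0 for a finite population of size N."""
    return n0 / (1 + (n0 - 1) / population)


n0 = cochran_n0()  # roughly 384.16, which rounds up to 385

for N in (1_000, 100_000, 1_000_000_000):
    # Round up, since we cannot sample a fraction of a record.
    print(N, math.ceil(cochran_adjusted(n0, N)))
# 1000 -> 278, 100000 -> 383, 1000000000 -> 385
```

The denominator is about 1.38 for N = 1,000 but only about 1.0000004 for N = 1e9, which is why the result stops moving once N gets large.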

Intuitively, this is weird, and to me a population of 1e9 being represented by a sample of 385 seems entirely wrong (and maybe it is at such high numbers) - but I leave that to the statisticians and mathematicians, who are much more intelligent than me.

I would probably just be wary of applying this formula to very large populations - I have seen another called Slovin's formula which may be more appropriate - but I have never used it before, so I don't know if it is better or not!
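
For reference, Slovin's formula is usually written n = N / (1 + N * e^2), where e is the margin of error. A quick sketch (the function name and the 5% default are my own assumptions) suggests it also levels off for large N, near 1 / e^2, so I'm not certain it avoids the plateau:

```python
import math


def slovin(population, e=0.05):
    """Slovin's formula: n = N / (1 + N * e^2)."""
    return population / (1 + population * e ** 2)


for N in (1_000, 100_000, 1_000_000_000):
    # Round up to a whole number of samples.
    print(N, math.ceil(slovin(N)))
```

With a 5% margin of error, 1 / e^2 is 400, so the result cannot grow beyond that however large N gets.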

If you do find any more info on this, let me know. A quick Google search from me didn't turn up any comments on population size limits for Cochran's formula.

Cochran sample size calculator here:

https://www.calculator.net/sample-size-calculator.html?type=1&cl=95&ci=5&pp=50&ps=&x=76&y=33

Thanks for the comment!

Freelance ML engineer learning and writing about everything.
