Central Limit Theorem

The Central Limit Theorem (CLT) is one of the concepts most (dare I say it?) central to probability theory and statistics, and in many ways it can be thought of as a refinement of the Law of Large Numbers (LLN) explored in issue 1: where the LLN tells us the sample mean converges to the true mean, the CLT describes the shape of the fluctuations around it. The two are closely related formally, too; the LLN even appears as a step in some proofs of the CLT.

The CLT states that the distribution of the sum of many independent, identically distributed (iid) random variables with finite variance is, after appropriate scaling, approximately Normal (bell curve) distributed. For iid, think repeated tosses of a coin, where each random variable is 1 for heads or 0 for tails, and the sum is simply the number of heads. This is a very strong and fairly unintuitive statement. We've said almost nothing about the random variables we are working with (they could be fair coin tosses; they could be something very unpredictable, like the outside temperature), yet we have precise knowledge of the limiting distribution of their sum. We explore this idea in depth below.
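To make this concrete before diving into the plots, here is a minimal simulation sketch (using NumPy; an illustration, not the code behind the plots linked at the end) that averages iid draws from a skewed, decidedly non-Normal distribution and checks the result against what the CLT predicts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Average n iid draws from a decidedly non-Normal distribution:
# the exponential, which is strongly skewed.
n, trials = 100, 50_000
samples = rng.exponential(scale=1.0, size=(trials, n))
means = samples.mean(axis=1)

# The CLT predicts the means are ~ Normal(mu, sigma / sqrt(n)).
# For Exponential(1): mu = 1, sigma = 1.
print(f"empirical mean: {means.mean():.4f}  (CLT predicts 1.0)")
print(f"empirical std:  {means.std():.4f}  (CLT predicts {1 / np.sqrt(n):.4f})")
```

Despite the heavy skew of the inputs, the averages land symmetrically around 1 with the predicted spread.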

Binomial → Normal

The first plot is basically the LLN all over again, except that in the LLN visualizations of issue 1 we focused on the long-term behavior for a large number of tosses. Here, we are exploring the initial behavior: a relatively low number of tosses, but a large number of coins. At first, when tosses=5, the values you can obtain are clearly "discretized". For example, you can't get between 3 and 4 heads out of 5 tosses (proportions of 0.6 and 0.8, respectively); it's simply impossible. This kind of discrete probability distribution is called the binomial distribution. However, as the number of tosses increases, more values become accessible. When tosses=10, we can now obtain a value between 0.6 and 0.8: 7 heads out of 10 tosses (0.7). And so forth. As tosses increases, the accessible values become roughly continuous and begin to approximate a Normal distribution; this convergence is the content of the De Moivre–Laplace theorem. Finally, towards the end of the animation, once tosses becomes sufficiently large, the Law of Large Numbers starts to kick in, and the distribution begins to collapse onto the true average of 0.5.
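If you'd like to check the De Moivre–Laplace convergence numerically rather than visually, here is a small sketch (assuming NumPy and SciPy are available) comparing the Binomial(n, 0.5) pmf against the matching Normal density as tosses grows:

```python
import numpy as np
from scipy import stats

# De Moivre-Laplace: Binomial(n, p) is approximately
# Normal(np, np(1-p)) for large n. Compare the pmf with the
# matching Normal density at each integer as n grows.
p = 0.5
for n in (5, 10, 50, 300):
    k = np.arange(n + 1)
    binom_pmf = stats.binom.pmf(k, n, p)
    normal_pdf = stats.norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    max_err = np.abs(binom_pmf - normal_pdf).max()
    print(f"tosses={n:>3}: max |pmf - pdf| = {max_err:.5f}")
```

The maximum discrepancy shrinks steadily as n increases, which is exactly the convergence the animation shows.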

Sampling Distribution → Binomial

Instead of thinking about the distribution of a few coins averaged over a large number of tosses (the LLN), we may consider a lot of coins averaged over only a few tosses (tosses=10). Initially, for a small number of coins, the empirical distribution is noisy and fairly flat: an extreme outcome like 3/10 heads shows up about as often as a central one like 5/10. However, as the number of coins increases, this murkiness fades away and the empirical frequencies settle down, converging to the binomial distribution, which is the closest approximation of the Normal distribution accessible at only tosses=10. Proportions in between, say between 6/10 and 7/10 heads, are inaccessible at tosses=10, for the reasons regarding discretization discussed above.
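As a quick numerical companion (again a NumPy/SciPy sketch, not the plotting code itself), we can watch the empirical frequencies settle onto the exact Binomial(10, 0.5) pmf as the number of coins grows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
tosses = 10

# As the number of coins grows, the empirical distribution of
# heads-out-of-10 settles onto the exact Binomial(10, 0.5) pmf.
for coins in (100, 10_000, 1_000_000):
    heads = rng.binomial(tosses, 0.5, size=coins)
    empirical = np.bincount(heads, minlength=tosses + 1) / coins
    exact = stats.binom.pmf(np.arange(tosses + 1), tosses, 0.5)
    print(f"coins={coins:>9}: max |empirical - exact| = "
          f"{np.abs(empirical - exact).max():.5f}")
```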

Sampling Distribution → Normal

Finally, we consider a lot of coins averaged over a sizeable number of tosses (tosses=300). Whereas before, at tosses=10, we could only access proportions that were multiples of 0.1 (1/10, 2/10, 3/10, ...), here we can access any proportion that is a multiple of 1/300 ≈ 0.00333 (1/300, 2/300, 3/300, ...). With this many tosses, even though the distribution is technically still a discrete binomial, enough intermediate values are accessible that it becomes a good approximation of a continuous distribution. Here we see the CLT most clearly at play. While at first, for a low number of coins, the distribution appears quite haphazard, as the number of coins becomes large it converges to the Normal distribution. At coins=1e+05 (100K), the distribution is almost exactly Normal, and for increasing numbers of coins it remains nearly unchanged: it has reached the limiting Normal distribution.
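One last sketch (same NumPy/SciPy assumptions as above): standardize the proportions at tosses=300 and compare their quantiles against those of the standard Normal, which is what "almost exactly Normal" means in practice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
tosses, coins = 300, 100_000

# Proportion of heads over 300 tosses, for 100K coins.
proportions = rng.binomial(tosses, 0.5, size=coins) / tosses

# Standardize and compare against the standard Normal:
# the CLT says z should look like N(0, 1).
z = (proportions - 0.5) / np.sqrt(0.5 * 0.5 / tosses)
for q in (0.025, 0.5, 0.975):
    print(f"quantile {q}: empirical {np.quantile(z, q):+.3f}, "
          f"Normal {stats.norm.ppf(q):+.3f}")
```

The empirical and theoretical quantiles agree to a couple of decimal places, even out in the tails.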


Code for plots

Seeing Statistics aims to animate the predictable structures that emerge from repeated randomness.