Introduction to the Central Limit Theorem

This is a notebook to learn about the central limit theorem. The basic idea is to draw $N$ random numbers $\{x_i\}$ (for $i=1\ldots N$) from some probability distribution $p(x)$ and calculate the sum $y=\sum_{i=1}^N x_i$. Note that, in general, $y$ is a random variable. This means that if I draw a different set of $N$ numbers, I will get a slightly different value for $y$.

In statistical physics, we are often interested in the behavior of such extensive variables (variables that scale with $N$). We would like to understand the average value of $y$, its fluctuations, and how these scale with $N$.

In this notebook, we will try to get an intuition for this by repeatedly calculating $y$ for different draws of $N$ random numbers. Let $y_\alpha$ (with $\alpha=1\ldots M$) be the sum obtained the $\alpha$'th time I draw $N$ numbers. Then, we can make a histogram of these $y_\alpha$. This histogram tells us about the probability of observing a given value of $y_\alpha$.
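
The procedure just described can be sketched in a few lines of NumPy. For illustration only, $p(x)$ is assumed here to be uniform on $[0,1)$; the values of `N` and `M` and the random seed are arbitrary choices, not values prescribed by this notebook.

```python
import numpy as np

# Assumed, illustrative values (not prescribed by the notebook)
N = 50      # how many numbers x_i are summed in each draw
M = 2000    # how many times the draw is repeated
rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Each row holds one draw of N numbers; p(x) is assumed uniform on [0, 1)
x = rng.uniform(0.0, 1.0, size=(M, N))

# y_alpha = sum_i x_i for each draw alpha = 1..M
y = x.sum(axis=1)

# Histogram of the M sums: counts per bin and the bin edges
counts, edges = np.histogram(y, bins=30)

print("number of sums:", y.shape[0])
print("empirical mean of y:", y.mean())
print("empirical std of y:", y.std())
```

Swapping the `rng.uniform` line for another sampler changes $p(x)$ without touching the rest of the sketch.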

Binary Variables

We now perform this when the $x_i$ are binary variables taking the values $x_i \in \{0, 1\}$ with $$ p(x_i=1)=q, \qquad p(x_i=0)=1-q $$

  • Please play around with the code below. How do $N$, $M$, and $q$ affect the mean and the fluctuations of the distribution?
  • How would I identify the center of this distribution and the "width" of the distribution from theory? Derive expressions for these.
  • Can you relate these theoretically derived expressions to the empirical mean and standard deviation? For what $M$ do I get $10\%$ error? How about $0.01\%$ error?
  • Is there something special about binary variables?
  • Make plots of the empirically observed mean and "width" as a function of $N$.
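
One possible way to approach the last exercise is to loop over several values of $N$ and record the empirical mean and width of the $y_\alpha$ for each. The sketch below assumes each $x_i$ equals 1 with probability $q$ and 0 otherwise; the values of `q`, `M`, `N_values`, and the seed are illustrative assumptions.

```python
import numpy as np

# Assumed, illustrative values
q = 0.3
M = 2000
N_values = [10, 100, 1000]
rng = np.random.default_rng(1)  # fixed seed for reproducibility

means, stds = [], []
for N in N_values:
    # M draws of N binary variables: 1 with probability q, else 0
    data = rng.binomial(1, q, size=(M, N))
    y = data.sum(axis=1)
    means.append(y.mean())
    stds.append(y.std())

N_arr = np.array(N_values)
means = np.array(means)
stds = np.array(stds)

# Compare the empirical values with N*q and sqrt(N*q*(1-q))
for N, m, s in zip(N_values, means, stds):
    print(N, m, N * q, s, np.sqrt(N * q * (1 - q)))
```

Passing `N_arr` with `means` or `stds` to `plt.loglog` would give the requested plots on logarithmic axes, where a power-law dependence on $N$ shows up as a straight line.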
In [8]:
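import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Draw M sets of N random numbers.
# Example values (assumed; the values used for the recorded output are not given).
N = 100   # numbers per set
M = 100   # number of sets
q = 0.9   # probability that x_i = 1

# Draw from a Bernoulli distribution: x_i = 1 with probability q, else 0
Data = np.random.binomial(1, q, size=(M, N))

# Draw from a Gamma distribution instead (example shape and scale; uncomment to try)
# Data = np.random.gamma(shape=2.0, scale=1.0, size=(M, N))

# Sum each set of N numbers to obtain the M values y_alpha
y_vector = np.sum(Data, axis=1)

# Histogram of the sums (kde must be the boolean False, not the string 'False')
sns.histplot(y_vector, kde=False);

# Calculate the empirical mean and standard deviation
mean_y = np.mean(y_vector)
std_y = np.std(y_vector)

print("The empirical mean is", mean_y)
print("The empirical std is", std_y)
# Print theoretical std (valid for Bernoulli draws only):
# print("The theoretical std for bernoulli is:", np.sqrt(N*q*(1-q)))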
('The empirical mean is', 90.090000000000003)
('The empirical std is', 3.1814933600433615)
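
The theoretical standard deviation quoted in the last comment of the cell, `np.sqrt(N*q*(1-q))`, follows if the $x_i$ are independent and each equals 1 with probability $q$ and 0 otherwise. For a single variable,

$$ \langle x_i \rangle = q, \qquad \langle x_i^2 \rangle = q, \qquad \mathrm{Var}(x_i) = \langle x_i^2 \rangle - \langle x_i \rangle^2 = q(1-q). $$

Because means always add, and variances add for independent variables,

$$ \langle y \rangle = Nq, \qquad \mathrm{Var}(y) = Nq(1-q), \qquad \sigma_y = \sqrt{Nq(1-q)}. $$

The printed values (mean about 90.09, std about 3.18) would be consistent with, for example, $N=100$ and $q=0.9$, for which $Nq=90$ and $\sqrt{Nq(1-q)}=3$. The values actually used for that run are not recorded in the notebook, so this is only an inference.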

Continuous Variables

We now perform a similar simulation when the $x_i$ are continuous variables drawn from other distributions, such as the normal distribution or the gamma distribution (look these up on Wikipedia). Here, fix $M=5000$.

  • Please play around with the code below. How does $N$ affect the mean and the fluctuations of the distribution? Make a plot of the width and mean as a function of $N$.
  • Is there something special about binary variables or the probability distribution we draw from (as far as scaling with $N$)?
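
A minimal sketch of such a simulation is given below, with $M=5000$ as stated above. The distribution parameters (`mu`, `sigma`, `k`, `theta`), the list `N_values`, the seed, and the helper `mean_and_width` are all assumptions introduced for illustration. For a gamma distribution with shape $k$ and scale $\theta$, the per-variable mean is $k\theta$ and the variance is $k\theta^2$, which is what the comparison at the end relies on.

```python
import numpy as np

M = 5000                          # fixed, as stated in the text
N_values = [1, 4, 16, 64, 256]    # assumed values of N to scan
rng = np.random.default_rng(2)    # fixed seed for reproducibility

# Assumed distribution parameters
mu, sigma = 1.0, 1.0              # normal: mean and standard deviation
k, theta = 2.0, 1.0               # gamma: shape and scale


def mean_and_width(sampler, N_values, M):
    """Return arrays with the empirical mean and std of y for each N."""
    means, stds = [], []
    for N in N_values:
        y = sampler((M, N)).sum(axis=1)   # M sums of N numbers each
        means.append(y.mean())
        stds.append(y.std())
    return np.array(means), np.array(stds)


N_arr = np.array(N_values)
normal_mean, normal_std = mean_and_width(
    lambda size: rng.normal(mu, sigma, size=size), N_values, M)
gamma_mean, gamma_std = mean_and_width(
    lambda size: rng.gamma(k, theta, size=size), N_values, M)

# If the width grows like sqrt(N), these ratios stay roughly constant
print("normal std / sqrt(N):", normal_std / np.sqrt(N_arr))
print("gamma  std / sqrt(N):", gamma_std / np.sqrt(N_arr))
```

Plotting `normal_std` and `gamma_std` against `N_arr` on logarithmic axes is one way to compare how the width scales for the two distributions.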
In [ ]: