This is a notebook to learn about the central limit theorem. The basic idea is to draw $N$ random numbers $\{x_i\}$ (for $i=1\ldots N$) from some probability distribution $p(x)$ and calculate the sum $y=\sum_{i=1}^N x_i$. Note that, in general, $y$ is a random variable: if I draw a different set of $N$ numbers, I will get a slightly different value for $y$.
In statistical physics, we are often interested in the behavior of such extensive variables (variables that scale with $N$). We would like to understand their average value, their fluctuations, and how these scale with $N$.
In this notebook, we will build intuition for this by repeatedly calculating $y$ for different draws of $N$ random numbers. Let $y_\alpha$ (with $\alpha=1\ldots M$) be the sum obtained on the $\alpha$'th draw of $N$ numbers. We can then make a histogram of these $y_\alpha$. This histogram tells us about the probability of observing a given value of $y$.
We now perform this when the $x_i$ are binary variables with $x_i\in\{0,1\}$ and $$ p(x_i=1)=q\\ p(x_i=0)=1-q $$
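For this binary case, the distribution of $y$ is known exactly (a fact we can use below to check the simulation): $y$ counts the number of ones among $N$ independent draws, so it follows a binomial distribution, $$ P(y=k)=\binom{N}{k}q^k(1-q)^{N-k}\\ \langle y\rangle = Nq\\ \mathrm{Var}(y)=Nq(1-q) $$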
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
#Draw M sets of N random numbers
N=100
M=100
q=0.9
Data=np.random.binomial(1,q,(M,N))
#Draw from Normal distribution
#mean=10;
#sigma=5;
#Data=np.random.normal(mean,sigma,(M,N))
#Draw from Gamma distribution
#shape=2
#scale=2
#Data=np.random.gamma(shape,scale,(M,N))
y_vector=np.sum(Data, axis=1)
plt.clf()
sns.histplot(y_vector);  # distplot is deprecated; note kde='False' was a truthy string, not False
plt.show()
#Calculate mean value
mean_y=np.mean(y_vector)
print("The empirical mean is", mean_y)
std_y=np.std(y_vector)
print("The empirical std is", std_y)
#Theoretical std for the Bernoulli case
print("The theoretical std for Bernoulli is:", np.sqrt(N*q*(1-q)))
We now perform a similar simulation when the $x_i$ are continuous variables drawn from other distributions: the Normal distribution or even the Gamma distribution (look them up on Wikipedia). Here, fix $M=5000$.
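As a sketch of this exercise, here is the Gamma case with $M=5000$, reusing the shape and scale values from the commented-out cell above (these particular values, and the seed, are choices for illustration). For a sum of $N$ i.i.d. Gamma(shape, scale) variables, the mean is $N\cdot\text{shape}\cdot\text{scale}$ and the variance is $N\cdot\text{shape}\cdot\text{scale}^2$, so we can compare the empirical moments against these:

```python
import numpy as np

np.random.seed(0)  # for reproducibility (illustrative choice)

# Repeat the experiment with M = 5000 draws of N Gamma random numbers
N = 100
M = 5000
shape, scale = 2, 2  # same values as the commented cell above

Data = np.random.gamma(shape, scale, (M, N))
y_vector = np.sum(Data, axis=1)

# Exact moments for a sum of N i.i.d. Gamma(shape, scale) variables
theory_mean = N * shape * scale
theory_std = np.sqrt(N * shape * scale**2)

print("Empirical mean:", np.mean(y_vector), " theoretical:", theory_mean)
print("Empirical std: ", np.std(y_vector), " theoretical:", theory_std)
```

With $M=5000$ the empirical mean and std should land close to the theoretical values, and a histogram of `y_vector` should already look approximately Gaussian, as the central limit theorem predicts.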