This is a notebook to learn about the central limit theorem. The basic idea is to draw $N$ random numbers $\{x_i\}$ (for $i=1\ldots N$) from some probability distribution $p(x)$ and calculate the sum $y=\sum_{i=1}^N x_i$. Note that, in general, $y$ is a *random variable*: if I draw a different set of $N$ numbers, I will get a slightly different value for $y$.

In statistical physics, we are often interested in the behavior of such *extensive* variables (variables that scale with $N$). We would like to understand their *average value*, their *fluctuations*, and how these scale with $N$.

In this notebook, we will try to build intuition for this by repeatedly calculating $y$ for different draws of $N$ random numbers. Let $y_\alpha$ (with $\alpha=1\ldots M$) be the sum for the $\alpha$'th time I draw $N$ numbers. Then we can make a histogram of these $y_\alpha$. This histogram tells us about the probability of observing a given value of $y$.

We now perform this when the $x_i$ are binary variables with $x_i\in\{0,1\}$ and $$ p(x_i=1)=q\\ p(x_i=0)=1-q $$

- Please play around with the code below. How do $N$, $M$, and $q$ affect the mean and the fluctuations of the distribution?
- How would I identify the center of this distribution and the "width" of the distribution from theory? Derive expressions for these.
- Can you relate these theoretically-derived expressions to the empirical mean and standard deviation? For what $M$ do I get $10\%$ error? How about $0.01\%$ error?
- Is there something special about binary variables?
- Make plots of the empirically observed mean and "width" as a function of $N$.
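If you want to check your theoretical derivation against the code, one standard route (assuming the $x_i$ are drawn independently) uses the fact that means and variances of independent variables add:

$$ \langle y \rangle = \sum_{i=1}^N \langle x_i \rangle = Nq, \qquad \mathrm{Var}(y) = \sum_{i=1}^N \mathrm{Var}(x_i) = Nq(1-q), $$

so the center grows like $N$ while the width $\sigma_y = \sqrt{Nq(1-q)}$ grows only like $\sqrt{N}$.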

In [8]:

```
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
#Draw M sets of N random numbers
N=100
M=100
q=0.9
Data=np.random.binomial(1,q,(M,N))
#Draw from normal distribution
#mean=10;
#sigma=5;
#Data=np.random.normal(mean,sigma,(M,N))
#Draw from Gamma distribution
#shape=2
#scale=2
#Data=np.random.gamma(shape,scale,(M,N))
y_vector=np.sum(Data, axis=1)
plt.clf()
sns.histplot(y_vector);  #histogram of the M sums (distplot is deprecated in recent seaborn)
plt.show()
#Calculate mean value
mean_y=np.mean(y_vector)
print("The empirical mean is", mean_y)
std_y=np.std(y_vector)
print("The empirical std is", std_y)
#Print theoretical std for the Bernoulli case
print("The theoretical std for Bernoulli is:", np.sqrt(N*q*(1-q)))
```
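One possible sketch for the last exercise above (not part of the original notebook; the choice of $N$ values is arbitrary): sweep $N$, record the empirical mean and standard deviation of the $y_\alpha$, and overlay the Bernoulli predictions $Nq$ and $\sqrt{Nq(1-q)}$.

```
import numpy as np
import matplotlib.pyplot as plt

M = 5000
q = 0.9
N_values = np.array([10, 30, 100, 300, 1000])

#Empirical mean and std of y for each N
emp_means, emp_stds = [], []
for N in N_values:
    y = np.random.binomial(1, q, (M, N)).sum(axis=1)
    emp_means.append(y.mean())
    emp_stds.append(y.std())

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(N_values, emp_means, 'o', label='empirical mean')
ax1.plot(N_values, N_values * q, '-', label=r'theory: $Nq$')
ax1.set_xlabel('N'); ax1.set_ylabel('mean of y'); ax1.legend()
#Log-log axes make the sqrt(N) scaling of the width visible as slope 1/2
ax2.loglog(N_values, emp_stds, 'o', label='empirical std')
ax2.loglog(N_values, np.sqrt(N_values * q * (1 - q)), '-',
           label=r'theory: $\sqrt{Nq(1-q)}$')
ax2.set_xlabel('N'); ax2.set_ylabel('std of y'); ax2.legend()
plt.tight_layout()
plt.show()
```

On the log-log plot the empirical width should fall on a line of slope $1/2$, which is the $\sqrt{N}$ scaling asked about above.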

We now perform a similar simulation when the $x_i$ are continuous variables drawn from some other distributions: the Normal distribution or even the Gamma distribution (look them up on Wikipedia). Here, fix $M=5000$.

- Please play around with the code below. How does $N$ affect the mean and the fluctuations of the distribution? Make a plot of the width and mean as a function of $N$.
- Is there something special about binary variables or the probability distribution we draw from (as far as scaling with $N$)?

In [ ]:

```
```
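Here is one possible sketch for the empty cell above (the distribution parameters are just the commented-out values from the earlier cell): sweep $N$ for both the Normal and Gamma cases at $M=5000$ and compare the empirical width with $\sigma_x\sqrt{N}$, where $\sigma_x$ is the standard deviation of a single draw.

```
import numpy as np
import matplotlib.pyplot as plt

M = 5000
N_values = np.array([10, 30, 100, 300, 1000])

mean, sigma = 10.0, 5.0   #normal distribution parameters
shape, scale = 2.0, 2.0   #gamma distribution: single-draw std = sqrt(shape)*scale

draws = {
    "normal": lambda N: np.random.normal(mean, sigma, (M, N)),
    "gamma": lambda N: np.random.gamma(shape, scale, (M, N)),
}
single_std = {"normal": sigma, "gamma": np.sqrt(shape) * scale}

plt.figure()
for name, draw in draws.items():
    #Empirical std of the sum y for each N
    stds = [draw(N).sum(axis=1).std() for N in N_values]
    plt.loglog(N_values, stds, 'o', label=name + ' (empirical)')
    plt.loglog(N_values, single_std[name] * np.sqrt(N_values), '-',
               label=name + ': ' + r'$\sigma_x\sqrt{N}$')
plt.xlabel('N')
plt.ylabel('std of y')
plt.legend()
plt.show()
```

Both curves should follow the same $\sqrt{N}$ scaling, suggesting there is nothing special about binary variables as far as the scaling of the width with $N$ is concerned.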