Notebook 20: Variational autoencoder for the Ising Model with Keras

Learning Goal

The goal of this notebook is to implement a VAE to learn a generative model for the 2D Ising model. The goal will be to understand how latent variables can capture physical quantities (such as the order parameter) and the effect of hyperparameters on VAE results.


In this notebook, we will write a variational autoencoder (VAE) in Keras for the 2D Ising model dataset. The code in this notebook is adapted from ( and reproduces some of the results found in ( The goal of the notebook is to show how to implement a variational autoencoder in Keras in order to learn effective low-dimensional representations of equilibrium samples drawn from the 2D ferromagnetic Ising model with periodic boundary conditions.

Structure of the notebook

The notebook is structured as follows.

  1. We load in the Ising dataset
  2. We construct the variational auto encoder model using Keras
  3. We train the model on a training set and then visualize the latent representation
  4. We compare the learned representation with the magnetization order parameter

Load the dataset

In [1]:
import pickle
from sklearn.model_selection import train_test_split
import collections

def load_data_set(root="IsingMC/", train_size = 0.5):
    """Loads the Ising dataset in the format required for training the tensorflow VAE
    root: str, default = "IsingMC/"
        Location of the directory containing the Ising dataset
    train_size: float, default = 0.5
        Size ratio of the training set. 1-train_size corresponds to the test set size ratio.
    # The Ising dataset contains 16*10000 samples taken in T=np.arange(0.25,4.0001,0.25)
    data = pickle.load(open(root+'Ising2DFM_reSample_L40_T=All.pkl','rb'))
    data = np.unpackbits(data).astype(int).reshape(-1,1600) # decompression of data and casting to int. 
    Y = np.hstack([t]*10000 for t in np.arange(0.25,4.01,0.25)) # labels

    # Here we downsample the dataset and use 1000 samples at each temperature
    tmp = np.arange(10000)
    X = np.vstack(data[i*10000:(i+1)*10000][rand_idx] for i, _ in enumerate(np.arange(0.25,4.01,0.25)))
    Y = np.hstack(Y[i*10000:(i+1)*10000][rand_idx] for i, _ in enumerate(np.arange(0.25,4.01,0.25)))
    # Note that data is not currently shuffled
    return X, Y

Constructing a VAE class

Here, we implement the VAE in a slightly different way than we did for the MNIST dataset. We have chosen to create a new VAE class so that the parameters can be easily changed for new data.

In [2]:
from __future__ import print_function
import os
import numpy as np

from scipy.stats import norm

from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import backend as K
from keras import metrics, losses
from keras.datasets import mnist

class VAE:
    def __init__(self, batch_size=100, original_dim =1600, latent_dim = 100, epochs=50, root="IsingMC/", epsilon=0.5):
         - Auto-Encoding Variational Bayes
           This code is taken from Keras VAE tutorial available at
        batch_size : int, default=100
            Size of batches for gradient descent
        original_dim : int, default =1600
            Number of features
        latent_dim: int, default = 100
            Dimensionality of the latent space
        epochs: int, default = 50
            Number of epochs for training
        self.batch_size = batch_size
        self.original_dim = original_dim
        self.latent_dim = latent_dim
        self.intermediate_dim = 256
        self.epochs = epochs
        self.epsilon_std = epsilon
    def sampling(self, args):
        ''' Sampling from the latent variables using the means and log-variances'''
        z_mean, z_log_var = args
        epsilon = K.random_normal(shape=(K.shape(z_mean)[0], self.latent_dim), mean=0.,
        return z_mean + K.exp(z_log_var / 2) * epsilon

    def build(self):
        """ This class method constructs the VAE model
        original_dim = self.original_dim
        latent_dim = self.latent_dim
        intermediate_dim = self.intermediate_dim
        # encoder
        self.x = Input(shape=(original_dim,))
        h = Dense(intermediate_dim, activation='relu')(self.x)
        self.z_mean = Dense(latent_dim)(h)
        self.z_log_var = Dense(latent_dim)(h)
        # note that "output_shape" isn't necessary with the TensorFlow backend
        z = Lambda(self.sampling, output_shape=(latent_dim,))([self.z_mean, self.z_log_var])
        # we instantiate these layers separately so as to reuse them later
        decoder_h = Dense(intermediate_dim, activation='relu')
        decoder_mean = Dense(original_dim, activation='sigmoid')
        h_decoded = decoder_h(z)
        x_decoded_mean = decoder_mean(h_decoded)
        decoder_input = Input(shape=(latent_dim,))
        _h_decoded = decoder_h(decoder_input)
        _x_decoded_mean = decoder_mean(_h_decoded)
        self.generator = Model(decoder_input, _x_decoded_mean)

        # end-to-end VAE model
        self.vae = Model(self.x, x_decoded_mean)
        # encoder, from inputs to latent space
        self.encoder = Model(self.x, self.z_mean)
        # decoder
        #self.decoder = Model(decoder_input, _x_decoded_mean)
        # Compute VAE loss
        self.vae.compile(optimizer='rmsprop', loss=self.vae_loss)
        # Prints a summary of the architecture used
    def vae_loss(self, x, x_decoded_mean):
            xent_loss = losses.binary_crossentropy(x, x_decoded_mean)
            kl_loss = - 0.5 * K.mean(1 + self.z_log_var - K.square(self.z_mean) - K.exp(self.z_log_var), axis=-1)
            return xent_loss + kl_loss
    def train(self, x_train, x_test):
        from sklearn.preprocessing import minmax_scale
        x_train = x_train.reshape((len(x_train),[1:]))) # flatten each sample out 
        x_test = x_test.reshape((len(x_test),[1:])))
        x_train = minmax_scale(x_train) # this step is required in order to use cross-entropy loss for reconstruction
        x_test = minmax_scale(x_train) # scaling features in 0,1 interval
       , x_train,
                validation_data=(x_test, x_test)

        # build a model to project inputs on the latent space
        #encoder = Model(self.x, self.z_mean)
    def predict_latent(self, xnew):
        # build a model to project inputs on the latent space
        return self.encoder.predict(xnew)
    def generate_decoding(self, znew):
        # Generate new fantasy particles 
        return self.generator.predict(znew)
/Users/mbukov/anaconda3/envs/DNN/lib/python3.6/site-packages/h5py/ FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.

Load the 2D Ising dataset

In [3]:
# The y labels are the temperatures in np.arange(0.25,4.01,0.2) at which X was drawn

#Directory where data is stored
X, Y = load_data_set(root= root)

Construct a training and a validation set

In [4]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.8)
(32000, 1600)

Construct and train the variational autoencoder model

In [5]:
model = VAE(epochs=5, latent_dim=2, epsilon=0.2) # Choose model parameters # Construct VAE model using Keras
model.train(xtrain, xtest) # Trains VAE model based on custom loss function
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 1600)         0                                            
dense_1 (Dense)                 (None, 256)          409856      input_1[0][0]                    
dense_2 (Dense)                 (None, 2)            514         dense_1[0][0]                    
dense_3 (Dense)                 (None, 2)            514         dense_1[0][0]                    
lambda_1 (Lambda)               (None, 2)            0           dense_2[0][0]                    
dense_4 (Dense)                 (None, 256)          768         lambda_1[0][0]                   
dense_5 (Dense)                 (None, 1600)         411200      dense_4[0][0]                    
Total params: 822,852
Trainable params: 822,852
Non-trainable params: 0
/Users/mbukov/anaconda3/envs/DNN/lib/python3.6/site-packages/sklearn/utils/ DataConversionWarning: Data with input dtype int64 was converted to float64.
  warnings.warn(msg, DataConversionWarning)
Train on 32000 samples, validate on 32000 samples
Epoch 1/5
32000/32000 [==============================] - 16s 508us/step - loss: 5.4376 - val_loss: 0.8315
Epoch 2/5
32000/32000 [==============================] - 18s 555us/step - loss: 2.9111 - val_loss: 1.8376
Epoch 3/5
32000/32000 [==============================] - 13s 419us/step - loss: 2.2729 - val_loss: 0.8762
Epoch 4/5
32000/32000 [==============================] - 13s 399us/step - loss: 1.6337 - val_loss: 0.8224
Epoch 5/5
32000/32000 [==============================] - 13s 408us/step - loss: 1.3037 - val_loss: 1.1051

Encoding samples to latent space:

We predict the latent variable coordinates for the test set:

In [6]:
zpred = model.predict_latent(xtest)
(128000, 2)

Let's visualize this 2-dimensional space. We also color each sample according to the temperature at which it was drawn. The largest temperature is red ($T=4.0$) and lowest is blue ($T=0.25$).

In [7]:
# To make plots pretty
golden_size = lambda width: (width, 2. * width / (1 + np.sqrt(5)))

%matplotlib inline
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1,figsize=golden_size(8))
sc = ax.scatter(zpred[:,0], zpred[:,1], c=ytest/4.0, s=4, cmap="coolwarm")
ax.set_xlabel('First latent dimension of the VAE')
ax.set_ylabel('Second latent dimension of the VAE')
plt.colorbar(sc, label='$0.25\\times$Temperature')

Understanding the latent space embeddings

To better understand the latent space, we can plot each of the latent dimension coordinates against the corresponding magnetization of each sample.

In [8]:
fig, ax = plt.subplots(1,2,figsize=(15,8))
ax[0].scatter(zpred[:,0], np.mean(xtest, axis=1), c=ytest/4.0, s=2, cmap="coolwarm")
ax[0].set_xlabel('First latent dimension of the VAE')
sc = ax[1].scatter(zpred[:,1], np.mean(xtest, axis=1), c=ytest/4.0, s=2, cmap="coolwarm")
ax[1].set_xlabel('Second latent dimension of the VAE')
plt.colorbar(sc, label='$0.25\\times$Temperature')

It appears that these dimensions are strongly correlated, meaning that the learned representation is effectively one-dimensional. This can be understood by the fact that in order to draw samples at high and low temperatures, we only require the information about the magnetization order parameter (we only have to draw samples from a factorized mean-field distribution): \begin{equation} p(s_i=\pm) = \frac{1\pm m}{2}, \end{equation} where $p(s_i=\pm)$ is the probability that spin $i$ is up ($+$) or down ($-$), given that the magnetization sector is fixed. Note that this is not true in the vicinity of the critical point, where mean-field theory fails as the system develops long-range correlations.

We see that the VAE correctly captures the structure of the data. The high-temperature samples cluster at intermediate values and the ordered samples with positive and negative magnetization cluster in opposite regions. This can be more effectively visualized using a 1-D histogram:

In [9]:
# Make histogram at the 

Generating New Examples

So far in this notebook, we have shown that the latent structure of VAEs can automatically identify order parameters. This is not surprising since even the first principle component in a PCA is essentially the magnetization.

The interesting feature of VAEs is that they are also a generative model. We now ask how well the VAE can generate new examples. Our decoder returns probabilities for each pixel being 1. We then can draw random numbers to generate samples. This is done in the short function below.

One again, as in the VAE MNIST notebook, we will sample our latent space togenerate the particles in two different ways

  • Sampling uniformally in the latent space
  • Sampling accounting for the fact that the latent space is Gaussian so that we expect most of the data points to be centered around (0,0) and fall off exponentially in all directions. This is done by transforming the uniform grid using the inverse Cumulative Distribution Function (CDF) for the Gaussian.
In [10]:
# Generate fantasy particles
def generate_samples(model, z_input):
    return samples
In [11]:
# display a 2D manifold of the images
n = 5  # figure with 15x15 images
quantile_min = 0.01
quantile_max = 0.99


# Linear Sampling
# we will sample n points within [-15, 15] standard deviations
z1_u = np.linspace(5, -5, n)
z2_u = np.linspace(5, -5, n)
z_grid = np.dstack(np.meshgrid(z1_u, z2_u))
z_input=np.array(z_grid.reshape(n*n, latent_dim))
x_pred_grid = generate_samples(model,z_input) \
                     .reshape(n, n, img_rows, img_cols)
# Inverse CDF sampling
z1 = norm.ppf(np.linspace(quantile_min, quantile_max, n))
z2 = norm.ppf(np.linspace(quantile_max, quantile_min, n))
z_grid2 = np.dstack(np.meshgrid(z1, z2))
z_input=np.array(z_grid2.reshape(n*n, latent_dim))
x_pred_grid2 = generate_samples(model,z_input) \
                     .reshape(n, n, img_rows, img_cols)

# Plot figure
fig, ax = plt.subplots(figsize=golden_size(10))

ax.imshow(np.block(list(map(list, x_pred_grid))), cmap='gray')

ax.set_xticks(np.arange(0, n*img_rows, img_rows) + .5 * img_rows)
ax.set_xticklabels(map('{:.2f}'.format, z1), rotation=90)

ax.set_yticks(np.arange(0, n*img_cols, img_cols) + .5 * img_cols)
ax.set_yticklabels(map('{:.2f}'.format, z2))



# Plot figure Inverse CDF sampling
fig, ax = plt.subplots(figsize=golden_size(10))

ax.imshow(np.block(list(map(list, x_pred_grid2))), cmap='gray_r', vmin=0, vmax=1)

ax.set_xticks(np.arange(0, n*img_rows, img_rows) + .5 * img_rows)
ax.set_xticklabels(map('{:.2f}'.format, z1), rotation=90)

ax.set_yticks(np.arange(0, n*img_cols, img_cols) + .5 * img_cols)
ax.set_yticklabels(map('{:.2f}'.format, z2))

ax.set_title('Inverse CDF')
(25, 2)
(5, 5, 40, 40)

Examples fail to capture critial correlations

We see that the examples look quite different from real Ising configurations -- they lack the large scale patchiness seen in the critical region. They mostly turn out to be unstructured speckles that reflect only the average probability that a pixel is on in a region. This is not surprising since our VAE has no spatial structure and gets very little information about correlations in the binary cross-entropy which we use to measure reconstruction errors.


  • To gain more intuition for the VAE parameters, change $\epsilon$. What happens when epsilon is $0.01$, $1$, $10$?
  • Change the number of latent dimensions to 1, or 10. Visualize the 10 dimensional latent space using PCA? Are the reconstructed samples and loss functions very different?
  • What do you think would happen for other spin systems where the order parameter is a vector such as nematically ordered materials?
  • Replace the decoder with a (de)convolutional net that includes spatial structure? Doe this improve the performance of the VAE for generating examples?