{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Notebook 11: Introduction to Deep Neural Networks with Keras" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning Goals\n", "The goal of this notebook is to introduce deep neural networks (DNNs) using the high-level Keras package. The reader will become familiar with how to choose an architecture, cost function, and optimizer in Keras, and will learn how to train neural networks.\n", "\n", "\n", "# MNIST with Keras\n", "\n", "We will once again work with the MNIST dataset of handwritten digits introduced in *Notebook 7: Logistic Regression (MNIST)*. The goal is to find a statistical model that recognizes and distinguishes between the ten handwritten digits (0-9).\n", "\n", "The MNIST dataset comprises $70000$ handwritten digits, each of which comes as a square image divided into a $28\\times 28$ pixel grid. Every pixel can take one of $256$ shades of gray, interpolating between white and black, and hence each pixel value lies in the set $\\{0,1,\\dots,255\\}$. Since there are $10$ categories in the problem, corresponding to the ten digits, this is a generic multi-class classification task.\n", "\n", "In this notebook, we show how to use the Keras Python package to tackle the MNIST problem with deep neural networks.\n", "\n", "The following code is a slight modification of a Keras tutorial, see [https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py](https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py). We invite the reader to read Sec. IX of the review for a broad understanding of what the separate parts of the code do."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], "source": [ "from __future__ import print_function\n", "import os\n", "# set these environment variables before keras/tensorflow are imported:\n", "# allow duplicate OpenMP runtimes (avoids the 'OMP: Error #15' abort on some systems)\n", "os.environ['KMP_DUPLICATE_LIB_OK']='True'\n", "# hide TensorFlow C++ info and warning messages\n", "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'\n", "import keras,sklearn\n", "import tensorflow as tf\n", "import numpy as np\n", "seed=0\n", "np.random.seed(seed) # fix random seed\n", "tf.set_random_seed(seed)\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Structure of the Procedure\n", "\n", "Constructing a Deep Neural Network to solve ML problems is a multi-stage process. Quite generally, one can identify the key steps as follows:\n", "\n", "* ***step 1:*** Load and process the data\n", "* ***step 2:*** Define the model and its architecture\n", "* ***step 3:*** Choose the optimizer and the cost function\n", "* ***step 4:*** Train the model\n", "* ***step 5:*** Evaluate the model performance on the *unseen* test data\n", "* ***step 6:*** Modify the hyperparameters to optimize performance for the specific data set\n", "\n", "We would like to emphasize that, while steps 1-5 can always be viewed as independent of the particular task we are trying to solve, it is only when they are put together in ***step 6*** that the real gain of using Deep Learning over less sophisticated methods, such as the regression models or bagging described in Secs. VII and VIII of the review, is revealed. With this remark in mind, we focus predominantly on steps 1-5 below, and show how grid search can be used to find optimal hyperparameters in ***step 6***." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Load and Process the Data\n", "\n", "Keras can conveniently download the MNIST data from the web. 
All we need to do is import the `mnist` module and call its `load_data()` function, which creates the training and test data sets for us.\n", "\n", "The MNIST set comes with predefined training and test sets, which facilitates comparing the performance of different models on the data.\n", "\n", "Once we have loaded the data, we need to bring it into the correct shape. This differs from one package to another and, as we will see for Keras, it can even depend on the backend used.\n", "\n", "While choosing a suitable data type (here single-precision `float32`) can help improve the computational speed, we emphasize the rescaling step: dividing by $255$ maps every pixel value into the interval $[0,1]$, which avoids large variations in the minimal and maximal possible values of the features. In other words, we want to make sure a feature is not over-represented just because it is \"large\".\n", "\n", "Last, we convert the label vectors $y$ to binary class matrices (a.k.a. one-hot format), as explained in Sec. VII on SoftMax regression." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz\n", "11493376/11490434 [==============================] - 2s 0us/step\n", "an example of a data point with label 4\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAQQAAAECCAYAAAAYUakXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAADlpJREFUeJzt3V+MVfV6xvHnqdgLUQFhJBOLRZELmppi3WqjJ9XmpCfWaJALTTE2mGjQeIx/okmFC4WYRq3CaS+MEZQcmiANRq1ekCoxRs+Jimf7J4illaOhikyGIZoAV0R9ezGbt3N05rdnZv9Ze4bvJyGzZ717Zj0umSdr7f2bhSNCACBJf1R1AAC9g0IAkCgEAIlCAJAoBACJQgCQKikE21fb/h/bv7f9YBUZSmzvt/2J7Y9t13sgz2bbh2zvGbHtLNs7be9rfJzTY/nW2v66cQw/tn1NhfkW2H7T9l7bn9q+p7G9J45hIV/Xj6G7vQ7B9imSPpP0t5IOSPqdpBUR8V9dDVJge7+kWkQcrjqLJNn+a0nHJP1bRPx5Y9s/S/omIh5rlOqciPjHHsq3VtKxiHiyikwj2e6X1B8RH9o+Q9IHkq6XdIt64BgW8t2oLh/DKs4QLpX0+4j4IiKOS/p3ScsqyDFlRMTbkr750eZlkrY0Hm/R8F+gSoyRr2dExEBEfNh4fFTSXknnqEeOYSFf11VRCOdI+mrE5wdU0X98QUh63fYHtldVHWYM8yNiQBr+CyXp7IrzjOYu27sblxSVXdKMZHuhpIsk7VIPHsMf5ZO6fAyrKASPsq3X1k9fERF/KenvJP2ycUqMiXla0iJJSyUNSFpfbRzJ9umSXpR0b0QcqTrPj42Sr+vHsIpCOCBpwYjP/0TSwQpyjCkiDjY+HpL0soYvc3rNYOPa88Q16KGK8/yBiBiMiO8j4gdJm1TxMbR9qoZ/2LZGxEuNzT1zDEfLV8UxrKIQfidpse3zbP+xpL+X9GoFOUZle2bjhR3ZninpF5L2lL+qEq9KWtl4vFLSKxVm+YkTP2gNy1XhMbRtSc9J2hsRG0aMeuIYjpWvimPY9XcZJKnx9sm/SDpF0uaI+KeuhxiD7fM1fFYgSTMkPV91PtvbJF0laZ6kQUkPS/oPSdslnSvpS0k3REQlL+yNke8qDZ/qhqT9km4/cb1eQb6fSfqNpE8k/dDYvEbD1+mVH8NCvhXq8jGspBAA9CZWKgJIFAKARCEASBQCgEQhAEiVFkIPLwuWRL5W9XK+Xs4mVZev6jOEnv6fIvK1qpfz9XI2qaJ8VRcCgB7S0sIk21dL+lcNrzh8NiIeKz1/3rx5sXDhwvx8aGhIfX19k95/p5GvNb2cr5ezSe3Pt3//fh0+fHi0Xyz8AzMmu4PGjU6e0ogbndh+tXSjk4ULF6per/wGRMBJp1arjet5rVwycKMTYJpppRCmwo1OAExAK4Uwrhud2F5lu267PjQ01MLuAHRaK4UwrhudRMTGiKhFRK2XX8QB0Foh9PSNTgBM3KTfZYiI72zfJek1/f+NTj5tWzIAXTfpQpCkiNghaUebsgCoGCsVASQKAUCiEAAkCgFAohAAJAoBQKIQACQKAUCiEAAkCgFAohAAJAoBQKIQACQKAUCiEAAkCgFAohAAJAoBQKIQACQKAUCiEAAkCgFAauk27EA7ffbZZ8X5HXfcUZxv3bq1OO/v759wppMNZwgAEoUAIFEIABKFACBRCAAShQAgUQgA0rRah3D06NHi/NixY8X5rFmzivPTTjttwpkwfjt27CjO33rrreL82WefLc5Xr15dnM+YMa1+HCalpSNge7+ko5K+l/RdRNTaEQpANdpRiX8TEYfb8H0AVIzXEACkVgshJL1u+wPbq9oRCEB1Wr1kuCIiDto+W9JO2/8dEW+PfEKjKFZJ0rnnntvi7gB0UktnCBFxsPHxkKSXJV06ynM2RkQtImp9fX2t7A5Ah026EGzPtH3
GiceSfiFpT7uCAei+Vi4Z5kt62faJ7/N8RPxnW1JN0uOPP16cP/roo8X5k08+WZzfd999E86E8bv44otb+vq1a9cW5ytWrCjOL7jggpb2Px1MuhAi4gtJf9HGLAAqxtuOABKFACBRCAAShQAgUQgAEoUAIPEL4COsW7euOD///POL82XLlrUzzklncHCw6ggnPc4QACQKAUCiEAAkCgFAohAAJAoBQKIQACTWIYzQ7N91uOWWW4rznTt3Fue12sl9l/pm/y7G+vXrO7r/7du3F+dr1qzp6P6nAs4QACQKAUCiEAAkCgFAohAAJAoBQKIQAKRptQ7hvPPO6+j3P3LkSHH+0EMPFedbt24tzufMmTPhTFPJvn37ivP333+/S0kwFs4QACQKAUCiEAAkCgFAohAAJAoBQKIQAKRptQ6h2f0KDh48WJyvXbu2pf2/9tprxfmLL75YnN92220t7b/XzZ8/vzhftGhRcf7555+3tP8bb7yxpa8/GTQ9Q7C92fYh23tGbDvL9k7b+xofp/eKGuAkMZ5Lhl9LuvpH2x6U9EZELJb0RuNzAFNc00KIiLclffOjzcskbWk83iLp+jbnAlCByb6oOD8iBiSp8fHs9kUCUJWOv8tge5Xtuu360NBQp3cHoAWTLYRB2/2S1Ph4aKwnRsTGiKhFRK2vr2+SuwPQDZMthFclrWw8XinplfbEAVClpusQbG+TdJWkebYPSHpY0mOSttu+VdKXkm7oZMjxOuWUU4rzu+++uzhvdr+CZr/P38xTTz1VnC9fvrw4nzt3bkv7r9rg4GBx3uo6A7SuaSFExIoxRj9vcxYAFWPpMoBEIQBIFAKARCEASBQCgEQhAEjT6n4IzcyaNas4v/zyy4vzVtch7N69uzj/6quvivNOr0M4fvx4cf7MM8+09P1feOGFlr4enccZAoBEIQBIFAKARCEASBQCgEQhAEgUAoB0Uq1DaKbZOoQtW7YU56169913i/OlS5cW5++8805L82PHjhXnjzzySHFetSVLlhTnc+bwrwU0wxkCgEQhAEgUAoBEIQBIFAKARCEASBQCgOSI6NrOarVa1Ov1ru2v3W6++ebi/Pnnn+9Sks5o9nfBdpeSdMamTZuK81tvvbVLSbqvVqupXq83/R/IGQKARCEASBQCgEQhAEgUAoBEIQBIFAKAxP0QJuD+++8vzrdt29alJNWY6usQ3nvvveJ8Oq9DGK+mZwi2N9s+ZHvPiG1rbX9t++PGn2s6GxNAN4znkuHXkq4eZfuvImJp48+O9sYCUIWmhRARb0v6pgtZAFSslRcV77K9u3FJwc3qgGlgsoXwtKRFkpZKGpC0fqwn2l5lu267PjQ0NMndAeiGSRVCRAxGxPcR8YOkTZIuLTx3Y0TUIqLW19c32ZwAumBShWC7f8SnyyXtGeu5AKaOpusQbG+TdJWkebYPSHpY0lW2l0oKSfsl3d7BjOiSxYsXF+fN1iFcc0353efZs2cX5+vWrSvO0XlNCyEiVoyy+bkOZAFQMZYuA0gUAoBEIQBIFAKARCEASBQCgMT9EKaQuXPnFucLFiwozh944IHifMWK0d5hbp+PPvqoOGcdQvU4QwCQKAQAiUIAkCgEAIlCAJAoBACJQgCQWIcwAYsWLSrOV65cWZx/8cUXxfmSJUuK8zvvvLM4v/DCC4vzk93rr79enH/77bfF+Zw50//WoZwhAEgUAoBEIQBIFAKARCEASBQCgEQhAEisQ5iAM888szjfvHlzl5JgMg4cOFCcHz9+vEtJehdnCAAShQAgUQgAEoUAIFEIABKFACBRCAAS6xDQNbNnzy7O+/v7i/OBgYF2xvmJ1atXF+cbN24szmfMmPo/Tk3PEGwvsP2m7b22P7V9T2P7WbZ32t7X+Dj97x4BTHPjuWT4TtL9EbFE0l9J+qXtP5P0oKQ3ImKxpDcanwOYwpoWQkQMRMSHjcdHJe2VdI6kZZK2NJ62RdL1nQoJoDsm9KKi7YWSLpK
0S9L8iBiQhktD0tntDgegu8ZdCLZPl/SipHsj4sgEvm6V7brt+tDQ0GQyAuiScRWC7VM1XAZbI+KlxuZB2/2Neb+kQ6N9bURsjIhaRNT6+vrakRlAh4znXQZLek7S3ojYMGL0qqQT9x1fKemV9scD0E2OiPIT7J9J+o2kTyT90Ni8RsOvI2yXdK6kLyXdEBHflL5XrVaLer3eamZMU7t27SrOly9fXpwPDg62M85PHDlSvlKeOXNmR/ffilqtpnq97mbPa7qSIiJ+K2msb/TziQYD0LtYugwgUQgAEoUAIFEIABKFACBRCADS1P8Fbkwbl112WXH+yivltW/XXXddcd7q0vlma2iuvPLKlr5/L+AMAUCiEAAkCgFAohAAJAoBQKIQACQKAUBiHQKmjEsuuaQ437BhQ3H+xBNPFOfXXnttcV6r1Yrz6YAzBACJQgCQKAQAiUIAkCgEAIlCAJAoBACJdQiYNm666aaW5uAMAcAIFAKARCEASBQCgEQhAEgUAoBEIQBITQvB9gLbb9rea/tT2/c0tq+1/bXtjxt/rul8XACdNJ6FSd9Juj8iPrR9hqQPbO9szH4VEU92Lh6AbmpaCBExIGmg8fio7b2Szul0MADdN6HXEGwvlHSRpF2NTXfZ3m17s+05bc4GoMvGXQi2T5f0oqR7I+KIpKclLZK0VMNnEOvH+LpVtuu2663+23oAOmtchWD7VA2XwdaIeEmSImIwIr6PiB8kbZJ06WhfGxEbI6IWEbW+vr525QbQAeN5l8GSnpO0NyI2jNjeP+JpyyXtaX88AN00nncZrpD0D5I+sf1xY9saSStsL5UUkvZLur0jCQF0zXjeZfitJI8y2tH+OACqxEpFAIlCAJAoBACJQgCQKAQAiUIAkCgEAIlCAJAoBACJQgCQKAQAiUIAkCgEAIlCAJAoBADJEdG9ndlDkv53xKZ5kg53LcDEka81vZyvl7NJ7c/3pxHR9B6GXS2En+zcrkdErbIATZCvNb2cr5ezSdXl45IBQKIQAKSqC2Fjxftvhnyt6eV8vZxNqihfpa8hAOgtVZ8hAOghFAKARCEASBQCgEQhAEj/B2LoAC5+9sIHAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "X_train shape: (60000, 784)\n", "Y_train shape: (60000, 10)\n", "\n", "60000 train samples\n", "10000 test samples\n" ] } ], "source": [ "from keras.datasets import mnist\n", "\n", "# input image dimensions\n", "num_classes = 10 # 10 digits\n", "\n", "img_rows, img_cols = 28, 28 # number of pixels \n", "\n", "# the data, shuffled and split between train and test sets\n", "(X_train, Y_train), (X_test, Y_test) = mnist.load_data()\n", "\n", "# reshape data, depending on Keras backend\n", "X_train = X_train.reshape(X_train.shape[0], img_rows*img_cols)\n", "X_test = X_test.reshape(X_test.shape[0], img_rows*img_cols)\n", " \n", "# cast floats to single precesion\n", "X_train = X_train.astype('float32')\n", "X_test = X_test.astype('float32')\n", "\n", "# rescale data in interval [0,1]\n", "X_train /= 255\n", "X_test /= 255\n", "\n", "# look at an example of data point\n", "print('an example of a data point with label', Y_train[20])\n", "plt.matshow(X_train[20,:].reshape(28,28),cmap='binary')\n", "plt.show()\n", "\n", "# convert class vectors to binary class matrices\n", "Y_train = keras.utils.to_categorical(Y_train, num_classes)\n", "Y_test = keras.utils.to_categorical(Y_test, num_classes)\n", "\n", "print('X_train shape:', X_train.shape)\n", "print('Y_train shape:', Y_train.shape)\n", "print()\n", "print(X_train.shape[0], 'train samples')\n", "print(X_test.shape[0], 'test samples')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Define the Neural Net and its Architecture\n", "\n", "We can now move on to construct our deep neural net. We shall use Keras's `Sequential()` class to instantiate a model, and will add different deep layers one by one.\n", "\n", "At this stage, we refrain from using convolutional layers. 
This is done further below.\n", "\n", "Let us create an instance of Keras' `Sequential()` class, called `model`. As the name suggests, this class allows us to build DNNs layer by layer: we use the `add()` method to attach layers to our model, one after the other. For our introductory example, it suffices to use fully connected `Dense` layers. Every `Dense()` layer takes as its first required argument an integer specifying the number of neurons. The activation function of the layer is set with the optional argument `activation`, whose value is the name of the activation function as a string, e.g. `'relu'`, `'tanh'`, `'elu'`, `'sigmoid'`, or `'softmax'`. We also add a `Dropout` layer, which during training randomly sets a fraction (here 50%) of its inputs to zero, to reduce overfitting.\n", "\n", "For our DNN to work properly, the number of output neurons of each layer must match the number of input neurons of the next one. We therefore specify the shape of the input explicitly in the first layer of the model, using the optional argument `input_shape=(N_features,)`. The sequential construction of the model then allows Keras to infer the input dimensions of all subsequent layers automatically. Hence, we only need to specify the number of neurons in each layer, and make sure that the size of the softmax output layer matches the number of categories."
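, "\n",
 "To see why only the first layer needs an explicit `input_shape`, the minimal sketch below (plain Python, not Keras code) propagates the layer widths of the network built by `create_DNN()`, i.e. $784\\to 400\\to 100\\to 10$, and counts the trainable parameters of each `Dense` layer: a layer mapping $n_\\mathrm{in}$ inputs to $n_\\mathrm{out}$ neurons has an $n_\\mathrm{in}\\times n_\\mathrm{out}$ weight matrix plus $n_\\mathrm{out}$ biases. The helper `dense_param_counts()` is hypothetical and serves only as an illustration; `Dropout` has no weights and does not change the width, so it is left out. Running it prints $314000$, $40100$ and $1010$ parameters for the three layers, $355110$ in total; under these assumptions, this total should agree with the parameter count reported by `model.summary()` for the Keras model."
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "# Illustrative sketch (hypothetical helper, not part of Keras):\n",
 "# shape bookkeeping for a stack of fully connected (Dense) layers.\n",
 "# The widths mirror create_DNN(): flattened 28x28 input, two hidden ReLU layers, 10-way softmax.\n",
 "layer_widths = [28 * 28, 400, 100, 10]\n",
 "\n",
 "def dense_param_counts(widths):\n",
 "    # the output width of each layer is the input width of the next one;\n",
 "    # a Dense layer has n_in * n_out weights plus n_out biases\n",
 "    return [(n_in + 1) * n_out for n_in, n_out in zip(widths[:-1], widths[1:])]\n",
 "\n",
 "counts = dense_param_counts(layer_widths)\n",
 "print('parameters per Dense layer:', counts)\n",
 "print('total number of parameters:', sum(counts))"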
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model architecture created successfully!\n" ] } ], "source": [ "from keras.models import Sequential\n", "from keras.layers import Dense, Dropout, Flatten\n", "from keras.layers import Conv2D, MaxPooling2D\n", "\n", "\n", "def create_DNN():\n", " # instantiate model\n", " model = Sequential()\n", " # add a dense all-to-all relu layer\n", " model.add(Dense(400,input_shape=(img_rows*img_cols,), activation='relu'))\n", " # add a dense all-to-all relu layer\n", " model.add(Dense(100, activation='relu'))\n", " # apply dropout with rate 0.5\n", " model.add(Dropout(0.5))\n", " # soft-max layer\n", " model.add(Dense(num_classes, activation='softmax'))\n", " \n", " return model\n", "\n", "print('Model architecture created successfully!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Choose the Optimizer and the Cost Function\n", "\n", "Next, we choose the loss function used to train the DNN. For classification problems, the natural choice is the cross entropy and, since the labels were cast in categorical (one-hot) form, we choose `categorical_crossentropy` from Keras' `losses` module. Depending on the problem of interest, one can pick any other suitable loss function. To optimize the weights of the net, we use the `Adam()` optimizer by default; plain stochastic gradient descent, `SGD()`, or any other built-in optimizer from Keras' `optimizers` module can be used instead. Optimizer parameters, such as the learning rate `lr` or, for `SGD()`, the `momentum`, are passed as optional arguments of the corresponding constructor. All available arguments can be found in Keras' online documentation at [https://keras.io/](https://keras.io/). While the loss function and the optimizer are essential for the training procedure, to test the performance of the model one may want to look at a particular `metric` of performance. 
For instance, in classification tasks one typically looks at the `accuracy`, defined as the fraction of correctly classified data points. To complete the definition of our model, we use the `compile()` method, with arguments specifying the `optimizer`, the `loss`, and the list of `metrics` to monitor, as follows:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model compiled successfully and ready to be trained.\n" ] } ], "source": [ "# pass the optimizer by name, so that every call creates a fresh optimizer instance\n", "# (a default argument like keras.optimizers.Adam() would be created once and shared between models)\n", "def compile_model(optimizer='Adam'):\n", " # create the model\n", " model=create_DNN()\n", " # compile the model\n", " model.compile(loss=keras.losses.categorical_crossentropy,\n", " optimizer=optimizer,\n", " metrics=['accuracy'])\n", " return model\n", "\n", "print('Model compiled successfully and ready to be trained.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Train the model\n", "\n", "We train our DNN in minibatches, the advantages of which were explained in Sec. IV.\n", "\n", "We train for a number of epochs, i.e. complete passes through the training data. By default, `fit()` reshuffles the training data before every epoch (`shuffle=True`), which improves the stability of training.\n", "\n", "Training the DNN is a one-liner using the `fit()` method of the `Sequential` class. The first two required arguments are the training input and output data. As optional arguments, we specify the minibatch size `batch_size`, the number of training `epochs`, and the `validation_data`, for which we use the test set. To monitor the training procedure for every epoch, we set `verbose=1`. 
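\n", "\n",
 "The bookkeeping behind minibatch training with reshuffling can be sketched in a few lines of NumPy. This is our own illustration, not Keras' actual implementation: the helper `minibatch_indices()` is hypothetical, and we only assume, as for `fit(..., shuffle=True)`, that the training set is reshuffled at the start of every epoch and then cut into consecutive minibatches. With $60000$ training samples and `batch_size=64`, one epoch then consists of $\\lceil 60000/64\\rceil = 938$ minibatches, the last of which contains only $60000-937\\cdot 64=32$ samples."
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "import numpy as np\n",
 "\n",
 "def minibatch_indices(n_samples, batch_size, rng):\n",
 "    # hypothetical helper: draw a fresh random permutation of the sample indices,\n",
 "    perm = rng.permutation(n_samples)\n",
 "    # then cut it into consecutive minibatches (the last one may be smaller)\n",
 "    for start in range(0, n_samples, batch_size):\n",
 "        yield perm[start:start + batch_size]\n",
 "\n",
 "rng = np.random.RandomState(0)\n",
 "for epoch in range(2):\n",
 "    # each epoch visits every training sample exactly once, in a new random order\n",
 "    batches = list(minibatch_indices(60000, 64, rng))\n",
 "    print('epoch', epoch, ':', len(batches), 'minibatches; last one has', len(batches[-1]), 'samples; first indices:', batches[0][:5])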
" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 60000 samples, validate on 10000 samples\n", "Epoch 1/10\n", "60000/60000 [==============================] - 12s 207us/step - loss: 0.3138 - acc: 0.9072 - val_loss: 0.1330 - val_acc: 0.9578\n", "Epoch 2/10\n", "60000/60000 [==============================] - 12s 192us/step - loss: 0.1285 - acc: 0.9627 - val_loss: 0.0942 - val_acc: 0.9711\n", "Epoch 3/10\n", "60000/60000 [==============================] - 12s 193us/step - loss: 0.0889 - acc: 0.9738 - val_loss: 0.0772 - val_acc: 0.9765\n", "Epoch 4/10\n", "60000/60000 [==============================] - 12s 194us/step - loss: 0.0696 - acc: 0.9793 - val_loss: 0.0737 - val_acc: 0.9787\n", "Epoch 5/10\n", "60000/60000 [==============================] - 12s 195us/step - loss: 0.0571 - acc: 0.9828 - val_loss: 0.0745 - val_acc: 0.9776\n", "Epoch 6/10\n", "60000/60000 [==============================] - 12s 198us/step - loss: 0.0452 - acc: 0.9858 - val_loss: 0.0799 - val_acc: 0.9782\n", "Epoch 7/10\n", "60000/60000 [==============================] - 12s 195us/step - loss: 0.0382 - acc: 0.9878 - val_loss: 0.0839 - val_acc: 0.9779\n", "Epoch 8/10\n", "60000/60000 [==============================] - 12s 197us/step - loss: 0.0337 - acc: 0.9892 - val_loss: 0.0781 - val_acc: 0.9792\n", "Epoch 9/10\n", "60000/60000 [==============================] - 12s 201us/step - loss: 0.0295 - acc: 0.9909 - val_loss: 0.0755 - val_acc: 0.9818\n", "Epoch 10/10\n", "60000/60000 [==============================] - 12s 199us/step - loss: 0.0257 - acc: 0.9921 - val_loss: 0.0863 - val_acc: 0.9792\n" ] } ], "source": [ "# training parameters\n", "batch_size = 64\n", "epochs = 10\n", "\n", "# create the deep neural net\n", "model_DNN=compile_model()\n", "\n", "# train DNN and store training info in history\n", "history=model_DNN.fit(X_train, Y_train,\n", " batch_size=batch_size,\n", " epochs=epochs,\n", " 
verbose=1,\n", " validation_data=(X_test, Y_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5: Evaluate the Model Performance on the *Unseen* Test Data\n", "\n", "Next, we evaluate the model using the `evaluate()` method and read off its loss and accuracy on the test data. Note that the test data were passed as `validation_data` during training only to monitor performance; they were never used to update the weights." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10000/10000 [==============================] - 0s 36us/step\n", "\n", "Test loss: 0.07659573335081113\n", "Test accuracy: 0.9816\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# evaluate model\n", "score = model_DNN.evaluate(X_test, Y_test, verbose=1)\n", "\n", "# print performance\n", "print()\n", "print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])\n", "\n", "# look into training history\n", "\n", "# summarize history for accuracy\n", "plt.plot(history.history['acc'])\n", "plt.plot(history.history['val_acc'])\n", "plt.ylabel('model accuracy')\n", "plt.xlabel('epoch')\n", "plt.legend(['train', 'test'], loc='best')\n", "plt.show()\n", "\n", "# summarize history for loss\n", "plt.plot(history.history['loss'])\n", "plt.plot(history.history['val_loss'])\n", "plt.ylabel('model loss')\n", "plt.xlabel('epoch')\n", "plt.legend(['train', 'test'], loc='best')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Step 6: Modify the Hyperparameters to Optimize Performance of the Model\n", "\n", "Last, we show how to use the grid search option of scikit-learn to optimize the \n", "hyperparameters of our model. An excellent blog on this by Jason Brownlee can be found on [https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/](https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/)." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/1\n", "45000/45000 [==============================] - 5s 118us/step - loss: 1.1397 - acc: 0.6571\n", "15000/15000 [==============================] - 1s 38us/step\n", "45000/45000 [==============================] - 1s 23us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 5s 117us/step - loss: 1.1162 - acc: 0.6668\n", "15000/15000 [==============================] - 1s 39us/step\n", "45000/45000 [==============================] - 1s 26us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 5s 117us/step - loss: 1.1128 - acc: 0.6707\n", "15000/15000 [==============================] - 1s 38us/step\n", "45000/45000 [==============================] - 1s 26us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 5s 120us/step - loss: 1.1454 - acc: 0.6564\n", "15000/15000 [==============================] - 1s 42us/step\n", "45000/45000 [==============================] - 1s 24us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 177us/step - loss: 0.3399 - acc: 0.8988\n", "15000/15000 [==============================] - 1s 42us/step\n", "45000/45000 [==============================] - 1s 25us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 177us/step - loss: 0.3430 - acc: 0.8996\n", "15000/15000 [==============================] - 1s 44us/step\n", "45000/45000 [==============================] - 1s 26us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 183us/step - loss: 0.3414 - acc: 0.8996\n", "15000/15000 [==============================] - 1s 45us/step\n", "45000/45000 [==============================] - 1s 26us/step \n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 180us/step - loss: 0.3402 - acc: 0.8990\n", "15000/15000 [==============================] - 1s 45us/step\n", 
"45000/45000 [==============================] - 1s 28us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 169us/step - loss: 0.3175 - acc: 0.9076\n", "15000/15000 [==============================] - 1s 47us/step\n", "45000/45000 [==============================] - 1s 26us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 169us/step - loss: 0.3346 - acc: 0.9031\n", "15000/15000 [==============================] - 1s 51us/step\n", "45000/45000 [==============================] - 1s 27us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 170us/step - loss: 0.3227 - acc: 0.9070\n", "15000/15000 [==============================] - 1s 47us/step\n", "45000/45000 [==============================] - 1s 27us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 171us/step - loss: 0.3497 - acc: 0.8977\n", "15000/15000 [==============================] - 1s 52us/step\n", "45000/45000 [==============================] - 1s 26us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 10s 231us/step - loss: 0.3711 - acc: 0.8901\n", "15000/15000 [==============================] - 1s 52us/step\n", "45000/45000 [==============================] - 1s 26us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 10s 225us/step - loss: 0.3731 - acc: 0.8888\n", "15000/15000 [==============================] - 1s 45us/step\n", "45000/45000 [==============================] - 1s 22us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 10s 223us/step - loss: 0.3653 - acc: 0.8930\n", "15000/15000 [==============================] - 1s 48us/step\n", "45000/45000 [==============================] - 1s 23us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 9s 203us/step - loss: 0.3721 - acc: 0.8891\n", "15000/15000 [==============================] - 1s 44us/step\n", "45000/45000 [==============================] - 1s 20us/step\n", "Epoch 
1/1\n", "45000/45000 [==============================] - 8s 167us/step - loss: 0.3551 - acc: 0.8951\n", "15000/15000 [==============================] - 1s 48us/step\n", "45000/45000 [==============================] - 1s 20us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 7s 166us/step - loss: 0.3561 - acc: 0.8954\n", "15000/15000 [==============================] - 1s 48us/step\n", "45000/45000 [==============================] - 1s 19us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 7s 166us/step - loss: 0.3496 - acc: 0.8962\n", "15000/15000 [==============================] - 1s 50us/step\n", "45000/45000 [==============================] - 1s 20us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 8s 171us/step - loss: 0.3621 - acc: 0.8937\n", "15000/15000 [==============================] - 1s 52us/step\n", "45000/45000 [==============================] - 1s 20us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 6s 140us/step - loss: 0.4106 - acc: 0.8811\n", "15000/15000 [==============================] - 1s 55us/step\n", "45000/45000 [==============================] - 1s 22us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 6s 143us/step - loss: 0.4125 - acc: 0.8780\n", "15000/15000 [==============================] - 1s 56us/step\n", "45000/45000 [==============================] - 1s 20us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 6s 142us/step - loss: 0.3998 - acc: 0.8839\n", "15000/15000 [==============================] - 1s 56us/step\n", "45000/45000 [==============================] - 1s 21us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 6s 142us/step - loss: 0.4139 - acc: 0.8779\n", "15000/15000 [==============================] - 1s 62us/step\n", "45000/45000 [==============================] - 1s 22us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 9s 206us/step - loss: 
0.2943 - acc: 0.9140\n", "15000/15000 [==============================] - 1s 55us/step\n", "45000/45000 [==============================] - 1s 20us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 9s 206us/step - loss: 0.3047 - acc: 0.9111\n", "15000/15000 [==============================] - 1s 61us/step\n", "45000/45000 [==============================] - 1s 20us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 9s 208us/step - loss: 0.2920 - acc: 0.9137\n", "15000/15000 [==============================] - 1s 60us/step\n", "45000/45000 [==============================] - 1s 21us/step\n", "Epoch 1/1\n", "45000/45000 [==============================] - 9s 209us/step - loss: 0.3015 - acc: 0.9112\n", "15000/15000 [==============================] - 1s 62us/step\n", "45000/45000 [==============================] - 1s 22us/step\n", "Epoch 1/1\n", "60000/60000 [==============================] - 12s 199us/step - loss: 0.2694 - acc: 0.9210\n", "Best: 0.957033 using {'optimizer': 'Nadam'}\n", "0.874133 (0.005897) with: {'optimizer': 'SGD'}\n", "0.953867 (0.003900) with: {'optimizer': 'RMSprop'}\n", "0.951767 (0.001012) with: {'optimizer': 'Adagrad'}\n", "0.950500 (0.003029) with: {'optimizer': 'Adadelta'}\n", "0.955000 (0.002867) with: {'optimizer': 'Adam'}\n", "0.945917 (0.002984) with: {'optimizer': 'Adamax'}\n", "0.957033 (0.001874) with: {'optimizer': 'Nadam'}\n" ] } ], "source": [ "from sklearn.model_selection import GridSearchCV\n", "from keras.wrappers.scikit_learn import KerasClassifier\n", "\n", "# call Keras scikit wrapper\n", "model_gridsearch = KerasClassifier(build_fn=compile_model, \n", " epochs=1, \n", " batch_size=batch_size, \n", " verbose=1)\n", "\n", "# list of allowed optional arguments for the optimizer, see `compile_model()`\n", "optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']\n", "# define parameter dictionary\n", "param_grid = dict(optimizer=optimizer)\n", "# call scikit grid search 
module\n", "grid = GridSearchCV(estimator=model_gridsearch, param_grid=param_grid, n_jobs=1, cv=4)\n", "grid_result = grid.fit(X_train,Y_train)\n", "\n", "# summarize results\n", "print(\"Best: %f using %s\" % (grid_result.best_score_, grid_result.best_params_))\n", "means = grid_result.cv_results_['mean_test_score']\n", "stds = grid_result.cv_results_['std_test_score']\n", "params = grid_result.cv_results_['params']\n", "for mean, stdev, param in zip(means, stds, params):\n", " print(\"%f (%f) with: %r\" % (mean, stdev, param))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Convolutional Neural Nets with Keras\n", "\n", "So far, we have treated each MNIST data sample as a 1d vector of length $28\\times 28=784$. This approach neglects any spatial structure in the image. However, we know that in every handwritten digit there are *local* spatial correlations between the pixels, which we would like to take advantage of to improve the accuracy of our classification model. 
To this end, we first need to reshape the training and test input data as follows:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X_train shape: (60000, 28, 28, 1)\n", "Y_train shape: (60000, 10)\n", "\n", "60000 train samples\n", "10000 test samples\n" ] } ], "source": [ "# reshape data into 28x28 images with a single (grayscale) channel;\n", "# the position of the channel axis depends on the Keras backend configuration\n", "if keras.backend.image_data_format() == 'channels_first':\n", " X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)\n", " X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)\n", " input_shape = (1, img_rows, img_cols)\n", "else:\n", " X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)\n", " X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)\n", " input_shape = (img_rows, img_cols, 1)\n", " \n", "print('X_train shape:', X_train.shape)\n", "print('Y_train shape:', Y_train.shape)\n", "print()\n", "print(X_train.shape[0], 'train samples')\n", "print(X_test.shape[0], 'test samples')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One may ask whether a neural net can learn to recognize such local patterns. As we saw in Sec. X of the review, this can be achieved by using convolutional layers. Luckily, all we need to do is change the architecture of our DNN, i.e. introduce small changes to the function `create_DNN()`. 
We can also merge **Step 2** and **Step 3** for convenience:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def create_CNN():\n", " # instantiate model\n", " model = Sequential()\n", " # add first convolutional layer with 10 filters (dimensionality of output space): 28x28 -> 24x24\n", " model.add(Conv2D(10, kernel_size=(5, 5),\n", " activation='relu',\n", " input_shape=input_shape))\n", " # add 2D pooling layer: 24x24 -> 12x12\n", " model.add(MaxPooling2D(pool_size=(2, 2)))\n", " # add second convolutional layer with 20 filters: 12x12 -> 8x8\n", " model.add(Conv2D(20, (5, 5), activation='relu'))\n", " # apply dropout with rate 0.5\n", " model.add(Dropout(0.5))\n", " # add 2D pooling layer: 8x8 -> 4x4\n", " model.add(MaxPooling2D(pool_size=(2, 2)))\n", " # flatten the 20 feature maps of size 4x4 into a vector of length 20*4*4 = 320\n", " model.add(Flatten())\n", " # add a dense all-to-all relu layer\n", " model.add(Dense(20*4*4, activation='relu'))\n", " # apply dropout with rate 0.5\n", " model.add(Dropout(0.5))\n", " # soft-max layer\n", " model.add(Dense(num_classes, activation='softmax'))\n", " \n", " # compile the model\n", " model.compile(loss=keras.losses.categorical_crossentropy,\n", " optimizer='Adam',\n", " metrics=['accuracy'])\n", " \n", " return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Training the deep conv net (**Step 4**) and evaluating its performance (**Step 5**) proceeds exactly as before:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 60000 samples, validate on 10000 samples\n", "Epoch 1/10\n", "60000/60000 [==============================] - 39s 646us/step - loss: 0.2610 - acc: 0.9183 - val_loss: 0.0750 - val_acc: 0.9825\n", "Epoch 2/10\n", "60000/60000 [==============================] - 32s 533us/step - loss: 0.0932 - acc: 0.9709 - val_loss: 0.0551 - val_acc: 0.9876\n", "Epoch 3/10\n", "60000/60000 [==============================] - 32s 533us/step - loss: 0.0689 - acc: 0.9789 - val_loss: 
0.0408 - val_acc: 0.9893\n", "Epoch 4/10\n", "60000/60000 [==============================] - 32s 533us/step - loss: 0.0622 - acc: 0.9808 - val_loss: 0.0378 - val_acc: 0.9907\n", "Epoch 5/10\n", "60000/60000 [==============================] - 32s 537us/step - loss: 0.0531 - acc: 0.9837 - val_loss: 0.0333 - val_acc: 0.9915\n", "Epoch 6/10\n", "60000/60000 [==============================] - 33s 554us/step - loss: 0.0495 - acc: 0.9854 - val_loss: 0.0339 - val_acc: 0.9909\n", "Epoch 7/10\n", "60000/60000 [==============================] - 39s 649us/step - loss: 0.0475 - acc: 0.9851 - val_loss: 0.0330 - val_acc: 0.9929\n", "Epoch 8/10\n", "60000/60000 [==============================] - 39s 648us/step - loss: 0.0423 - acc: 0.9870 - val_loss: 0.0289 - val_acc: 0.9931\n", "Epoch 9/10\n", "60000/60000 [==============================] - 38s 633us/step - loss: 0.0421 - acc: 0.9874 - val_loss: 0.0290 - val_acc: 0.9928\n", "Epoch 10/10\n", "60000/60000 [==============================] - 39s 646us/step - loss: 0.0374 - acc: 0.9884 - val_loss: 0.0253 - val_acc: 0.9922\n", "10000/10000 [==============================] - 3s 288us/step\n", "\n", "Test loss: 0.025273755778139458\n", "Test accuracy: 0.9922\n" ] } ], "source": [ "# training parameters\n", "batch_size = 64\n", "epochs = 10\n", "\n", "# create the deep conv net\n", "model_CNN=create_CNN()\n", "\n", "# train CNN\n", "model_CNN.fit(X_train, Y_train,\n", " batch_size=batch_size,\n", " epochs=epochs,\n", " verbose=1,\n", " validation_data=(X_test, Y_test))\n", "\n", "# evaluate model\n", "score = model_CNN.evaluate(X_test, Y_test, verbose=1)\n", "\n", "# print performance\n", "print()\n", "print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 
3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }