{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Notebook 7: Logistic Regression and SoftMax for MNIST" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning Goal\n", "\n", "The goal of this notebook is to familiarize the reader with SoftMax regression (a generalization of logistic regression to more than two categories), categorical predictions, and the MNIST dataset of handwritten digits. The reader will learn how to use the scikit-learn `LogisticRegression` class and how to visualize the learned weights.\n", "\n", "## Overview\n", "### The MNIST dataset:\n", "The MNIST classification problem is one of the classic ML problems for learning classification on high-dimensional data with a fairly large number of examples. Yann LeCun and collaborators collected and processed $70000$ handwritten digits ($60000$ are used for training and $10000$ for testing) to produce what became one of the most widely used datasets in ML: the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Each handwritten digit is stored as a grayscale square image on a $28\\times 28$ pixel grid. Every pixel takes an integer value in the range $[0,255]$, representing $256$ shades of gray. Image classification has applications in a wide range of fields and underlies numerous industrial uses of ML. \n", "### SoftMax regression:\n", "We will use SoftMax regression, a statistical model that assigns to a given input image a probability for each of the 10 digits. 
The model is a generalization of logistic regression and reads:\n", "\\begin{align}\n", "p(y=i|\\boldsymbol{x};W) = \\frac{e^{\\boldsymbol{w}_i^T \\boldsymbol{x}}}{\\sum_{j=0}^9 e^{\\boldsymbol{w}_j^T \\boldsymbol{x}}},\n", "\\end{align}\n", "where $p(y=i|\\boldsymbol{x};W)$ is the probability that input $\\boldsymbol{x}$ depicts the digit $i$, $i\\in\\{0,1,\\dots,9\\}$.\n", "The model has 10 weight vectors $\\boldsymbol{w}_i$, one per digit and collected in $W$, which we will train below. Finally, one can use this information for prediction by taking the value of $y$ for which this probability is maximized:\n", "\\begin{align}\n", "y_{pred}=\\arg\\max_i p(y=i|\\boldsymbol{x};W)\n", "\\end{align}\n", "\n", "## Numerical Experiments\n", "\n", "The reader is invited to work through the code below to build intuition about SoftMax regression. The following notebook is a slight modification of [this scikit-learn tutorial](http://scikit-learn.org/dev/auto_examples/linear_model/plot_sparse_logistic_regression_mnist.html) by Arthur Mensch on studying the MNIST problem using logistic regression." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Automatically created module for IPython interactive environment\n", "Example run in 47.833 s\n", "Sparsity with L2 penalty: 9.18%\n", "Test score with L2 penalty: 0.8948\n" ] } ], "source": [ "import time\n", "import numpy as np\n", "\n", "from sklearn.datasets import fetch_openml # MNIST data\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.utils import check_random_state\n", "\n", "print(__doc__)\n", "\n", "# start the timer\n", "t0 = time.time()\n", "# reduce train_size for a faster run\n", "train_size = 50000\n", "test_size = 10000\n", "\n", "### load MNIST data from https://www.openml.org/d/554\n", "# as_frame=False returns NumPy arrays, which the index-based shuffle below relies on\n", "X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)\n", "\n", "# shuffle data\n", "random_state = check_random_state(0)\n", "permutation = random_state.permutation(X.shape[0])\n", "X = X[permutation]\n", "y = y[permutation]\n", "X = X.reshape((X.shape[0], -1))\n", "\n", "# pick training and test data sets\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=train_size, test_size=test_size)\n", "\n", "# scale data to have zero mean and unit variance [helps the 'sag' solver converge]\n", "scaler = StandardScaler()\n", "X_train = scaler.fit_transform(X_train)\n", "X_test = scaler.transform(X_test)\n", "\n", "# apply logistic regressor with 'sag' solver, C is the inverse regularization strength\n", "clf = LogisticRegression(C=1e5,\n", " multi_class='multinomial',\n", " penalty='l2', solver='sag', tol=0.1)\n", "# fit data\n", "clf.fit(X_train, y_train)\n", "# percentage of weights that are exactly zero\n", "sparsity = np.mean(clf.coef_ == 0) * 100\n", "# compute accuracy on the test set\n", "score = clf.score(X_test, y_test)\n", "\n", "# display run time\n", "run_time = time.time() - t0\n", "print('Example run in %.3f s' 
% run_time)\n", "\n", "print(\"Sparsity with L2 penalty: %.2f%%\" % sparsity)\n", "print(\"Test score with L2 penalty: %.4f\" % score)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "