{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Notebook 2: Gradient Descent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning Goal\n", "\n", "The goal of this notebook is to gain intuition for various gradient descent methods by visualizing and applying them to some simple two-dimensional surfaces. Methods studied include ordinary gradient descent, gradient descent with momentum, Nesterov accelerated gradient (NAG), ADAM, and RMSProp.\n", "\n", "\n", "## Overview\n", "\n", "In this notebook, we will visualize what different gradient descent methods are doing using some simple surfaces. From the outset, we emphasize that doing gradient descent on these surfaces is different from performing gradient descent on a loss function in Machine Learning (ML). The reason is that in ML we do not just want to find good minima of the training loss; we want to find minima that generalize well to new data. Despite this crucial difference, we can still build intuition about gradient descent methods by applying them to simple surfaces (see related blog posts [here](http://ruder.io/optimizing-gradient-descent/) and [here](http://tiao.io/notes/visualizing-and-animating-optimization-algorithms-with-matplotlib/)).\n", "\n", "## Surfaces\n", "\n", "We will consider three simple surfaces: a quadratic minimum of the form $$z=ax^2+by^2,$$ a saddle-point of the form $$z=ax^2-by^2,$$ and [Beale's Function](https://en.wikipedia.org/wiki/Test_functions_for_optimization), a non-convex function often used to test optimization algorithms, of the form:\n", "$$z(x,y) = (1.5-x+xy)^2+(2.25-x+xy^2)^2+(2.625-x+xy^3)^2$$\n", "\n", "These surfaces can be plotted using the cells below. 
\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#This cell sets up the basic plotting functions\n", "#we will use to visualize the gradient descent routines.\n", "\n", "#Make plots interactive\n", "#%matplotlib notebook\n", "\n", "#Make plots static\n", "%matplotlib inline\n", "\n", "#Make 3D plots\n", "from mpl_toolkits.mplot3d import Axes3D\n", "import matplotlib.pyplot as plt\n", "from matplotlib import cm\n", "#from matplotlib import animation\n", "from IPython.display import HTML\n", "from matplotlib.colors import LogNorm\n", "#from itertools import zip_longest\n", "\n", "#Import Numpy\n", "import numpy as np\n", "\n", "#Define function for plotting \n", "\n", "def plot_surface(x, y, z, azim=-60, elev=40, dist=10, cmap=\"RdYlBu_r\"):\n", "\n", " fig = plt.figure()\n", " ax = fig.add_subplot(111, projection='3d')\n", " plot_args = {'rstride': 1, 'cstride': 1, 'cmap':cmap,\n", " 'linewidth': 20, 'antialiased': True,\n", " 'vmin': -2, 'vmax': 2}\n", " ax.plot_surface(x, y, z, **plot_args)\n", " ax.view_init(azim=azim, elev=elev)\n", " ax.dist=dist\n", " ax.set_xlim(-1, 1)\n", " ax.set_ylim(-1, 1)\n", " ax.set_zlim(-2, 2)\n", " \n", " plt.xticks([-1, -0.5, 0, 0.5, 1], [\"-1\", \"-1/2\", \"0\", \"1/2\", \"1\"])\n", " plt.yticks([-1, -0.5, 0, 0.5, 1], [\"-1\", \"-1/2\", \"0\", \"1/2\", \"1\"])\n", " ax.set_zticks([-2, -1, 0, 1, 2])\n", " ax.set_zticklabels([\"-2\", \"-1\", \"0\", \"1\", \"2\"])\n", " \n", " ax.set_xlabel(\"x\", fontsize=18)\n", " ax.set_ylabel(\"y\", fontsize=18)\n", " ax.set_zlabel(\"z\", fontsize=18)\n", " return fig, ax;\n", "\n", "def overlay_trajectory_quiver(ax,obj_func,trajectory, color='k'):\n", " xs=trajectory[:,0]\n", " ys=trajectory[:,1]\n", " zs=obj_func(xs,ys)\n", " ax.quiver(xs[:-1], ys[:-1], zs[:-1], xs[1:]-xs[:-1], ys[1:]-ys[:-1],zs[1:]-zs[:-1],color=color,arrow_length_ratio=0.3)\n", " \n", " return ax;\n", "\n", "def 
overlay_trajectory(ax,obj_func,trajectory,label,color='k'):\n", " xs=trajectory[:,0]\n", " ys=trajectory[:,1]\n", " zs=obj_func(xs,ys)\n", " ax.plot(xs,ys,zs, color, label=label)\n", " \n", " return ax;\n", "\n", " \n", "def overlay_trajectory_contour_M(ax,trajectory, label,color='k',lw=2):\n", " xs=trajectory[:,0]\n", " ys=trajectory[:,1]\n", " ax.plot(xs,ys, color, label=label,lw=lw)\n", " ax.plot(xs[-1],ys[-1],color+'>', markersize=14)\n", " return ax;\n", "\n", "def overlay_trajectory_contour(ax,trajectory, label,color='k',lw=2):\n", " xs=trajectory[:,0]\n", " ys=trajectory[:,1]\n", " ax.plot(xs,ys, color, label=label,lw=lw)\n", " return ax;" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": 