A high-bias, low-variance introduction to Machine Learning for physicists

This website contains Python notebooks that accompany our review entitled A high-bias, low-variance introduction to Machine Learning for physicists. An updated version of the review can be downloaded from the arXiv at arXiv:1803.08823.

The authors of the review are Pankaj Mehta, Marin Bukov, Ching-Hao Wang, Alexandre Day, Clint Richardson, Charles Fisher, and David Schwab. Please help improve the manuscript: feel free to submit comments, suggestions, and typos here.


Datasets: Most of the examples in the notebooks use the three datasets described below. Details on the datasets can be found in the Appendix of the review.

  • MNIST. MNIST is a dataset of handwritten numerical characters. The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples.
  • SUSY dataset. The SUSY dataset consists of Monte Carlo simulations of supersymmetric and non-supersymmetric collision events. More information about the dataset can be found in the accompanying paper.
  • Nearest Neighbor Ising Model. This dataset consists of samples generated from the two-dimensional nearest-neighbor coupled Ising model at a range of temperatures above and below the critical point. The dataset can be downloaded here. More information about the dataset can be found in the appendix of the accompanying review.
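To give a feel for how such Ising configurations are produced, here is a minimal Metropolis Monte Carlo sketch for the two-dimensional nearest-neighbor Ising model. This is a hypothetical illustration written for this page, not the actual code used to generate the review's dataset; the function name, lattice size, and sweep count are our own choices.

```python
import numpy as np

def sample_ising(L=16, T=2.0, n_sweeps=200, seed=0):
    """Return an L x L spin configuration (+1/-1) after n_sweeps
    Metropolis sweeps at temperature T (coupling J = 1, k_B = 1)."""
    rng = np.random.default_rng(seed)
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(n_sweeps):
        for _ in range(L * L):
            i, j = rng.integers(0, L, size=2)
            # Sum of the four nearest neighbors (periodic boundaries).
            nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2 * spins[i, j] * nn  # energy cost of flipping spin (i, j)
            # Metropolis acceptance rule.
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] = -spins[i, j]
    return spins

# One configuration below the critical temperature T_c ~ 2.27.
config = sample_ising(L=8, T=1.5, n_sweeps=50)
```

Sampling configurations over a range of temperatures straddling T_c, as described above, is what turns this into a labeled dataset for the classification notebooks.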


Python Information: We recommend using Python 3.6 or above (though most notebooks will work with any version of Python 3). The notebooks contain instructions for installing and downloading the appropriate packages.
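A quick sanity check along these lines can be run before opening the notebooks (a small sketch of our own, not taken from the notebooks themselves):

```python
import sys

# The review's notebooks recommend Python 3.6 or above.
assert sys.version_info >= (3, 6), "Please upgrade to Python 3.6 or above."
print("Python version OK:", sys.version.split()[0])
```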


Information about notebooks: There are a total of 20 notebooks that accompany the review. Most of these notebooks are new. However, others (mostly those based on the MNIST dataset) are modified versions of notebooks/tutorials developed by the makers of commonly used machine learning packages such as Keras, PyTorch, scikit-learn, and TensorFlow, as well as a new package, Paysage, for energy-based generative models maintained by Unlearn.AI. All the notebooks make generous use of code from these tutorials, as well as the rich ecosystem of publicly available blog posts on Machine Learning by researchers, practitioners, and students. We have included links to all relevant sources within each notebook. For full disclosure, we note that Unlearn.AI is affiliated with two of the authors: Charles Fisher (founder) and Pankaj Mehta (scientific advisor).

The notebooks are named according to the convention NB#_CXX-description.ipynb, where CXX refers to the corresponding section in the review (e.g., a notebook for Section VII about Random Forests will have a name of the form NB_CVII-Random_Forests.ipynb).
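As an illustration of this convention, a small parser (a hypothetical helper written for this page, not part of the repository) might pull the section numeral and description out of a notebook filename:

```python
import re

# Matches the NB#_CXX-description.ipynb convention: an optional notebook
# number, a Roman-numeral section, and an underscore-separated description.
PATTERN = re.compile(r"NB(\d*)_C([IVXL]+)-(.+)\.ipynb")

def parse_notebook_name(name):
    """Return the section numeral and description, or None if the name
    does not follow the convention."""
    m = PATTERN.match(name)
    if m is None:
        return None
    number, section, description = m.groups()
    return {"section": section, "description": description.replace("_", " ")}

info = parse_notebook_name("NB_CVII-Random_Forests.ipynb")
# info["section"] is "VII"; info["description"] is "Random Forests"
```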

Notebooks

A zip file containing all notebooks can be downloaded here. Individual notebooks can be downloaded below. We also include links to html versions of the notebooks. FOR THE LATEST VERSION OF THE NOTEBOOKS, PLEASE CONSULT THE GITHUB SITE HERE.

  1. Section II: Machine Learning is difficult   python   html
  2. Section IV: Gradient Descent   python   html
  3. Section VI: Linear Regression (Diabetes)   python   html
  4. Section VI: Linear Regression (Ising)   python   html
  5. Section VII: Logistic Regression (SUSY)   python   html
  6. Section VII: Logistic Regression (Ising)   python   html
  7. Section VII: Logistic Regression (MNIST)   python   html
  8. Section VII: Bagging   python   html
  9. Section VIII: Random Forests (Ising)   python   html
  10. Section VIII: XGBoost (SUSY)   python   html
  11. Section IX: Keras DNN (MNIST)   python   html
  12. Section IX: TensorFlow DNN (Ising)   python   html
  13. Section IX: Pytorch DNN (SUSY)   python   html
  14. Section X: Pytorch CNN (Ising)   python   html
  15. Section XII: Clustering   python   html
  16. Section XIV: Expectation Maximization   python   html
  17. Section XVI: Restricted Boltzmann Machines (MNIST)  python   html
  18. Section XVI: Restricted Boltzmann Machines (Ising)   python   html
  19. Section XVII: Keras Variational Autoencoders (MNIST)   python   html
  20. Section XVII: Keras Variational Autoencoders (Ising)   python   html