Add tutorial on speeding up code

aragilar · aragilar · commit c33a50482650 · 2016-12-22T11:47:07.000+11:00
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+.ipynb_checkpoints
diff --git a/README.md b/README.md
@@ -5,3 +5,4 @@ Coffee and Code is a fortnightly meetup of (mainly) astronomers from the Physics
 This repository contains resources we have created or used, that are not part of a specific series we did. Links to specific topics and areas are below.
 
 ## Topics
+ * [Speeding up code](SpeedingUpCode/README.md)
diff --git a/SpeedingUpCode/BenchmarkingTutorial.ipynb b/SpeedingUpCode/BenchmarkingTutorial.ipynb
@@ -0,0 +1,222 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# How to find why your code is slow\n",
+    "\n",
+    "We're going to look at 4 ways to find why your code is slow.\n",
+    "\n",
+    "This will use the jupyter notebook (see the [ipython tutorial](http://nbviewer.ipython.org/github/dboyliao/cookbook-code/blob/master/notebooks/chapter01_basic/01_notebook.ipynb) for a basic intro to jupyter), but you can do this on the command line or in python scripts. I would recommend using the jupyter interface for profiling though, as it's quite nice.\n",
+    "\n",
+    "Note that while `%time`, `%timeit` and `%prun` come with jupyter by default (as they only depend on the standard library), `%lprun` does not, so you'll need to install it manually.\n",
+    "\n",
+    "Some useful links are:\n",
+    " * [Python docs on debugging](https://docs.python.org/2/library/debug.html)\n",
+    " * [Python Module of the Week examples on the profilers](http://pymotw.com/2/profilers.html)\n",
+    " * [Tips on optimising code by the `scikit-learn` developers](http://scikit-learn.org/dev/developers/performance.html#profiling-python-code)\n",
+    "\n",
+    "These examples use `numpy`, `scipy`, and `line_profiler`; install these using `pip`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "from __future__ import absolute_import, division, print_function # Py2/3 compat"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## `%time`\n",
+    "\n",
+    "`%time` is an ipython magic used to measure how long a piece of python code takes to run. It's similar to the unix command/shell builtin `time`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 772 ms, sys: 12 ms, total: 784 ms\n",
+      "Wall time: 120 ms\n"
+     ]
+    }
+   ],
+   "source": [
+    "%%time\n",
+    "import numpy as np\n",
+    "length = 100000\n",
+    "np.zeros(length) / np.ones(length)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## `%timeit`\n",
+    "\n",
+    "`timeit` is a python module in the standard library for timing python code. It runs the code in a loop multiple times and reports the time of the best loop (see the [timeit docs](https://docs.python.org/2/library/timeit.html) for why this is done). `timeit` can be run either on the command line using `python -m timeit` or called inside python, but the easiest way is to use the `%timeit` ipython magic. See `%timeit?` for the additional arguments you can give `%timeit` when you run it."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As you can see from the example below, not everything that performs the same task takes the same amount of time. If you're computing `sin(x)` many times inside a loop and `x` is a plain number rather than a numpy array, consider using `math.sin`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "math:\n",
+      "The slowest run took 60.13 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
+      "10000000 loops, best of 3: 128 ns per loop\n",
+      "numpy:\n",
+      "The slowest run took 21.05 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
+      "1000000 loops, best of 3: 888 ns per loop\n",
+      "scipy:\n",
+      "The slowest run took 25.55 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
+      "1000000 loops, best of 3: 887 ns per loop\n"
+     ]
+    }
+   ],
+   "source": [
+    "from math import sin as msin\n",
+    "from numpy import sin as npsin\n",
+    "from scipy import sin as spsin\n",
+    "\n",
+    "from math import pi\n",
+    "\n",
+    "angle = pi - 0.1\n",
+    "\n",
+    "print(\"math:\")\n",
+    "%timeit msin(angle)\n",
+    "print(\"numpy:\")\n",
+    "%timeit npsin(angle)\n",
+    "print(\"scipy:\")\n",
+    "%timeit spsin(angle)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## `%prun` a.k.a. the python profiler\n",
+    "\n",
+    "Python comes with up to 3 different profilers, but for most purposes `cProfile` is sufficient. Like `timeit`, it can be called in a script or from the command line, but it's easier to use `ipython`. The `ipython` magic you want to use is `%prun`. As with `%timeit`, it's worth looking at its help.\n",
+    "\n",
+    "`ipython` can also profile scripts via `%run -p script.py`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " "
+     ]
+    }
+   ],
+   "source": [
+    "%%prun\n",
+    "\n",
+    "def fast_func():\n",
+    "    return 1\n",
+    "\n",
+    "def slow_func():\n",
+    "    for i in range(100000):\n",
+    "        i**2\n",
+    "\n",
+    "fast_func()\n",
+    "slow_func()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "It's really easy to start profiling your current scripts in ipython, as long as you put all the top level code in a function called something like `main`. All you need to do is create a notebook in the same directory as your scripts, and then run the code:\n",
+    "```ipython\n",
+    "from your_script import main\n",
+    "%prun main()\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## `%lprun`: `line_profiler`\n",
+    "\n",
+    "`line_profiler` is a profiler that reports the time spent on each line of the functions you select.\n",
+    "\n",
+    " * PyPI: https://pypi.python.org/pypi/line_profiler/\n",
+    " * GitHub: https://github.com/rkern/line_profiler\n",
+    "\n",
+    "Install using pip, i.e.:\n",
+    "```sh\n",
+    "pip install line_profiler\n",
+    "```\n",
+    "and follow the instructions in its `README.rst` to add `%lprun` to ipython."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.5.2+"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/SpeedingUpCode/README.md b/SpeedingUpCode/README.md
@@ -0,0 +1,3 @@
+# Speeding up code - 04 Dec 2014 - James Tocknell
+
+See the [jupyter notebook](BenchmarkingTutorial.ipynb)
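
The notebook mentions that `timeit` and `cProfile` can also be called from a plain python script rather than through the `%timeit` and `%prun` magics. A minimal sketch of what that might look like is below, assuming Python 3. The function `slow_func` is modelled on the one in the notebook's `%%prun` cell (with an added return value), and names such as `best_per_call` and `report` are purely illustrative.

```python
import cProfile
import io
import pstats
import timeit
from math import pi, sin


def slow_func(n=100000):
    """Sum the squares of every integer below n (illustrative workload)."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total


# timeit: repeat the measurement and keep the best run, as %timeit does.
angle = pi - 0.1
calls_per_run = 100000
runs = timeit.repeat(lambda: sin(angle), number=calls_per_run, repeat=3)
best_per_call = min(runs) / calls_per_run
print("math.sin: {:.1f} ns per call".format(best_per_call * 1e9))

# cProfile: collect per-function statistics for a single call.
profiler = cProfile.Profile()
profiler.enable()
result = slow_func()
profiler.disable()

# Print the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The timing here includes the overhead of calling the `lambda`, so the numbers will not match `%timeit msin(angle)` exactly; it is the relative comparison between alternatives that matters.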
