Commit c33a504

committed
Add tutorial on speeding up code
1 parent 3c990ba commit c33a504

4 files changed

+227
-0
lines changed

.gitignore

+1
@@ -0,0 +1 @@
1+
.ipynb_checkpoints

README.md

+1
@@ -5,3 +5,4 @@ Coffee and Code is a fortnightly meetup of (mainly) astronomers from the Physics
55
This repository contains resources we have created or used, that are not part of a specific series we did. Links to specific topics and areas are below.
66

77
## Topics
8+
* [Speeding up code](SpeedingUpCode/README.md)
+222
@@ -0,0 +1,222 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# How to find why your code is slow\n",
8+
"\n",
9+
"We're going to look at four ways to find out why your code is slow.\n",
10+
"\n",
11+
"This will use the Jupyter notebook (see the [ipython tutorial](http://nbviewer.ipython.org/github/dboyliao/cookbook-code/blob/master/notebooks/chapter01_basic/01_notebook.ipynb) for a basic intro to Jupyter), but you can do the same on the command line or in Python scripts. I would recommend the Jupyter interface for profiling, though, as it's quite nice.\n",
12+
"\n",
13+
"Note that while `%time`, `%timeit` and `%prun` come with Jupyter by default (as they only depend on the standard library), `%lprun` does not, so you'll need to install it manually.\n",
14+
"\n",
15+
"Some useful links are:\n",
16+
" * [Python docs on debugging](https://docs.python.org/2/library/debug.html)\n",
17+
" * [Python Module of the Week examples on the profilers](http://pymotw.com/2/profilers.html)\n",
18+
" * [Tips on optimising code by the `scikit-learn` developers](http://scikit-learn.org/dev/developers/performance.html#profiling-python-code)\n",
19+
"\n",
20+
"These examples use `numpy`, `scipy`, and `line_profiler`; install them using `pip`."
21+
]
22+
},
23+
{
24+
"cell_type": "code",
25+
"execution_count": 1,
26+
"metadata": {
27+
"collapsed": true
28+
},
29+
"outputs": [],
30+
"source": [
31+
"from __future__ import absolute_import, division, print_function # Py2/3 compat"
32+
]
33+
},
34+
{
35+
"cell_type": "markdown",
36+
"metadata": {},
37+
"source": [
38+
"## `%time`\n",
39+
"\n",
40+
"`%time` is an IPython magic that measures how long a piece of Python code takes to run (`%time` times a single line, `%%time` a whole cell). It's similar to the Unix command/shell builtin `time`."
41+
]
42+
},
43+
{
44+
"cell_type": "code",
45+
"execution_count": 2,
46+
"metadata": {
47+
"collapsed": false
48+
},
49+
"outputs": [
50+
{
51+
"name": "stdout",
52+
"output_type": "stream",
53+
"text": [
54+
"CPU times: user 772 ms, sys: 12 ms, total: 784 ms\n",
55+
"Wall time: 120 ms\n"
56+
]
57+
}
58+
],
59+
"source": [
60+
"%%time\n",
61+
"import numpy as np\n",
62+
"length = 100000\n",
63+
"np.zeros(length) / np.ones(length)"
64+
]
65+
},
66+
{
67+
"cell_type": "markdown",
68+
"metadata": {},
69+
"source": [
70+
"## `%timeit`\n",
71+
"\n",
72+
"`timeit` is a module in the Python standard library for timing Python code. It runs the code in a loop multiple times and reports the time of the best loop (see the [timeit docs](https://docs.python.org/2/library/timeit.html) for why this is done). `timeit` can be run on the command line using `python -m timeit` or called from inside Python, but the easiest way is to use the `%timeit` IPython magic. See `%timeit?` for the additional arguments you can give `%timeit`."
73+
]
74+
},
75+
{
76+
"cell_type": "markdown",
77+
"metadata": {},
78+
"source": [
79+
"As the example below shows, not everything that performs the same task takes the same amount of time. If you're computing `sin(x)` on a scalar (rather than a numpy array) inside a loop that runs many times, `math.sin` may be the better choice."
80+
]
81+
},
82+
{
83+
"cell_type": "code",
84+
"execution_count": 3,
85+
"metadata": {
86+
"collapsed": false
87+
},
88+
"outputs": [
89+
{
90+
"name": "stdout",
91+
"output_type": "stream",
92+
"text": [
93+
"math:\n",
94+
"The slowest run took 60.13 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
95+
"10000000 loops, best of 3: 128 ns per loop\n",
96+
"numpy:\n",
97+
"The slowest run took 21.05 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
98+
"1000000 loops, best of 3: 888 ns per loop\n",
99+
"scipy:\n",
100+
"The slowest run took 25.55 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
101+
"1000000 loops, best of 3: 887 ns per loop\n"
102+
]
103+
}
104+
],
105+
"source": [
106+
"from math import sin as msin\n",
107+
"from numpy import sin as npsin\n",
108+
"from scipy import sin as spsin  # re-export of numpy.sin in older SciPy; deprecated in later releases\n",
109+
"\n",
110+
"from math import pi\n",
111+
"\n",
112+
"angle = pi - 0.1\n",
113+
"\n",
114+
"print(\"math:\")\n",
115+
"%timeit msin(angle)\n",
116+
"print(\"numpy:\")\n",
117+
"%timeit npsin(angle)\n",
118+
"print(\"scipy:\")\n",
119+
"%timeit spsin(angle)"
120+
]
121+
},
122+
{
123+
"cell_type": "markdown",
124+
"metadata": {},
125+
"source": [
126+
"## `%prun` a.k.a. the python profiler\n",
127+
"\n",
128+
"Python comes with up to 3 different profilers, but for most purposes `cProfile` is sufficient. Like `timeit`, it can be called in a script or from the command line, but it's easier to use from `ipython`, via the `%prun` magic. As with `%timeit`, it's worth reading its help (`%prun?`).\n",
129+
"\n",
130+
"`ipython` can also profile scripts via `%run -p script.py`."
131+
]
132+
},
133+
{
134+
"cell_type": "code",
135+
"execution_count": 4,
136+
"metadata": {
137+
"collapsed": false
138+
},
139+
"outputs": [
140+
{
141+
"name": "stdout",
142+
"output_type": "stream",
143+
"text": [
144+
" "
145+
]
146+
}
147+
],
148+
"source": [
149+
"%%prun\n",
150+
"\n",
151+
"def fast_func():\n",
152+
" return 1\n",
153+
"\n",
154+
"def slow_func():\n",
155+
" for i in range(100000):\n",
156+
" i**2\n",
157+
"\n",
158+
"fast_func()\n",
159+
"slow_func()"
160+
]
161+
},
162+
{
163+
"cell_type": "markdown",
164+
"metadata": {},
165+
"source": [
166+
"It's really easy to start profiling your existing scripts in ipython, as long as you put all the top-level code in a function called something like `main`. Create a notebook in the same directory as your scripts, and then run:\n",
167+
"```ipython\n",
168+
"from your_script import main\n",
169+
"%prun main()\n",
170+
"```"
171+
]
172+
},
173+
{
174+
"cell_type": "markdown",
175+
"metadata": {},
176+
"source": [
177+
"## `%lprun`: `line_profiler`\n",
178+
"\n",
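179+
"`line_profiler` is a line-by-line profiler: it reports the time spent on each line of the functions you ask it to profile.\n",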
180+
"\n",
181+
" * PyPI: https://pypi.python.org/pypi/line_profiler/\n",
182+
" * GitHub: https://github.com/rkern/line_profiler\n",
183+
"\n",
184+
"Install using pip, i.e.:\n",
185+
"```sh\n",
186+
"pip install line_profiler\n",
187+
"```\n",
188+
"and follow the instructions in the project's README.rst to enable `%lprun` in ipython."
189+
]
190+
},
191+
{
192+
"cell_type": "code",
193+
"execution_count": null,
194+
"metadata": {
195+
"collapsed": false
196+
},
197+
"outputs": [],
198+
"source": []
199+
}
200+
],
201+
"metadata": {
202+
"kernelspec": {
203+
"display_name": "Python 3",
204+
"language": "python",
205+
"name": "python3"
206+
},
207+
"language_info": {
208+
"codemirror_mode": {
209+
"name": "ipython",
210+
"version": 3
211+
},
212+
"file_extension": ".py",
213+
"mimetype": "text/x-python",
214+
"name": "python",
215+
"nbconvert_exporter": "python",
216+
"pygments_lexer": "ipython3",
217+
"version": "3.5.2+"
218+
}
219+
},
220+
"nbformat": 4,
221+
"nbformat_minor": 0
222+
}
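
As a companion to the notebook's `%timeit` cell, the same measurement can be made outside IPython with the standard-library `timeit` module. This is a minimal sketch, not part of the commit: the repeat count (3) and loop count (100000) are assumptions chosen for illustration, and only `math.sin` is timed so that nothing beyond the standard library is needed.

```python
import timeit

NUMBER = 100000  # calls per trial (illustrative choice)
REPEAT = 3       # number of trials (illustrative choice)

# The setup string runs once per trial and is not included in the timing.
setup = "from math import sin, pi; angle = pi - 0.1"

# repeat() returns one total time (in seconds) per trial.
trials = timeit.repeat("sin(angle)", setup=setup, repeat=REPEAT, number=NUMBER)

# Take the minimum: slower trials mostly reflect interference from the
# rest of the system rather than the code being measured.
best_per_call = min(trials) / NUMBER

print("math.sin: {:.1f} ns per call (best of {})".format(best_per_call * 1e9, REPEAT))
```

The rough command-line equivalent is `python -m timeit -s "from math import sin, pi; angle = pi - 0.1" "sin(angle)"`.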

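The `%prun` cell can likewise be reproduced in a plain script using `cProfile` and `pstats`. This is a hedged sketch reusing `fast_func` and `slow_func` from the notebook; the `main` wrapper, the sort order (cumulative time) and the five-entry limit are assumptions made for illustration.

```python
import cProfile
import io
import pstats


def fast_func():
    return 1


def slow_func():
    for i in range(100000):
        i ** 2


def main():
    # Hypothetical entry point wrapping the top-level calls from the notebook.
    fast_func()
    slow_func()


# Profile a single call to main().
profiler = cProfile.Profile()
profiler.runcall(main)

# Collect the report in a string, sorted by cumulative time,
# limited to the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()

print(report)
```

For a whole script, `python -m cProfile -s cumulative your_script.py` gives a similar report without changing the code.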
SpeedingUpCode/README.md

+3
@@ -0,0 +1,3 @@
1+
# Speeding up code - 04 Dec 2014 - James Tocknell
2+
3+
See the [jupyter notebook](BenchmarkingTutorial.ipynb)
