Going with Python

François Briatte edited this page Nov 25, 2024 · 16 revisions

As a useR, you will at least hear about Python from time to time, and sometimes (not always) with good reason: there are things that are worth doing in Python rather than in R. Here is a list of resources to get you started. (It does not cover setting up your Python environment.)

For an overview of functions that achieve the same thing as the tidyverse ones, see this conversion guide.
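To give a flavour of what such a conversion looks like, here is a minimal sketch of a short dplyr-style pipeline written with pandas. The toy data frame and column names are made up for illustration, and the R equivalents in the comments are a rough correspondence of my own, not taken from the guide.

```python
import pandas as pd

# Toy data, invented for this example
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1, 2, 3, 4],
})

result = (
    df[df["value"] > 1]                        # filter(value > 1)
    .assign(double=lambda d: d["value"] * 2)   # mutate(double = value * 2)
    .groupby("group", as_index=False)          # group_by(group)
    .agg(total=("double", "sum"))              # summarise(total = sum(double))
    .sort_values("total", ascending=False)     # arrange(desc(total))
)

print(result)
```

Method chaining inside parentheses plays roughly the role of the pipe in R, which is why many pandas users coming from the tidyverse write their code this way.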

If you need a very short list of Python libraries to focus on, here you go:

  • pandas for data wrangling
  • numpy for all sorts of mathematical operations
  • matplotlib and seaborn for plots
  • statsmodels and linearmodels for statistical models
  • scikit-learn (imported as sklearn) for machine learning algorithms

Currently missing from the list below:

  • An introduction to Web scraping with BeautifulSoup, lxml, requests etc.
  • A clear separation between the free-to-read resources and the rest

Handbooks

Ani Adhikari, John DeNero and David Wagner, Computational and Inferential Thinking: The Foundations of Data Science, 2022

Textbook for the Data 8: Foundations of Data Science course at UC Berkeley. Use both the course and the textbook as a starting point for the 'Data 100' course (and textbook) mentioned below on this page.

Allen B. Downey, Elements of Data Science, 2022

Quoting from the homepage: "an introduction to data science for people with no programming experience. My goal is to present a small, powerful subset of Python that allows you to do real work in data science as quickly as possible."

Allen B. Downey, Think Stats. Exploratory Data Analysis in Python, 2014

Free book. Almost ten years old, but still useful.

Dirk Hovy, Text Analysis in Python for Social Scientists. Prediction and Classification, 2022

A recent book that should help with the specific areas of text mining and text analysis with Python, although I am adding it to this list without having had a chance to take a proper look at its actual contents.

Matheus Facure Alves, Causal Inference for The Brave and True, 2023

A book that covers linear models, panel data, diff-in-diff, regression discontinuity designs, and more, all in Python. A good guide to the packages that implement these methods, i.e. statsmodels and linearmodels. Not a book for those who want to learn Python, even though it contains lots of useful code for learning e.g. how to plot with matplotlib and seaborn.

Jacqueline Kazil and Katharine Jarmul, Data Wrangling with Python, 2016

A book that covers data import, data wrangling, and Web scraping using the Scrapy library, as well as APIs. The list of appendices looks very useful.

Sam Lau, Joey Gonzalez, and Deb Nolan, Learning Data Science, f. 2023

Forthcoming book based on the Principles and Techniques of Data Science ('Data 100') course at UC Berkeley. Assumes that you already know Python, which you will if you take the course after first taking the 'Data 8' course mentioned above.

Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python, 2016

All the essentials, in just one book.

Sebastian Raschka and Vahid Mirjalili, Python Machine Learning, 2019

Extensive book. The first author also has a full course for you: Machine Learning (University of Wisconsin-Madison, 2018).

Arthur Turrell et al., Python for Data Science, 2022

A Python equivalent of the same book for R. Free to read online, and like its R equivalent, pretty extensive.

Jake VanderPlas, Python Data Science Handbook, 2016

This book covers essential Python data science modules: NumPy, Pandas, Matplotlib, and machine learning with Scikit-learn. The only other module that you will actually need is statsmodels.

Tutorials

Tim Hopper, Python Plotting for Exploratory Data Analysis, 2020

Basic plots, using matplotlib or other libraries built on top of it.

Rafe Kettler, A Guide to Python's Magic Methods, 2012

Python has this __thing__ called magic methods. Check out how they work.
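As a quick taste before reading the guide, here is a minimal sketch of a class that defines a few magic methods. The Vector class is hypothetical and exists only for this illustration; it is not taken from the guide.

```python
class Vector:
    """A tiny 2D vector, used only to illustrate magic methods."""

    def __init__(self, x, y):
        # Called when you write Vector(1, 2)
        self.x = x
        self.y = y

    def __repr__(self):
        # Called by repr(), and by print() when __str__ is not defined
        return f"Vector({self.x}, {self.y})"

    def __add__(self, other):
        # Called when you write v1 + v2
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Called when you write v1 == v2
        return isinstance(other, Vector) and (self.x, self.y) == (other.x, other.y)

    def __len__(self):
        # Called by len(v); a 2D vector always has two components
        return 2


v = Vector(1, 2) + Vector(3, 4)
print(v)  # Vector(4, 6)
```

If you come from R, this is loosely comparable to writing S3 methods such as print or + for your own class.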

Lev Maximov, Pandas Illustrated: The Definitive Visual Guide to Pandas

Data wrangling with the Pandas package. If you are coming from R, see also Conor MM's R to Python [pandas] useful data wrangling snippets.

Ines Montani, Advanced NLP with spaCy, n.d.

A full course on text mining and natural language processing with one of the best Python libraries around.

Guillaume Plique et al., minet: Web Mining Library and Command Line Tool Written in Python

A tool made by the médialab at Sciences Po, Paris. Check the GitHub repository for the full documentation.

Jake VanderPlas, A Whirlwind Tour of Python, 2016

This tutorial covers the essentials of the Python language. It is intended for people familiar with another language.

Courses

Tomas Beuzen, Python Programming for Data Science, 2021

The basics of Python and Pandas, with a few more things on NumPy.

DataCamp, Python Data Science Track, n.d.

All slides and code from a popular online learning platform.

Ethan Swan and Bradley Boehmke, Intro to Python for Data Science Workshop, c. 2022

Very easy to follow. View it as Binder notebooks, or follow the slides.

Stefan McCabe, Programming with Data

Harder to follow, but lists many interesting examples for replication.

Thomas J. Sargent and John Stachurski, Quantitative Economics with Python

Three courses for the price of one free.

Courses in French

Courses with certificates

The following list was sent by a student (thanks Urjasvi) who was looking for Python courses from providers that deliver completion certificates:

More might be available via e.g. DataCamp and edX.


Based on a few bookmarks dating back to 2017-03-15 (I did not dig into the older ones).
