Going with Python
As a useR, you will at least hear about Python from time to time, and sometimes (not always) with good reason: some things are worth doing in Python rather than in R. Here is a list of resources to get you started. (It does not cover setting up your Python environment.)
For an overview of functions that achieve the same thing as the tidyverse ones, see this conversion guide.
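To give a flavour of what such a conversion looks like, here is a minimal sketch of a typical dplyr pipeline rewritten with pandas. The data frame and its column names (`country`, `year`, `gdp`) are made up for illustration, and this is only one of several ways to write the same steps in pandas.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "country": ["FR", "FR", "DE", "DE", "IT"],
    "year": [2020, 2021, 2020, 2021, 2021],
    "gdp": [2.6, 2.9, 3.8, 4.2, 2.1],
})

# Rough dplyr equivalent:
#   df %>%
#     filter(year == 2021) %>%
#     group_by(country) %>%
#     summarise(mean_gdp = mean(gdp)) %>%
#     arrange(desc(mean_gdp))
result = (
    df[df["year"] == 2021]                      # filter()
    .groupby("country", as_index=False)         # group_by()
    .agg(mean_gdp=("gdp", "mean"))              # summarise()
    .sort_values("mean_gdp", ascending=False)   # arrange(desc())
    .reset_index(drop=True)                     # tidy up the row index
)

print(result)
```

Wrapping the chain in parentheses lets each method sit on its own line, which reads much like a pipe.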
If you need a very short list of Python libraries to focus on, here you go:
- `pandas` for data wrangling
- `numpy` for all sorts of mathematical operations
- `matplotlib` and `seaborn` for plots
- `statsmodels` and `linearmodels` for statistical models
  - `pyfixest` for fixed-effects estimation (see py-econometrics for more)
- `sklearn` for machine learning algorithms
  - `scikit-plot` for plots of the above
Currently missing from the list below:
- An introduction to Web scraping with BeautifulSoup, lxml, requests etc.
- A clear separation between the free-to-read resources and the rest
Ani Adhikari, John DeNero and David Wagner, Computational and Inferential Thinking: The Foundations of Data Science, 2022
Textbook for the Data 8: Foundations of Data Science course at UC Berkeley. Use both the course and the textbook as a starting point for the 'Data 100' course (and textbook) mentioned below on this page.
Allen B. Downey, Elements of Data Science, 2022
Quoting from the homepage: "an introduction to data science for people with no programming experience. My goal is to present a small, powerful subset of Python that allows you to do real work in data science as quickly as possible."
Allen B. Downey, Think Stats. Exploratory Data Analysis in Python, 2014
Free book. Almost ten years old, but still useful.
Dirk Hovy, Text Analysis in Python for Social Scientists. Prediction and Classification, 2022
A recent book that should help with the specific areas of text mining and text analysis with Python, although I am adding it to this list without having had a chance to take a proper look at its actual contents.
Matheus Facure Alves, Causal Inference for The Brave and True, 2023
A book that covers linear models, panel data, diff-in-diff, regression discontinuity designs, and more, all in Python. A good guide to the packages that implement these methods, i.e. `statsmodels` and `linearmodels`. Not a book for those who want to learn Python, although it contains lots of useful code for learning e.g. how to make plots with `matplotlib` and `seaborn`.
Jacqueline Kazil and Katharine Jarmul, Data Wrangling with Python, 2016
A book that covers data import, data wrangling, and Web scraping using the `Scrapy` library, as well as APIs. The list of appendices looks very useful.
Sam Lau, Joey Gonzalez, and Deb Nolan, Learning Data Science, f. 2023
Forthcoming book based on the Principles and Techniques of Data Science ('Data 100') course at UC Berkeley. Assumes you already know Python, which you will if you take the 'Data 8' course mentioned above before taking this one.
Andreas C. Müller, Sarah Guido, Introduction to Machine Learning with Python, 2016
All the essentials, in just one book.
Sebastian Raschka and Vahid Mirjalili, Python Machine Learning, 2019
Extensive book. The first author also has a full course for you: Machine Learning (University of Wisconsin-Madison, 2018).
Arthur Turrell et al., Python for Data Science, 2022
A Python equivalent of the same book for R. Free to read online, and like its R equivalent, pretty extensive.
Jake VanderPlas, Python Data Science Handbook, 2016
This book covers essential Python data science modules: NumPy, Pandas, Matplotlib, and machine learning with Scikit-learn. The only other module that you will actually need is statsmodels.
Tim Hopper, Python Plotting for Exploratory Data Analysis, 2020
Basic plots, using `matplotlib` or other libraries built on top of it.
Rafe Kettler, A Guide to Python's Magic Methods, 2012
Python has this `__thing__` called magic methods. Check out how they work.
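As a quick taste of what magic methods do, here is a minimal sketch. The `Vector` class is a hypothetical example written for this page, not taken from the guide: each double-underscore method hooks the class into a piece of built-in Python syntax.

```python
class Vector:
    """A tiny 2D vector (hypothetical class, for illustration only)."""

    def __init__(self, x, y):
        # Called when you write Vector(1, 2)
        self.x = x
        self.y = y

    def __repr__(self):
        # Called by repr() and print(), and shown in the console
        return f"Vector({self.x}, {self.y})"

    def __eq__(self, other):
        # Called when you write a == b
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        # Called when you write a + b
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __len__(self):
        # Called when you write len(a)
        return 2

    def __getitem__(self, index):
        # Called when you write a[0] or a[1]
        return (self.x, self.y)[index]


v = Vector(1, 2) + Vector(3, 4)
print(v)                   # Vector(4, 6)
print(v == Vector(4, 6))   # True
print(len(v), v[0], v[1])  # 2 4 6
```

If you come from R, this is loosely comparable to defining S3 methods such as `print.myclass` or `"+.myclass"`.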
Lev Maximov, Pandas Illustrated: The Definitive Visual Guide to Pandas
Data wrangling with the Pandas package. If you are coming from R, see also Conor MM's R to Python [pandas] useful data wrangling snippets.
Ines Montani, Advanced NLP with spaCy (n.d.)
A full course on text mining and natural language processing with one of the best Python libraries around.
Guillaume Plique et al., minet: Web Mining Library and Command Line Tool Written in Python
A tool made by the médialab at Sciences Po, Paris. Check the GitHub repository for the full documentation.
Jake VanderPlas, A Whirlwind Tour of Python, 2016
This tutorial covers the essentials of the Python language. It is intended for people familiar with another language.
Tomas Beuzen, Python Programming for Data Science, 2021
The basics of Python and Pandas, with a few more things on NumPy.
DataCamp, Python Data Science Track, n.d.
All slides and code from a popular online learning platform.
Ethan Swan and Bradley Boehmke, Intro to Python for Data Science Workshop, c. 2022
Very easy to follow. View it as Binder notebooks, or follow the slides.
Stefan McCabe, Programming with Data
Harder to follow, but lists many interesting examples for replication.
Thomas J. Sargent and John Stachurski, Quantitative Economics with Python
Three courses for ~~the price of one~~ free.
- ~~Kim Antunez~~ Lino Galiana, Python pour la data science, 2023
- Ewen Gallic, Python pour les économistes, 2018
The following list was sent by a student (thanks Urjasvi) who was looking for Python courses from providers that deliver completion certificates:
- Coursera: Charles Russell Severance, Python for Everybody Specialization
- Coursera: Paul Resnick and Steve Oney, Python Basics
More might be available via e.g. DataCamp and edX.
Based on a few bookmarks dating back to 2017-03-15 (I did not dig into the older ones).