This repository has material that supplements what is posted on the Babson FIN 6200 Canvas page. It exists mainly to provide publicly accessible URLs for shared data files (in /data/) and template notebook files (in /templates/), a course schedule, and links to external resources.
Python notebooks can run in the cloud using Google Colab or Binder, but you will probably want a local installation. I strongly recommend using the Anaconda Python distribution.
Anaconda includes (almost) everything you need to get going, but in line with these recommendations, I prefer to work in Visual Studio Code with some add-in extensions.
- Microsoft Jupyter Notebooks in VS Code (and video)
- Microsoft's Data Wrangler for VS Code
These are the critical packages we will rely on. If you need a package that is not included with Anaconda, first try to install it with conda install; only if that doesn’t work, install it with pip.
- pandas (data analysis)
- pandas-datareader (access to various data sources, including FRED)
- yfinance (access to Yahoo Finance data)
- Seaborn (data visualization) and an overview of Python visualization tools
- NumPy and SciPy (scientific computing)
- statsmodels (statistical estimation and testing)
- WRDS (access to the authoritative source of historical financial data)
Additional packages that may be useful include YData Profiling (automated EDA), Pyjanitor (data cleaning), and dataprep (data cleaning and automated EDA).
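To see which of the critical packages are already available in your environment before installing anything, a small check like the one below may help. It is only a sketch: the `CORE_PACKAGES` mapping and the `missing_packages` helper are hypothetical names introduced here, and the import names are assumed from each package's usual convention (for example, pandas-datareader is imported as `pandas_datareader`).

```python
from importlib.util import find_spec

# Install name -> import name for the critical packages listed above.
# Assumption: these are the usual import names; the two can differ.
CORE_PACKAGES = {
    "pandas": "pandas",
    "pandas-datareader": "pandas_datareader",
    "yfinance": "yfinance",
    "seaborn": "seaborn",
    "numpy": "numpy",
    "scipy": "scipy",
    "statsmodels": "statsmodels",
    "wrds": "wrds",
}


def missing_packages(packages=CORE_PACKAGES):
    """Return the install names whose import name cannot be found."""
    return [
        install_name
        for install_name, import_name in packages.items()
        if find_spec(import_name) is None  # None means not importable here
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not found in this environment:", ", ".join(missing))
        print("Try conda install first, and pip only if that fails.")
    else:
        print("All critical packages were found.")
```

Run it from the same environment (or notebook kernel) you plan to use for the course, since each environment has its own set of installed packages.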
- Think Python (3rd ed., and repo), Allen B. Downey
- Whirlwind Tour of Python (and repo), Jake VanderPlas
- Python Data Science Handbook (and repo), Jake VanderPlas
- Python for Data Analysis (3rd ed.), Wes McKinney
- Coding for Economists, Arthur Turrell
- Introduction to Python for Econometrics, Statistics and Data Analysis (5th ed.), Kevin Sheppard
- Datacamp cheatsheets: Python basics, pandas basics, pandas advanced, Seaborn, NumPy, and importing data
- Learn Python in Y Minutes
- pandas User Guide
- Kaggle Welcome to Data Visualization (uses Python, Pandas, and Seaborn)
- How to read most commonly used file formats in Data Science (using Python)
- Stata to Python equivalents, Daniel M. Sullivan
Introduction to Modern Statistics (2nd ed., and repo, data repo), Mine Çetinkaya-Rundel and Johanna Hardin
- Bloomberg Anywhere
- Open Source Asset Pricing
- Wharton Research Data Services (WRDS)
- Datasets by concept and by product
- CRSP Methodologies
Data Analytics Using Microsoft Excel With Accounting and Finance Datasets (v3.0), Joseph M. Manzo
Relevant resources TBA
Course designed with significant advice, help, and inspiration from Don Bowen, Michael Goldstein, Grant McDermott, Cameron Pfiffer, Seth Pruitt, and Arthur Turrell.