mlb-data-lab is a Python application and library for creating advanced stat summary sheets for MLB players. It supports yearly customizations and provides visualizations. The project can also be imported as a library so you can extend its functionality for custom applications or data processing workflows. It uses the pybaseball and MLB-StatsAPI libraries along with other Python packages to gather and format data for dashboards, reports and other analytical tools.
The project retrieves data from MLB and FanGraphs to ensure accurate, up‑to‑date statistics. Future releases will continue to expand the application's capabilities so it can serve as both a standalone tool and a reusable library.
Below are samples of the summary sheets that can be generated by this project. The first sample is a Batting Summary for Riley Greene for the 2024 season. The second sample is a Pitching Summary for Tarik Skubal for the 2024 season.
In addition to the baseball stats you would expect, the summary sheets also include the following "advanced" stats:
| Batters | | Pitchers | | |
|---|---|---|---|---|
| BB% | UBR | K/9 | Opponent Avg | Swing % |
| K% | wRC | BB/9 | WHIP | Splits |
| OBP | wRAA | K/BB | BABIP | |
| SLG | wOBA | H/9 | LOB% | |
| OPS | wRC+ | HR/9 | ERA- | |
| ISO | WAR | K% | FIP- | |
| Spd | Splits | BB% | FIP | |
| BABIP | | K-BB% | RS/9 | |
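Several of the rate stats in the table follow widely used public definitions. The sketch below shows a few of them in Python as an illustration only. The function names are hypothetical and are not part of mlb-data-lab, the input numbers are made up rather than real player data, and innings pitched are assumed to be given as a true decimal (for example 180.333 rather than the box-score notation 180.1). How the project itself obtains these values is not shown here.

```python
def per_nine(count: float, innings_pitched: float) -> float:
    """Scale a counting stat to a per-nine-innings rate (K/9, BB/9, H/9, HR/9)."""
    return 9.0 * count / innings_pitched


def whip(walks: float, hits: float, innings_pitched: float) -> float:
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings_pitched


def iso(slg: float, avg: float) -> float:
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


def rate_pct(count: float, plate_appearances: float) -> float:
    """A counting stat as a percentage of plate appearances (K%, BB%)."""
    return 100.0 * count / plate_appearances


def k_minus_bb_pct(strikeouts: float, walks: float, batters_faced: float) -> float:
    """K-BB%: strikeout rate minus walk rate, in percentage points."""
    return rate_pct(strikeouts, batters_faced) - rate_pct(walks, batters_faced)


# Made-up season line for illustration: 180 IP, 200 K, 40 BB, 150 H, 700 batters faced.
print(f"K/9    {per_nine(200, 180.0):.2f}")
print(f"BB/9   {per_nine(40, 180.0):.2f}")
print(f"WHIP   {whip(40, 150, 180.0):.2f}")
print(f"K-BB%  {k_minus_bb_pct(200, 40, 700):.1f}")
```

Weighted stats such as wOBA, wRC+, FIP and WAR depend on season-specific constants and are not covered by this sketch.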
The project is organized as follows:
mlb-data-lab/
├── README.md
├── setup.py
├── requirements.txt
├── mlb_data_lab/ # Source code
│ ├── apis/ # API clients for MLB and FanGraphs
│ ├── data_viz/ # Plotting utilities
│ ├── player/ # Player models and helpers
│ ├── summary_sheets/ # Classes that generate summary sheets
│ ├── team/ # Team utilities
│ └── ...
├── scripts/ # Helper scripts for data collection
└── tests/ # Unit tests
To get started with the project, follow these steps:
- Clone the repository:
git clone https://github.com/timothyf/mlb-data-lab.git
cd mlb-data-lab
- Set up a Python virtual environment (optional but recommended):
python3 -m venv venv
source venv/bin/activate
- Install the required dependencies:
pip install -r requirements.txt
The scripts directory contains several scripts for basic functionality. To generate player summary sheets, run:
python scripts/generate_player_summary.py [options]
Options:
--players [1 or more player names]
--teams [1 or more team names]
--year [specify a 4-digit year]
To save Statcast data, run the save_statcast_data.py script in the scripts directory with the same options:
python scripts/save_statcast_data.py [options]
--players [1 or more player names]
--teams [1 or more team names]
--year [specify a 4-digit year]
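The three options listed above map naturally onto Python's argparse module. The following is a minimal sketch of how such a command line could be parsed. It is not the project's actual parser: the function names, help strings and the absence of a default year are all assumptions made for illustration.

```python
import argparse


def four_digit_year(value: str) -> int:
    """Accept only a 4-digit year such as 2024 (hypothetical validator)."""
    if len(value) != 4 or not (value.isascii() and value.isdigit()):
        raise argparse.ArgumentTypeError(f"expected a 4-digit year, got {value!r}")
    return int(value)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --players, --teams and --year options."""
    parser = argparse.ArgumentParser(description="Illustrative option parser")
    # nargs="+" collects one or more values into a list.
    parser.add_argument("--players", nargs="+", default=[], metavar="NAME",
                        help="one or more player names")
    parser.add_argument("--teams", nargs="+", default=[], metavar="TEAM",
                        help="one or more team names")
    # The real default year is not documented here, so this sketch leaves it unset.
    parser.add_argument("--year", type=four_digit_year, default=None,
                        help="a 4-digit year")
    return parser


args = build_parser().parse_args(["--teams", "Detroit Tigers", "--year", "2024"])
print(args.players, args.teams, args.year)
```

Quoting a multi-word name on the command line, as in 'Riley Greene', keeps it as a single list element.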
Example: generate a summary sheet for a single player:
python scripts/generate_player_summary.py --players 'Riley Greene'
Output:
output/2024/Tigers/batter_summary_riley_greene.png
Example: generate summary sheets for a team and season:
python scripts/generate_player_summary.py --teams 'Detroit Tigers' --year 2024
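The single output path shown above suggests a layout of output/year/team nickname/role_summary_player_name.png. The helper below is a sketch that reproduces that one example. The function name, the "pitcher" role value and the rule of lowercasing the name and joining it with underscores are assumptions inferred from that path, not taken from the project's code.

```python
from pathlib import Path


def summary_path(year: int, team_nickname: str, role: str, player_name: str,
                 root: str = "output") -> Path:
    """Build a summary-sheet path following the pattern of the example output.

    role is assumed to be "batter" or "pitcher" (only "batter" appears in the example).
    """
    # "Riley Greene" -> "riley_greene"
    slug = "_".join(player_name.lower().split())
    return Path(root) / str(year) / team_nickname / f"{role}_summary_{slug}.png"


print(summary_path(2024, "Tigers", "batter", "Riley Greene").as_posix())
```

Names containing punctuation or accented characters may be handled differently by the real script, which this sketch does not attempt to model.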
To set up the PostgreSQL database for MLB Data Lab, follow these steps:
- Install PostgreSQL: Download and install PostgreSQL from postgresql.org.
- Create the database: Open your terminal and run:
createdb mlb_data_lab_db
- Initialize the schema: Run the provided setup_db.sql file to create the tables:
psql -d mlb_data_lab_db -f setup_db.sql
- Verify the setup: Connect to your database and list the tables:
psql -d mlb_data_lab_db -c '\dt'
You should see tables such as games, players, umpires and plate_appearances.
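The verification step can also be done from Python. The sketch below takes any DB-API 2.0 connection and reports which of the four tables named above are missing. The function and constant names are hypothetical, and it assumes the tables live in the default public schema; setup_db.sql may create more tables than the four checked here.

```python
# Tables named in the setup instructions; the real schema may contain others.
EXPECTED_TABLES = frozenset({"games", "players", "umpires", "plate_appearances"})

# Assumes the tables were created in PostgreSQL's default "public" schema.
LIST_TABLES_SQL = """
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'public'
"""


def missing_tables(connection, expected=EXPECTED_TABLES):
    """Return the expected table names that the database does not contain.

    connection is any DB-API 2.0 connection to the target database.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(LIST_TABLES_SQL)
        found = {row[0] for row in cursor.fetchall()}
    finally:
        cursor.close()
    return set(expected) - found
```

With the psycopg2 driver installed, a connection could be opened with psycopg2.connect(dbname="mlb_data_lab_db") and passed to missing_tables; an empty set means all four tables exist.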
This project was inspired by my time working in the R&D department of the Washington Nationals, and the pitching summary project from Thomas Nestico. Here is a link to an article describing his project:
https://medium.com/@thomasjamesnestico/creating-the-perfect-pitching-summary-7b8a981ef0c5
This package and its author are not affiliated with MLB or any MLB team. This API wrapper interfaces with MLB's Stats API. Use of MLB data is subject to the notice posted at http://gdx.mlb.com/components/copyright.txt.