nfl_data_py is a Python library for interacting with NFL data sourced from nflfastR, nfldata, dynastyprocess, and Draft Scout.
It includes import functions for play-by-play data, weekly data, seasonal data, rosters, win totals, scoring lines, officials, draft picks, draft pick values, schedules, team descriptive info, combine results, and ID mappings across various sites.
Use the package manager pip to install nfl_data_py.
pip install nfl_data_py
import nfl_data_py as nfl
Working with play-by-play data
nfl.import_pbp_data(years, columns, downcast=True, cache=False, alt_path=None)
Returns play-by-play data for the years and columns specified
years : required, list of years to pull data for (earliest available is 1999)
columns : optional, list of columns to pull data for
downcast : optional, converts float64 columns to float32, reducing memory usage by ~30% but slowing the initial load by ~50%
cache : optional, if True, reads pbp data from the local cache generated by nfl.cache_pbp() instead of the GitHub repo
alt_path : optional unless nfl.cache_pbp() was called with an alternate path, in which case the same path must be given here
nfl.see_pbp_cols()
Returns a list of the columns available in the play-by-play dataset
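As a rough sketch of how the parameters above fit together, the helper below checks the requested seasons against the 1999 floor before calling nfl.import_pbp_data. The names check_years and load_pbp are hypothetical helpers, not part of nfl_data_py, and load_pbp assumes the package is installed and the network is reachable.

```python
EARLIEST_PBP_SEASON = 1999  # earliest season stated in the parameter notes above


def check_years(years, earliest=EARLIEST_PBP_SEASON):
    """Return the years as a list, raising if any precede the earliest season."""
    years = list(years)
    too_early = [y for y in years if y < earliest]
    if too_early:
        raise ValueError(f"no play-by-play data before {earliest}: {too_early}")
    return years


def load_pbp(years, columns=None):
    """Fetch play-by-play data (hypothetical wrapper, needs nfl_data_py)."""
    import nfl_data_py as nfl  # imported lazily so check_years works offline

    years = check_years(years)
    if columns is None:
        return nfl.import_pbp_data(years)
    return nfl.import_pbp_data(years, columns)
```

Something like load_pbp(range(2021, 2023), ["posteam", "epa"]) would then request two seasons with two columns, assuming those column names appear in the output of nfl.see_pbp_cols().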
Working with weekly data
nfl.import_weekly_data(years, columns, downcast)
Returns weekly data for the years and columns specified
years : required, list of years to pull data for (earliest available is 1999)
columns : optional, list of columns to pull data for
downcast : optional, converts float64 columns to float32, reducing memory usage by ~30% but slowing the initial load by ~50%
nfl.see_weekly_cols()
Returns a list of the columns available in the weekly dataset
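To make the downcast trade-off concrete, here is a small standard-library illustration of what moving from float64 to float32 does to a single numeric column. It does not use nfl_data_py or pandas; the sample values are made up. A purely float column halves in size, so the ~30% figure quoted above presumably reflects datasets that also contain non-float columns (that explanation is an assumption, not something the library documents here).

```python
from array import array
import struct

# Made-up numeric column with 3,000 entries.
values = [0.1, 12.5, -3.75] * 1000

as_float64 = array("d", values)  # 8 bytes per value
as_float32 = array("f", values)  # 4 bytes per value

bytes64 = as_float64.itemsize * len(as_float64)
bytes32 = as_float32.itemsize * len(as_float32)
saving = 1 - bytes32 / bytes64  # fraction saved for a purely float column

# float32 keeps roughly 7 significant digits, so some values shift slightly.
roundtrip = struct.unpack("f", struct.pack("f", 0.1))[0]
drift = abs(roundtrip - 0.1)
```

The small drift is the cost of the smaller type; if exact float64 values matter for a calculation, passing downcast=False is the safer choice.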
Working with seasonal data
nfl.import_seasonal_data(years, s_type)
Returns seasonal data, including various calculated market share stats specific to receivers
years (List[int]) : required, list of years to pull data for (earliest available is 1999)
s_type (str) : optional (default 'REG') season type to include in average ('ALL','REG','POST')
calculated receiving market share stats include:
Column | is short for |
---|---|
tgt_sh | target share |
ay_sh | air yards share |
yac_sh | yards after catch share |
wopr | weighted opportunity rating |
ry_sh | receiving yards share |
rtd_sh | receiving TDs share |
rfd_sh | receiving 1st Downs share |
rtdfd_sh | receiving TDs + 1st Downs share |
dom | dominator rating |
w8dom | dominator rating, but weighted in favor of receiving yards over TDs |
yptmpa | receiving yards per team pass attempt |
ppr_sh | PPR fantasy points share |
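The share columns above are all of the form "player total divided by team total". The sketch below works through a few of them on made-up numbers. The WOPR weights (1.5 and 0.7) and the plain-average dominator rating are the commonly cited definitions and are assumptions here; the library's own calculation may differ in detail.

```python
def share(player_total, team_total):
    """Player's fraction of the team total; 0.0 when the team total is zero."""
    return player_total / team_total if team_total else 0.0


def wopr(tgt_sh, ay_sh):
    # Assumed weights: 1.5 * target share + 0.7 * air yards share.
    return 1.5 * tgt_sh + 0.7 * ay_sh


def dominator(ry_sh, rtd_sh):
    # Assumed definition: plain average of yardage share and touchdown share.
    return (ry_sh + rtd_sh) / 2


# Made-up season totals for one receiver and his team.
tgt_sh = share(150, 600)    # targets
ay_sh = share(1200, 4000)   # air yards
ry_sh = share(1100, 4400)   # receiving yards
rtd_sh = share(9, 30)       # receiving touchdowns

player_wopr = wopr(tgt_sh, ay_sh)
player_dom = dominator(ry_sh, rtd_sh)
```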
Additional data imports
nfl.import_seasonal_rosters(years, columns)
Returns yearly roster information for the seasons specified
years : required, list of years to pull data for (earliest available is 1999)
columns : optional, list of columns to pull data for
nfl.import_weekly_rosters(years, columns)
Returns per-game roster information for the seasons specified
years : required, list of years to pull data for (earliest available is 1999)
columns : optional, list of columns to pull data for
nfl.import_win_totals(years)
Returns win total lines for years specified
years : optional, list of years to pull
nfl.import_sc_lines(years)
Returns scoring lines for years specified
years : optional, list of years to pull
nfl.import_officials(years)
Returns official information by game for the years specified
years : optional, list of years to pull
nfl.import_draft_picks(years)
Returns list of draft picks for the years specified
years : optional, list of years to pull
nfl.import_draft_values()
Returns relative values by generic draft pick according to various popular valuation methods
nfl.import_team_desc()
Returns dataframe with color/logo/etc information for all NFL teams
nfl.import_schedules(years)
Returns dataframe with schedule information for years specified
years : required, list of years to pull data for (earliest available is 1999)
nfl.import_combine_data(years, positions)
Returns dataframe with combine results for years and positions specified
years : optional, list or range of years to pull data from
positions : optional, list of positions to be pulled (standard format - WR/QB/RB/etc.)
nfl.import_ids(columns, ids)
Returns dataframe with mapped ids for all players across most major NFL and fantasy football data platforms
columns : optional, list of columns to return
ids : optional, list of ids to return
nfl.import_ngs_data(stat_type, years)
Returns dataframe with specified NGS data
stat_type (str) : required, type of stats to pull (passing, rushing, receiving)
years : optional, list of years to return data for
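Since stat_type takes one of three values, a small loop can collect all three NGS tables in one go. The helper below is a hypothetical convenience, not part of nfl_data_py; it takes the import function as an argument so it can be tried without a network connection.

```python
NGS_STAT_TYPES = ("passing", "rushing", "receiving")  # values listed above


def load_all_ngs(years, importer):
    """Call importer(stat_type, years) once per stat type, keyed by type."""
    years = list(years)
    return {stat_type: importer(stat_type, years) for stat_type in NGS_STAT_TYPES}
```

With the package installed, load_all_ngs([2022], nfl.import_ngs_data) should return a dict of three dataframes.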
nfl.import_depth_charts(years)
Returns dataframe with depth chart data
years : optional, list of years to return data for
nfl.import_injuries(years)
Returns dataframe of injury reports
years : optional, list of years to return data for
nfl.import_qbr(years, level, frequency)
Returns dataframe with QBR history
years : optional, years to return data for
level : optional, competition level to return data for, nfl or college, default nfl
frequency : optional, frequency to return data for, weekly or season, default season
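Because level and frequency are free-form strings, a typo can be hard to spot. One way to guard against that, sketched here with hypothetical helper names, is to validate the two options before calling nfl.import_qbr. The allowed values are taken from the parameter notes above; passing them as keywords assumes the parameter names match the signature shown.

```python
QBR_LEVELS = ("nfl", "college")
QBR_FREQUENCIES = ("season", "weekly")


def qbr_options(level="nfl", frequency="season"):
    """Validate the two option strings and return them as keyword arguments."""
    if level not in QBR_LEVELS:
        raise ValueError(f"level must be one of {QBR_LEVELS}, got {level!r}")
    if frequency not in QBR_FREQUENCIES:
        raise ValueError(
            f"frequency must be one of {QBR_FREQUENCIES}, got {frequency!r}"
        )
    return {"level": level, "frequency": frequency}


def load_qbr(years=None, **options):
    """Hypothetical wrapper; needs nfl_data_py installed and network access."""
    import nfl_data_py as nfl  # imported lazily so qbr_options works offline

    return nfl.import_qbr(years, **qbr_options(**options))
```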
nfl.import_seasonal_pfr(s_type, years)
Returns a dataframe of season-aggregated data sourced from players' pages on pro-football-reference.com (e.g. Patrick Mahomes' page)
s_type (str) : required, the type of stat data to request. Must be one of pass, rec, or rush.
years (List[int]) : optional, years to return data for
nfl.import_weekly_pfr(s_type, years)
Returns a dataframe of per-game data sourced from players' advanced gamelog pages on pro-football-reference.com (e.g. Mahomes' 2022 gamelog)
s_type (str) : required, the type of stat data to request. Must be one of pass, rec, or rush.
years (List[int]) : optional, years to return data for
nfl.import_snap_counts(years)
Returns dataframe with snap count records
years : optional, list of years to return data for
nfl.import_ftn_data(years, columns=None, downcast=True, thread_requests=False)
Returns dataframe with FTN charting data
FTN Data manually charts plays and has graciously provided a subset of their charting data to be published via the nflverse. Data is available from the 2022 season onwards and is charted within 48 hours following each game. This data is released under the CC-BY-SA 4.0 Creative Commons license and attribution must be made to FTN Data via nflverse.
years (List[int]) : required, years to get weekly data for
columns (List[str]) : optional, only return these columns
downcast (bool) : optional, convert float64 to float32, default True
thread_requests (bool) : optional, use thread pool to read files, default False
Additional features
nfl.cache_pbp(years, downcast=True, alt_path=None)
Caches play-by-play data locally to speed up later loads. Years that have already been cached are overwritten, so in-season users should re-cache once per week to pick up the most recent data
years : required, list or range of years to cache
downcast : optional, converts float64 columns to float32, reducing memory usage by ~30% but slowing the initial load by ~50%
alt_path : optional, alternate path to store the pbp cache; defaults to a program-created folder in the user's Local directory
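The weekly re-cache advice can be sketched as a small check that decides whether the cache is stale before loading from it. Both function names are hypothetical, and the seven-day threshold simply mirrors the "once per week" guidance above. The cache_pbp and import_pbp_data calls use only the parameters documented in this README.

```python
from datetime import date, timedelta


def needs_refresh(last_cached, today=None, max_age=timedelta(days=7)):
    """True when there is no cache yet or it is at least max_age old."""
    if last_cached is None:
        return True
    today = today or date.today()
    return today - last_cached >= max_age


def load_cached_pbp(years, last_cached=None, alt_path=None):
    """Hypothetical wrapper; needs nfl_data_py installed and network access."""
    import nfl_data_py as nfl  # imported lazily so needs_refresh works offline

    years = list(years)
    if needs_refresh(last_cached):
        # Overwrites any existing cache for these years.
        nfl.cache_pbp(years, alt_path=alt_path)
    # The same alt_path must be passed when reading the cache back.
    return nfl.import_pbp_data(years, cache=True, alt_path=alt_path)
```

Keeping track of last_cached (for example in a small file next to the cache) is left to the caller.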
nfl.clean_nfl_data(df)
Runs descriptive data (team name, player name, etc.) through various cleaning processes
df : required, dataframe to be cleaned
I'd like to recognize Ben Baldwin, Sebastian Carl, and Lee Sharpe for making this data freely available and easy to access. I'd also like to thank Tan Ho, who has been an invaluable resource as I've worked through this project, and Josh Kazan for the resources and assistance he's provided.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.