This project orchestrates the end-to-end analytics stack for professional leagues—scraping schedules, enriching with betting lines, transforming the data with dbt, and deploying dashboards. Prefect flows coordinate Databricks jobs, Terraform provisions the Azure infrastructure, and the resulting datasets feed the BI layer.
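To make the coordination pattern concrete, here is a minimal sketch of a Prefect flow that fans out one Databricks job run per league through the Databricks Jobs REST API. All names here (`etl_all_leagues`, `run_league_job`, the `DATABRICKS_*` variables, the `league` notebook parameter) are illustrative assumptions, not identifiers taken from this repository.

```python
import os

import requests
from prefect import flow, task


@task(retries=2)
def run_league_job(league: str) -> dict:
    """Trigger one Databricks job run for a single league."""
    host = os.environ["DATABRICKS_HOST"]   # e.g. https://adb-<id>.azuredatabricks.net
    token = os.environ["DATABRICKS_TOKEN"]
    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "job_id": int(os.environ["DATABRICKS_JOB_ID"]),
            "notebook_params": {"league": league},  # assumed parameter name
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # includes the run_id of the triggered job


@flow
def etl_all_leagues(leagues: list[str]) -> None:
    """Fan one Databricks job run out per league, submitting tasks in parallel."""
    futures = [run_league_job.submit(league) for league in leagues]
    for future in futures:
        future.result()  # propagate any task failure to the flow run


if __name__ == "__main__":
    etl_all_leagues(["nfl", "nba", "mlb"])
```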
- `configs/` – shared YAML/JSON settings for schedules, API keys, and alerting.
- `databricks/` – jobs and notebooks executed inside the Databricks workspace.
- `dbt/` – sources, models, and macros that power the curated warehouse layer.
- `flows/` – Prefect deployments (`etl_all_league-deployment.yaml`, etc.) that fan out across leagues.
- `resources/` – ARM templates, design docs, and BI artifacts.
- `terraform/` – infrastructure-as-code for storage, compute, and secrets.
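As a hypothetical illustration of how flows might consume the shared settings in `configs/` (the file name `leagues.yaml` and its keys are assumptions, not the real schema):

```python
from pathlib import Path

import yaml  # PyYAML

# Hypothetical file and keys; the actual schema lives in configs/.
config = yaml.safe_load(Path("configs/leagues.yaml").read_text())
for league in config["leagues"]:
    print(league["name"], league["schedule_cron"])
```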
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Authenticate:
  - `az login` for Azure resources.
  - `prefect cloud login --key <PREFECT_API_KEY>` for orchestration.
- Configure the Databricks CLI (`databricks configure --token`).
- Export environment variables (examples shown for PowerShell; a preflight check is sketched after this list):

  ```powershell
  $env:LEAGUE_API_KEY = "<key>"
  $env:PREFECT_PROFILE = "prod"
  ```
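Before triggering flows, a small preflight check can confirm the required variables are set; this sketch only assumes the two names used in the examples above.

```python
import os
import sys

# Variable names taken from the examples above; extend as needed.
REQUIRED = ["LEAGUE_API_KEY", "PREFECT_PROFILE"]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
print("Environment looks good.")
```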
- Trigger a per-league ETL (a programmatic equivalent is sketched after this list):

  ```bash
  prefect deployment run etl-per-league/production
  ```

- Refresh the dbt layer locally:

  ```bash
  cd dbt
  dbt deps && dbt build --profiles-dir profiles
  ```

- Apply infrastructure changes:

  ```bash
  cd terraform
  terraform init
  terraform plan
  terraform apply
  ```
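The same ETL trigger can be issued from Python, which is handy when another system needs to kick off runs. This is a minimal sketch using Prefect's `run_deployment` helper; the `league` parameter is an assumption about the flow's signature, not confirmed by this repository.

```python
from prefect.deployments import run_deployment

# Deployment name matches the CLI command above; the "league" parameter
# is an assumed flow argument.
flow_run = run_deployment(
    name="etl-per-league/production",
    parameters={"league": "nfl"},
    timeout=0,  # return as soon as the run is scheduled instead of waiting
)
print(f"Scheduled flow run {flow_run.id}")
```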
- Prefect deployments (`flows/*.yaml`) double as reproducible schedules; update them whenever you change flow code.
- Use `dbt test` plus `dbt docs generate` to validate models before promoting.
- Terraform changes require code review: run `terraform fmt` and `terraform validate` before opening a pull request. A combined pre-review check script is sketched below.
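For convenience, the pre-review checks above could be combined into one script; this sketch assumes it runs from the repository root and that the `dbt/` and `terraform/` directories are laid out as listed earlier.

```python
import subprocess
import sys

# Each entry: (command from the notes above, directory to run it in).
CHECKS = [
    (["dbt", "test", "--profiles-dir", "profiles"], "dbt"),
    (["dbt", "docs", "generate", "--profiles-dir", "profiles"], "dbt"),
    (["terraform", "fmt", "-check"], "terraform"),
    (["terraform", "validate"], "terraform"),
]

failed = False
for cmd, cwd in CHECKS:
    print(f"$ ({cwd}) {' '.join(cmd)}")
    if subprocess.run(cmd, cwd=cwd).returncode != 0:
        failed = True

sys.exit(1 if failed else 0)
```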