epl_statistics

Project for the DataTalksClub/Data Engineering Zoomcamp

Overview

This project is part of the Data Engineering Zoomcamp, a course organized by DataTalksClub. The goal of this project is to apply everything learned in the course to build an end-to-end data pipeline.

Problem

  • This project uses data from English Premier League matches over the past 10 seasons (2014/2015-2023/2024), taken from https://www.football-data.co.uk/. The goal is to build a dashboard for scouting potential football teams, so bettors can assess their favorite team and reduce risk before handing money to a bookmaker. Because players are transferred every season and squad changes cannot be predicted, all data is for reference only. Think carefully before placing a bet.

    Disclaimer

    • Betting is illegal in some countries and can result in criminal prosecution. I do not endorse online betting, and I accept no liability for losses or decisions made in reliance on this dashboard.

Dataflow diagram

Stack

  • Container: Docker
  • IaC: Terraform
  • Cloud: Google Cloud Platform (GCP)
  • Orchestration: Airflow
  • Data Lake: Google Cloud Storage (GCS)
  • Data Warehouse: BigQuery
  • Transformation: Data build tool (dbt)
  • Visualization: Looker Studio

Tutorial

Prerequisites

  • Installed locally:
    • Terraform
    • Python 3
    • Docker & docker-compose
  • A project in Google Cloud Platform

Setup

  1. To run this project, you need to clone this repository:

    git clone https://github.com/truongvude/epl_statistics
  2. Terraform

    • Set up GCP for the first time.
    • Move to the terraform folder. Update the credentials, gcs_bucket_name, and bq_dataset_name variables in variables.tf to your desired values.
    • Run the following commands to execute Terraform:
    # Log in to the gcloud CLI
    gcloud auth application-default login
    # Initialize state file (.tfstate)
    terraform init
    # Check changes to new infra plan
    terraform plan
    # Create new infra
    terraform apply
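
    Once terraform apply finishes, it can help to confirm that the bucket and dataset actually exist before moving on to Airflow. Below is a minimal sketch using the Google Cloud Python clients, assuming application-default credentials are configured; the names are placeholders for whatever you set in variables.tf, not values from this repository.

    # check_infra.py - verify the Terraform-created resources exist (illustrative, not part of the repo)
    from google.cloud import bigquery, storage

    BUCKET_NAME = "your-gcs-bucket"     # placeholder: value of gcs_bucket_name in variables.tf
    DATASET_NAME = "your-bq-dataset"    # placeholder: value of bq_dataset_name in variables.tf

    bucket = storage.Client().get_bucket(BUCKET_NAME)      # raises NotFound if the bucket is missing
    print(f"Bucket OK: {bucket.name} (location: {bucket.location})")

    dataset = bigquery.Client().get_dataset(DATASET_NAME)  # raises NotFound if the dataset is missing
    print(f"Dataset OK: {dataset.dataset_id}")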
    
  3. Airflow + BigQuery

    # Move to airflow folder
    cd airflow
    # Build the image (only needed the first time or when the Dockerfile changes; the first build takes ~15 minutes)
    docker compose build
    # Initialize the Airflow scheduler, DB, and other config
    docker compose up airflow-init
    # Start up all the services from the container:
    docker compose up
    • Log in to the Airflow web UI at localhost:8080 with the default credentials (username/password): airflow/airflow
    • Run the DAG from the web UI. When your run finishes, or to shut down the containers:
    docker compose down
    • Check your external table in BigQuery.
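
    The DAGs in this repository handle downloading the season CSVs and loading them into GCS and BigQuery. The sketch below shows the general shape of such a pipeline, not the repo's actual code: the bucket, dataset, and table names are placeholders, and the football-data.co.uk URL pattern (E0 is the Premier League file) is assumed from the site's download layout.

    # epl_ingest_dag.py - illustrative Airflow DAG: CSV -> GCS -> BigQuery external table
    import pendulum
    import requests
    from airflow.decorators import dag, task
    from google.cloud import bigquery, storage

    BUCKET = "your-gcs-bucket"       # placeholder: gcs_bucket_name from variables.tf
    DATASET = "your-bq-dataset"      # placeholder: bq_dataset_name from variables.tf
    SEASON = "2324"                  # football-data.co.uk encodes 2023/2024 as 2324
    URL = f"https://www.football-data.co.uk/mmz4281/{SEASON}/E0.csv"

    @dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
    def epl_ingest():
        @task
        def download_to_gcs() -> str:
            # Download one season's CSV and land it in the data lake
            data = requests.get(URL, timeout=60).content
            blob_name = f"epl/{SEASON}/E0.csv"
            storage.Client().bucket(BUCKET).blob(blob_name).upload_from_string(data)
            return f"gs://{BUCKET}/{blob_name}"

        @task
        def create_external_table(uri: str) -> None:
            # Expose the GCS file as an external table in BigQuery
            client = bigquery.Client()
            table = bigquery.Table(f"{client.project}.{DATASET}.epl_matches_external")
            external_config = bigquery.ExternalConfig("CSV")
            external_config.source_uris = [uri]
            external_config.autodetect = True
            table.external_data_configuration = external_config
            client.create_table(table, exists_ok=True)

        create_external_table(download_to_gcs())

    epl_ingest()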
  4. dbt

    • Set up your dbt Cloud account and project.
    • Go to Develop -> Cloud IDE.
    • Copy the code from this folder into your dbt project.
    • Run dbt build to execute the models.
    • Check your dataset in BigQuery.
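
    After dbt build completes, one quick way to sanity-check a transformed table from outside the BigQuery console is a small query with the Python client. The dataset and model names below are placeholders for whatever your dbt project produces.

    # verify_dbt.py - count rows in a dbt-built table (names are placeholders)
    from google.cloud import bigquery

    client = bigquery.Client()
    query = "SELECT COUNT(*) AS n FROM `your-bq-dataset.your_dbt_model`"
    for row in client.query(query).result():
        print(f"Rows in dbt model: {row.n}")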
  5. Looker Studio

    In this step you connect your BigQuery table to Looker Studio.

    • Go to Looker Studio: https://lookerstudio.google.com/.
    • Create a blank report -> select BigQuery under Google Connectors, then select your project, dataset, and table.
    • Create your dashboard.

Dashboard
