This project implements a data pipeline that uses the USGS Earthquake API to ingest, process, and visualize global earthquake data in a dashboard.
The dashboard shows where earthquakes occur around the globe, which countries and continents experience the most severe earthquakes, and how earthquake magnitudes change over time.
A video preview of the dashboard is available in dashboard-preview.mp4.
The dashboard consists of four primary visualization components:
- Pie Charts: Visualize the distribution of earthquakes by magnitude category, country, and continent.
- Earthquake Map: Displays the geographical locations where earthquakes occurred.
- Time Series Plot: Shows the average earthquake magnitude by continent over time.

  NOTE: Since the data for these plots is aggregated by continent and country, the average magnitude for the $i$-th continent is defined as the average of the per-country average magnitudes, weighted by the number of earthquakes:

  $$\frac{\sum_{j=1}^{n_i} \bar x_{ij} m_{ij}}{\sum_{j=1}^{n_i} m_{ij}},$$

  where $n_i$ is the number of countries in the $i$-th continent, $\bar x_{ij}$ is the average magnitude for the $j$-th country in the $i$-th continent, and $m_{ij}$ is the number of earthquakes in that country. A small numeric check of this formula appears after the list.
- Bar Plot: Average earthquake magnitude by country.
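As a quick numeric check of the weighted average above, here is a minimal sketch in Python (the column names and values are illustrative, not the project's actual summary-table schema):

```python
import pandas as pd

# Per-country aggregates for a single continent (illustrative values):
# avg_magnitude is x̄_ij and n_earthquakes is m_ij from the formula above.
countries = pd.DataFrame({
    "country": ["Chile", "Peru", "Argentina"],
    "avg_magnitude": [5.1, 4.8, 4.5],
    "n_earthquakes": [120, 80, 40],
})

# Weighted average: sum(x̄_ij * m_ij) / sum(m_ij)
continent_avg = (
    (countries["avg_magnitude"] * countries["n_earthquakes"]).sum()
    / countries["n_earthquakes"].sum()
)
print(round(continent_avg, 2))  # 4.9
```

Weighting by the earthquake count keeps countries with only a handful of events from dominating the continent average.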
- Earthquake Data: Obtained from the USGS Earthquake Hazards API.
- Geolocation Data: Countries and continents are assigned using reverse geolocation with Natural Earth shapefiles (a sketch of this step appears after the notes below).
- Due to the data source, there is a notable concentration of recorded earthquakes in the United States and nearby regions.
- USGS can detect very small earthquakes within the USA, which skews the average magnitude downward for the USA and for North America, so these figures end up differing greatly from other regions.
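The reverse-geolocation step can be sketched with geopandas as follows (a minimal illustration under assumed file and column names, not the project's actual code; NAME and CONTINENT are attributes of the Natural Earth admin-0 countries layer):

```python
import geopandas as gpd
from shapely.geometry import Point

# Earthquake epicenters as WGS84 points (illustrative coordinates).
quakes = gpd.GeoDataFrame(
    {"earthquake_id": ["q1", "q2"]},
    geometry=[Point(-70.6, -33.4), Point(139.7, 35.7)],
    crs="EPSG:4326",
)

# Natural Earth admin-0 country polygons (path is an assumption).
countries = gpd.read_file("ne_110m_admin_0_countries.shp")

# Spatial join: each point inherits the attributes of the polygon containing it.
located = gpd.sjoin(
    quakes,
    countries[["NAME", "CONTINENT", "geometry"]],
    how="left",
    predicate="within",
)
print(located[["earthquake_id", "NAME", "CONTINENT"]])
```

Offshore epicenters fall outside every country polygon and come back as NaN with this join, so they need a fallback rule (for example, nearest country).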
The schema for the earthquake data is defined in earthquakes_schema.json.
The table is partitioned daily and clustered by earthquake_id, continent, and country, as specified in main.tf.
The daily partition on the earthquake events helps build the incremental table for the time series plots.
Clustering by earthquake_id improves query performance when inserting new rows.
Finally, clustering by country and continent improves query performance for the aggregation queries used by the dashboard.
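For reference, the same daily partitioning and clustering can be expressed with the BigQuery Python client (a sketch only; the table is actually defined in main.tf, and the project, dataset, and column names below are assumptions):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Minimal schema covering the partition and clustering columns (assumed names).
schema = [
    bigquery.SchemaField("earthquake_id", "STRING"),
    bigquery.SchemaField("event_time", "TIMESTAMP"),
    bigquery.SchemaField("magnitude", "FLOAT"),
    bigquery.SchemaField("country", "STRING"),
    bigquery.SchemaField("continent", "STRING"),
]

table = bigquery.Table("my-project.earthquakes.events", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_time",  # daily partition on the event timestamp
)
table.clustering_fields = ["earthquake_id", "continent", "country"]

client.create_table(table, exists_ok=True)
```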
The pipeline runs daily on a Google Cloud Compute instance with Fedora CoreOS.
The instance starts at 00:00 UTC and shuts down at 01:00 UTC.
While active, a systemd service defined in cloud-startup starts the required containers and workflows.
The workflows, implemented as Apache Airflow DAGs, are located in the src/dags directory. The main DAGs are:
get_earthquake_data.py (ELT - Extract, Load, Transform):
- Fetches data from the USGS API.
- Stores the raw geojson data in a Google Cloud Storage data lake.
- Processes and loads cleaned data into BigQuery.
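The overall shape of such an ELT DAG, sketched with Airflow's TaskFlow API (the task bodies, bucket name, and parameters are illustrative assumptions, not the project's actual code):

```python
import json

import pendulum
import requests
from airflow.decorators import dag, task

USGS_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

# schedule= requires Airflow 2.4+; older versions use schedule_interval=.
@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def get_earthquake_data():
    @task
    def extract(data_interval_start=None, data_interval_end=None):
        # Fetch one day of events as geojson from the USGS FDSN API.
        resp = requests.get(USGS_URL, params={
            "format": "geojson",
            "starttime": data_interval_start.to_date_string(),
            "endtime": data_interval_end.to_date_string(),
        })
        resp.raise_for_status()
        return resp.json()

    @task
    def load_raw(payload: dict) -> str:
        # Store the raw geojson in the GCS data lake (bucket name assumed).
        from google.cloud import storage

        blob = storage.Client().bucket("earthquake-data-lake").blob("raw/events.geojson")
        blob.upload_from_string(json.dumps(payload))
        return f"gs://earthquake-data-lake/{blob.name}"

    @task
    def load_bigquery(payload: dict):
        # Flatten the geojson features and append them to the partitioned table.
        ...  # cleaning and loading are left out of this sketch

    payload = extract()
    load_raw(payload)
    load_bigquery(payload)

get_earthquake_data()
```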
generate_summary_tables.py (Transform & Aggregate):
- Uses dbt to generate precomputed summary statistics for dashboard visualization.
- The transformation logic is implemented in earthquake_analysis.
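Invoking dbt from such a DAG is commonly done with a BashOperator along these lines (a sketch; the schedule and the project and profiles paths are assumptions):

```python
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="generate_summary_tables",
    schedule="@daily",
    start_date=pendulum.datetime(2024, 1, 1),
    catchup=False,
):
    # Run the dbt project that materializes the summary tables.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=(
            "dbt run --project-dir /opt/airflow/earthquake_analysis "
            "--profiles-dir /opt/airflow/.dbt"
        ),
    )
```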
The cloud environment is provisioned using Terraform.
Start by creating an .env file from the example:
```
cp .env.example .env
```

Then, initialize and apply the Terraform configuration:

```
terraform init
terraform plan
terraform apply
```

Next, update the .env file with the generated cloud information.
Retrieve the Airflow service account key with:

```
terraform output -raw airflow_gcs_key | base64 -d > /path/to/your/private/key.json
```

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable in .env to the key file path.
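The resulting line in .env then looks like this (the variable name comes from the step above; use whatever path you wrote the key to):

```
GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/private/key.json
```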
Follow the instructions in cloud-setup.md to configure the VM.
The project includes two docker-compose files:
- docker-compose.yaml: Used for the compute engine.
- docker-compose-dev.yaml: Adds local secrets for development.

To start Airflow in production:

```
docker compose up
```

To start Airflow in development:

```
docker compose -f docker-compose.yaml -f docker-compose-dev.yaml up
```

Ensure environment variables are correctly configured and credentials are provided in both environments.
Clone the Superset repository:
```
git submodule update --init --recursive
```

Start Superset:

```
docker compose -f ./superset/docker-compose-image-tag.yml up
```

To create the dashboard:
- Connect Superset to BigQuery (see the connection URI sketch after this list).
- Add the dataset to Superset.
- Enable maps by setting your MAPBOX_API_KEY.
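For the BigQuery connection, the SQLAlchemy URI generally takes the form below (via the sqlalchemy-bigquery driver; the project id is a placeholder):

```
bigquery://your-gcp-project-id
```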
- Apache Airflow: Workflow orchestration.
- Apache Superset: Business Intelligence & data visualization.
- Docker: Containerization.
- Fedora CoreOS: Cloud-optimized OS.
- Google Cloud Platform: Cloud services.
- Mapbox: Geospatial visualization.
- Terraform: Infrastructure as code.
- USGS Earthquake API: Earthquake data source.
- dbt: Data transformation.
