This repository contains the pipeline for processing ECMWF (European Centre for Medium-Range Weather Forecasts) tropical cyclone and wind forecast data. The pipeline downloads, extracts, transforms, and loads hurricane forecast data into Snowflake for use by downstream applications.
Related repositories:

- Ahead-of-the-Storm: Dash web application for visualizing hurricane impact forecasts. The application displays interactive maps, probabilistic analysis, and impact reports based on pre-processed hurricane data.
- Ahead-of-the-Storm-DATAPIPELINE: Data processing pipeline for creating bounding boxes, initializing base data, and processing storm impact files that are read by the Ahead-of-the-Storm application.
The pipeline processes ECMWF ensemble tropical cyclone forecasts through the following steps:
- Download TC Data: Downloads tropical cyclone track BUFR files from ECMWF's Dissemination (DISS) system
- Extract TC Data: Extracts structured data from BUFR files using eccodes library
- Transform TC Data: Converts raw data to standardized format with wind radii, polygons, and metadata
- Download Wind Data: Downloads ensemble wind forecast GRIB files matching TC forecast run times
- Process Wind Combination: Creates wind threshold envelope polygons by combining TC tracks with wind forecast data
- Load to Snowflake: Loads processed data into Snowflake for querying and visualization
The pipeline produces three types of data:
- TC Track Data (`*_transformed.csv`): Individual forecast points with storm positions, wind speeds, pressure, wind radii, and wind field polygons
- Individual Wind Envelopes (`*_envelopes_individual.csv`): Wind threshold polygons for each forecast step and ensemble member
- Combined Wind Envelopes (`*_envelopes_combined.csv`): Combined wind threshold polygons across all forecast steps per member
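For downstream analysis, the polygon columns in these CSVs can be parsed into geometries. A minimal sketch, assuming the polygons are stored as WKT strings (the file names and the `polygon_wkt` column name are illustrative, not confirmed by the repository):

```python
# Illustrative only: file paths and the polygon column name are assumptions.
import pandas as pd
from shapely import wkt

tracks = pd.read_csv("tc_data_transformed/example_transformed.csv")
envelopes = pd.read_csv("wind_extracted/example_envelopes_combined.csv")

# Parse WKT polygon strings into Shapely geometries for spatial operations.
envelopes["geometry"] = envelopes["polygon_wkt"].apply(wkt.loads)
print(tracks.columns.tolist())
```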
- Python 3.11+ installed
- Virtual environment activated (`.venv`)
- Environment variables configured:
  - Start from the provided example: `cp sample_env.txt .env`
  - Edit values to match your environment (Snowflake credentials)
- eccodes library installed (required for BUFR file processing)
```bash
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```

Required environment variables:

- `SNOWFLAKE_ACCOUNT`
- `SNOWFLAKE_USER`
- `SNOWFLAKE_PASSWORD`
- `SNOWFLAKE_WAREHOUSE`
- `SNOWFLAKE_DATABASE`
- `SNOWFLAKE_SCHEMA`

Optional environment variables:

- `DOWNLOAD_DATE`: Specific date to download (YYYYMMDD format, e.g., "20251015")
- `RUN_TIME`: Specific run time filter (00, 06, 12, or 18)
- `DOWNLOAD_LIMIT`: Number of latest forecasts to download (default: 1)
- `PROCESS_WIND_DATA`: Enable wind processing (default: true)
- `CLEANUP_AFTER_LOAD`: Clean up temporary files after load (default: true)
- `SKIP_EXISTING`: Skip already processed files (default: false in CI)
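As a rough sketch of how these settings might be consumed in Python (the parsing mirrors the documented defaults; it is not necessarily the pipeline's exact code):

```python
import os

# Required Snowflake settings: fail fast if any are missing.
required = ["SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD",
            "SNOWFLAKE_WAREHOUSE", "SNOWFLAKE_DATABASE", "SNOWFLAKE_SCHEMA"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing required environment variables: {missing}")

# Optional settings with the documented defaults.
download_date = os.getenv("DOWNLOAD_DATE")              # e.g., "20251015"; None means latest
run_time = os.getenv("RUN_TIME")                        # "00", "06", "12", or "18"
download_limit = int(os.getenv("DOWNLOAD_LIMIT", "1"))
process_wind = os.getenv("PROCESS_WIND_DATA", "true").lower() == "true"
cleanup_after_load = os.getenv("CLEANUP_AFTER_LOAD", "true").lower() == "true"
```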
The pipeline consists of several Python modules:
- `ecmwf_tc_data_downloader.py`: Downloads tropical cyclone BUFR files from the ECMWF DISS system
- `ecmwf_tc_data_extractor.py`: Extracts structured data from BUFR files (Template 316082)
- `ecmwf_tc_data_transformer.py`: Transforms raw data to standardized CSV format with wind radii and polygons
- `ecmwf_wind_data_downloader.py`: Downloads ensemble wind forecast GRIB files from ECMWF Open Data
- `ecmwf_wind_data_extractor.py`: Extracts wind threshold polygons from GRIB files
- `ecmwf_tc_wind_combination.py`: Combines TC tracks with wind forecasts to create envelope polygons
- `github_actions/main.py`: Automated pipeline orchestrator for GitHub Actions
- `github_actions/snowflake_loader.py`: Snowflake database loader with staging table logic
- `web/app.py`: Dash web dashboard for visualizing forecast data from Snowflake
- `visualization.py`: Visualization utilities for TC tracks and wind envelopes
You can run individual pipeline steps interactively or use the Jupyter notebook for exploration:
```bash
# Interactive exploration
jupyter notebook pipeline_demonstration.ipynb
```

Or run individual steps:

```python
from ecmwf_tc_data_downloader import download_tc_data
from ecmwf_tc_data_extractor import extract_tc_data_from_file
from ecmwf_tc_data_transformer import transform_tc_data_from_file

# Step 1: Download TC data
download_tc_data(date="20251015", run_time="12", output_dir="tc_data")

# Step 2: Extract BUFR files with extract_tc_data_from_file
# (extract from downloaded .bin files)

# Step 3: Transform data with transform_tc_data_from_file
# (transform extracted CSV files)
```

The pipeline runs automatically on a schedule:
Pipeline Schedule
ECMWF issues new forecasts at 00, 06, 12, and 18 UTC, but the data is typically not published until around 07:41, 11:40, 19:41, and 23:40 UTC, respectively.
To align with these publication times, the pipeline is scheduled to run at:
- 09:00 UTC (after 00Z forecast published)
- 13:00 UTC (after 06Z forecast published)
- 21:00 UTC (after 12Z forecast published)
- 01:00 UTC (after 18Z forecast published)
This ensures the forecasts are available before the pipeline starts.
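To make the timing concrete, here is a small sketch (not taken from the repository) that picks the most recent forecast run whose approximate publication time has already passed:

```python
from datetime import datetime, timedelta, timezone

# Run hour -> approximate publication time (hour, minute) in UTC, per the schedule above.
PUBLICATION = {0: (7, 41), 6: (11, 40), 12: (19, 41), 18: (23, 40)}

def latest_available_run(now=None):
    """Return (YYYYMMDD, run_time) for the newest run already published."""
    now = now or datetime.now(timezone.utc)
    for days_back in (0, 1):
        day = now - timedelta(days=days_back)
        for run in sorted(PUBLICATION, reverse=True):
            hour, minute = PUBLICATION[run]
            published = day.replace(hour=hour, minute=minute, second=0, microsecond=0)
            if published <= now:
                return day.strftime("%Y%m%d"), f"{run:02d}"

print(latest_available_run())
```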
Manual Trigger
You can manually trigger the workflow with parameters:
- `download_date` (optional): Specific date in YYYYMMDD format
- `run_time` (optional): Forecast run time (00, 06, 12, or 18)
- `cleanup` (optional): Clean up temporary files after load (default: true)
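Besides the Actions UI, a `workflow_dispatch` event can also be sent through the GitHub REST API. A hedged sketch (OWNER/REPO, the branch name, and the token variable are placeholders):

```python
import os
import requests

# POST a workflow_dispatch event; inputs match the parameters listed above.
resp = requests.post(
    "https://api.github.com/repos/OWNER/REPO/actions/workflows/ecmwf-tc-pipline.yml/dispatches",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"ref": "main", "inputs": {"download_date": "20251015", "run_time": "12"}},
)
resp.raise_for_status()  # GitHub returns 204 No Content on success
```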
Setup for GitHub Actions
- Configure GitHub Secrets in your repository settings:
  - `SNOWFLAKE_ACCOUNT`
  - `SNOWFLAKE_USER`
  - `SNOWFLAKE_PASSWORD`
  - `SNOWFLAKE_WAREHOUSE`
  - `SNOWFLAKE_DATABASE`
  - `SNOWFLAKE_SCHEMA`
- The workflow will automatically:
  - Download latest TC forecast data
  - Extract and transform data
  - Download matching wind forecast data
  - Process wind envelopes
  - Load all data to Snowflake
The web dashboard provides interactive visualization of forecast data stored in Snowflake:
```bash
# Set Snowflake environment variables
export SNOWFLAKE_ACCOUNT="your_account"
export SNOWFLAKE_USER="your_user"
export SNOWFLAKE_PASSWORD="your_password"
export SNOWFLAKE_WAREHOUSE="your_warehouse"
export SNOWFLAKE_DATABASE="your_database"
export SNOWFLAKE_SCHEMA="your_schema"

# Run the dashboard
python web/app.py
```

The dashboard will be available at http://localhost:10000.
```
ECMWF DISS System (BUFR files)
        ↓
[Download TC Data]
        ↓
BUFR Files (.bin)
        ↓
[Extract TC Data]
        ↓
Raw CSV Files
        ↓
[Transform TC Data]
        ↓
Transformed CSV Files (*_transformed.csv)
        ↓
ECMWF Open Data (GRIB files)
        ↓
[Download Wind Data]
        ↓
Wind GRIB Files (.grib2)
        ↓
[Process Wind Combination]
        ↓
Envelope CSV Files (*_envelopes_*.csv)
        ↓
[Load to Snowflake]
        ↓
Snowflake Tables (TC_TRACKS, TC_ENVELOPES_INDIVIDUAL, TC_ENVELOPES_COMBINED)
```
eccodes errors:
- Install the eccodes library (see Prerequisites section)
- On macOS: `brew install eccodes`
TC data download failures:
- Check ECMWF data availability for the specified date/time
- Verify network connectivity to https://essential.ecmwf.int/
- Ensure the forecast has been published (check publication times in the schedule)
Snowflake connection issues:
- Verify all `SNOWFLAKE_*` environment variables are set correctly
- Check network connectivity to Snowflake
- Ensure Snowflake credentials have proper permissions
- Verify the warehouse is running
Wind envelope processing issues:
- Ensure the wind data download completed successfully
- Check that the wind forecast run time matches the TC forecast run time
- Verify the forecast hours are available (wind data is downloaded every 6 hours from 0 to 144 h; see the snippet below)
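For reference, the expected forecast steps can be enumerated directly (a trivial check matching the 0-144 h range above):

```python
# Wind data is downloaded at 6-hour steps from 0 to 144 hours.
FORECAST_STEPS = list(range(0, 145, 6))  # [0, 6, 12, ..., 144]
print(len(FORECAST_STEPS), "steps")      # 25 steps
```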
Import errors:
- Ensure you're in the virtual environment: `source .venv/bin/activate`
- Reinstall dependencies: `pip install -r requirements.txt`
- Check that all core modules are in the repository root
Transformation errors:
- Verify BUFR files were extracted correctly
- Check that extracted CSV files contain valid data
- Review error logs for specific column or data type issues
GitHub Actions failures:
- Verify all GitHub Secrets are set correctly
- Check GitHub Actions logs for specific error messages
- Ensure the workflow file (`ecmwf-tc-pipline.yml`) exists and is configured correctly
Dashboard shows no data:
- Verify Snowflake connection credentials
- Check that data has been loaded by the pipeline
- Verify database/schema names match
- Query Snowflake directly to confirm data exists
- Raw TC data: `tc_data/` (BUFR files and extracted CSVs)
- Transformed TC data: `tc_data_transformed/` (or the `TRANSFORMED_DATA_DIR` env var)
- Wind data: `wind_data/` (GRIB files)
- Wind envelopes: `wind_extracted/` (envelope CSV files)
- `TC_TRACKS`: Individual forecast points with storm positions, wind speeds, pressure, and wind radii
- `TC_ENVELOPES_INDIVIDUAL`: Wind threshold polygons per forecast step
- `TC_ENVELOPES_COMBINED`: Combined wind threshold polygons across all forecast steps
Temporary files are cleaned up after successful load (unless `CLEANUP_AFTER_LOAD=false`).
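As a quick way to confirm the load, the tables can be queried with the standard snowflake-connector-python package. A sketch using the environment variables described earlier (not the repository's loader code):

```python
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)
try:
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM TC_TRACKS")  # table name from the list above
    print("TC_TRACKS rows:", cur.fetchone()[0])
finally:
    conn.close()
```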
Processing libraries:
- BUFR Processing: Uses the eccodes library to parse ECMWF BUFR Template 316082
- GRIB Processing: Uses xarray and cfgrib to process ensemble wind forecast data (see the sketch after this list)
- Geospatial Processing: Uses Shapely and GeoPandas for polygon operations
- Data Transformation: Pandas for data manipulation and standardization
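To illustrate how these pieces fit together, here is a minimal sketch of deriving a wind threshold polygon from an ensemble GRIB file with xarray/cfgrib and Shapely. The file path, variable name, dimension names, threshold, and buffer size are assumptions, not the repository's exact processing:

```python
import numpy as np
import xarray as xr
from shapely.geometry import Point
from shapely.ops import unary_union

ds = xr.open_dataset("wind_data/example.grib2", engine="cfgrib")  # hypothetical file
wind = ds["gust"]                      # variable name is an assumption; check ds.data_vars

field = wind.isel(number=0, step=0)    # one ensemble member, one forecast step
lon2d, lat2d = np.meshgrid(field["longitude"].values, field["latitude"].values)
exceeds = field.values > 18.0          # ~34 kt gale-force threshold in m/s

# Buffer each grid point above the threshold and dissolve into one envelope polygon.
envelope = unary_union([
    Point(x, y).buffer(0.25)           # buffer roughly half a grid cell
    for x, y, hit in zip(lon2d.ravel(), lat2d.ravel(), exceeds.ravel())
    if hit
])
print(envelope.wkt[:80])
```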
Execution modes:
- Local: Python scripts can be run individually or via Jupyter notebook
- Automated: GitHub Actions workflow orchestrates all steps
- Containerized: Docker support for consistent execution environments
Data flow:
- Input: ECMWF DISS system (BUFR) and ECMWF Open Data (GRIB)
- Processing: Local filesystem (temporary)
- Output: Snowflake data warehouse
- Visualization: Dash web application reading from Snowflake
```bash
# 1. Set up environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 2. Configure environment variables
cp sample_env.txt .env
# Edit .env with your Snowflake credentials

# 3. Test locally (optional)
jupyter notebook pipeline_demonstration.ipynb

# 4. For automated pipeline: configure GitHub Secrets and enable the workflow
# The pipeline will run automatically on schedule
```