- Docker Compose Containers: Docker running our containers for the weather pipeline, Prometheus, and Grafana.
- Grafana Dashboard: displays the pipeline metrics (data volumes, durations, API performance) once properly configured.
- Airflow Login Page: Airflow prompts for a username and password to access its web UI.
- Airflow DAGs Interface: a list of DAGs (pipelines) that can be scheduled and monitored via Airflow.
The OpenWeatherMap Data Pipeline Engineering Project is a comprehensive data engineering solution to collect, process, and analyze weather data from the OpenWeatherMap API. It demonstrates a complete ETL pipeline with integrated monitoring, visualization, and multiple deployment options.
flowchart LR
User([User])
API[OpenWeatherMap API]
ETL[ETL Pipeline]
DB[(Storage)]
Monitor[Monitoring]
Insight[Analytics]
User --> API
API --> ETL
ETL --> DB
DB --> Insight
ETL <--> Monitor
Insight --> User
style API fill:#93c5fd,stroke:#2563eb,stroke-width:2px
style ETL fill:#fde68a,stroke:#d97706,stroke-width:2px
style Monitor fill:#d1fae5,stroke:#059669,stroke-width:2px
style Insight fill:#fbcfe8,stroke:#db2777,stroke-width:2px
- Key Features
- Technology Stack
- Project Structure
- Installing & Running (The Story)
- Processing Pipeline
- Data Analysis
- Deployment Options
- Monitoring
- References
- License
Automated Weather Data Collection
- Multi-city weather data extraction
- Configurable sampling frequency
- Resilient retry logic for API calls
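Retry logic of this kind is typically a small wrapper around the API call with exponential backoff. A minimal sketch, assuming nothing about the actual code in `src/extract.py` (the function and parameter names here are illustrative only):

```python
import time
import random

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff with a little jitter to avoid retry storms.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))

# Demo: a call that fails twice with a transient error, then succeeds.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient API error")
    return {"temp": 21.5}

result = with_retries(flaky_fetch, base_delay=0.01)
```

In the real pipeline, `flaky_fetch` would be the HTTP request to OpenWeatherMap, and the backoff parameters would come from the config.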
Robust Data Processing
- Data cleaning, outlier handling
- Derived metric computation
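As an example of derived metric computation, the transform step might add a Celsius column and a simple rule-of-thumb dew-point estimate from the raw API fields. This is a sketch only; the column names and formulas are assumptions, not necessarily those used in `src/transform.py`:

```python
import pandas as pd

raw = pd.DataFrame({
    "city": ["Kuala Lumpur", "London"],
    "temp_k": [303.15, 283.15],   # OpenWeatherMap returns Kelvin by default
    "humidity": [80, 70],         # relative humidity, percent
})

# Derived metrics: Celsius temperature, plus a common rule-of-thumb
# dew-point approximation (reasonable for relative humidity above ~50%).
raw["temp_c"] = raw["temp_k"] - 273.15
raw["dew_point_c"] = raw["temp_c"] - (100 - raw["humidity"]) / 5.0
```

For Kuala Lumpur this yields 30.0 °C with a dew point around 26 °C; for London, 10.0 °C and roughly 4 °C.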
Comprehensive Analytics
- City-to-city comparisons
- Temperature trend analysis
- Weather pattern visualizations
Enterprise-Grade Infrastructure
- Docker containerization
- Kubernetes orchestration
- Optional Airflow scheduling
- Python 3.12+
- Docker / Kubernetes
- Prometheus / Grafana
- Apache Airflow (for advanced scheduling)
flowchart TD
Python[Python 3.12] --> Pandas[Pandas] & Matplotlib[Matplotlib]
Python --> Docker[Docker]
Docker --> K8s[Kubernetes]
Python --> Airflow[Airflow]
Python --> Prometheus[Prometheus]
Prometheus --> Grafana[Grafana]
weather_data_pipeline/
├── README.md
├── images/
│ ├── Grafana.png
│ ├── apache_airflow_interface.png
│ ├── apache_airflow_login.png
│ └── docker-compose up.png
├── config/
│ └── config.yaml
├── data/
│ ├── raw/
│ ├── processed/
│ └── output/
├── logs/
├── requirements.txt
├── src/
│ ├── extract.py
│ ├── transform.py
│ ├── load.py
│ ├── analyze.py
│ └── utils.py
├── main.py
├── Dockerfile
├── docker-compose.yml
├── airflow/
│ └── weather_pipeline_dag.py
├── kubernetes/
│ └── deployment.yaml
└── monitoring/
├── prometheus.yml
└── grafana-dashboard.json
Below is a step-by-step flow illustrating how to install Docker, confirm everything is up and running, then transition to Kubernetes and Airflow.
- On macOS, ensure Docker is not stuck:
sudo launchctl remove com.docker.vmnetd
- Verify Docker commands:
docker pull hello-world
docker run hello-world
docker ps
docker ps -a
docker --version
- Check existing images & remove any (optional):
docker images
docker rmi <IMAGE_ID>
- Open Docker Desktop GUI (macOS):
open -a Docker
- Build the Docker image:
docker build -t weather-pipeline .
- Run the container with your API key:
docker run --env-file .env weather-pipeline
- Spin up services via Docker Compose:
docker compose up
If you get a port conflict (e.g., for port 9090), try:
lsof -i :9090
kill -9 <PID>
pkill -f prometheus
docker compose up
- Check Docker containers:
docker ps
You should see 3 containers:
- Weather Pipeline (port 8000)
- Prometheus (port 9090)
- Grafana (port 3000)
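For reference, a minimal `docker-compose.yml` wiring up these three services could look like the following; the repo's actual file may differ (image tags and the metrics port mapping are assumptions):

```yaml
services:
  weather-pipeline:
    build: .
    env_file: .env
    ports:
      - "8000:8000"   # metrics endpoint scraped by Prometheus
  prometheus:
    image: prom/prometheus
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
```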
Once Docker is up, open Grafana at http://localhost:3000.
- Username: admin
- Password: admin (the Grafana defaults, unless changed)
You’ll see panels for:
- Pipeline Duration
- Data Volumes (Records Processed & Data Points Extracted)
- API Performance
If you see “No data,” check your prometheus.yml or the pipeline’s logs to confirm metrics are actually being scraped.
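A common cause of “No data” under Docker Compose is a scrape target pointing at localhost instead of the Compose service name. A minimal scrape block (the job and service names here are assumptions about this repo's `monitoring/prometheus.yml`):

```yaml
scrape_configs:
  - job_name: weather_pipeline
    scrape_interval: 15s
    static_configs:
      # Inside Docker Compose, target the service name, not localhost.
      - targets: ["weather-pipeline:8000"]
```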
- Install & Initialize Airflow:
pip install apache-airflow
airflow db init
- Create Admin User:
airflow users create \
  --username admin \
  --password admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email [email protected]
- Add the DAG:
mkdir -p ~/airflow/dags
cp airflow/weather_pipeline_dag.py ~/airflow/dags/
- Start Airflow (run each in its own terminal):
airflow webserver --port 8080
airflow scheduler
- Open the Airflow UI (see the Airflow DAGs Interface screenshot above): you should now see the list of DAGs (pipelines). Enable or trigger the relevant ones.
- Start Minikube:
minikube start
- Apply the Weather Pipeline Deployment:
kubectl apply -f kubernetes/deployment.yaml
- (Optional) Create a secret for your API key:
kubectl create secret generic weather-pipeline-secrets \
  --from-literal=API_KEY=your_openweathermap_api_key
- Check pods:
kubectl get pods
Now your pipeline can run in a Kubernetes environment!
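`kubernetes/deployment.yaml` is not reproduced in this README; a minimal Deployment of this shape (the image name, labels, and secret wiring are assumptions) would be:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weather-pipeline
spec:
  replicas: 1
  selector:
    matchLabels:
      app: weather-pipeline
  template:
    metadata:
      labels:
        app: weather-pipeline
    spec:
      containers:
        - name: weather-pipeline
          image: weather-pipeline:latest
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: weather-pipeline-secrets
                  key: API_KEY
```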
graph TD
A[Input Config] --> B[Data Extraction]
B --> C[Data Transformation]
C --> D[Data Loading]
C --> E[Data Analysis]
E --> F[Visualization]
F --> G[Results]
- Extract: Grab weather data from OpenWeatherMap
- Transform: Clean, normalize, handle outliers
- Load: Store processed data in local files or DB
- Analyze: Generate city comparisons, identify trends
- Visualize: Plot charts & graphs (Matplotlib, etc.)
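The stages above suggest a `main()` that simply chains them. A schematic sketch (the function names mirror the `src/` modules, but the real signatures and bodies will differ):

```python
def extract(cities):
    # Stand-in for src/extract.py: fetch one reading per city from the API.
    return [{"city": c, "temp_c": 20.0 + i} for i, c in enumerate(cities)]

def transform(records):
    # Stand-in for src/transform.py: drop physically implausible readings.
    return [r for r in records if -90 <= r["temp_c"] <= 60]

def load(records):
    # Stand-in for src/load.py: the real step writes to data/processed/
    # or a database; here we simply pass the rows through.
    return records

def run_pipeline(cities):
    return load(transform(extract(cities)))

rows = run_pipeline(["Kuala Lumpur", "London"])
```

The analyze and visualize stages would then consume whatever `load` persisted.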
mindmap
root((Weather Analysis))
City Comparisons
Temperature
Humidity
Wind Speed
Temporal Analysis
Daily Variation
Long-term Trend
Weather Conditions
Condition Distribution
Alerts
Correlation
Temperature-Humidity
Wind-Temperature
The pipeline can generate:
- Time-series plots (temperature trends)
- Comparison charts across multiple cities
- Correlation analyses (humidity vs. temperature)
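For instance, the humidity-vs.-temperature correlation is a one-liner with pandas (synthetic data shown here; the real pipeline would read from `data/processed/`):

```python
import pandas as pd

# Synthetic sample: in humid tropical data, humidity tends to rise
# as temperature falls through the day.
df = pd.DataFrame({
    "temp_c":   [30.0, 28.5, 26.0, 24.0, 22.5],
    "humidity": [60,   65,   72,   80,   85],
})

# Pearson correlation between temperature and relative humidity.
corr = df["temp_c"].corr(df["humidity"])
```

A strongly negative value (close to -1) indicates the inverse relationship the comparison charts would visualize.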
- Local Docker:
docker-compose up --build
- Kubernetes (Minikube):
minikube start && kubectl apply -f deployment.yaml
- Airflow: local scheduler and UI (port 8080)
- EC2: GitHub Actions CI/CD for continuous deployment
- Prometheus collects pipeline metrics (port 9090).
- Grafana visualizes metrics (port 3000).
- Import monitoring/grafana-dashboard.json for a pre-built dashboard.
- Alerts can be configured in Prometheus/Grafana to notify on pipeline failures or anomalies.
- OpenWeatherMap API Docs
- Docker Documentation
- Kubernetes Docs
- Prometheus Docs
- Grafana Docs
- Apache Airflow Docs
© 2025 Fahmi Zainal
All rights reserved. This project and its contents are proprietary and
confidential. Unauthorized copying, distribution, or modification of
this software, via any medium, is strictly prohibited. For licensing
inquiries, please contact the project maintainer.