🌤️ OpenWeatherMap Data Pipeline Engineering Project

1. Screenshots

  1. Docker Compose Containers
    [Screenshot: Docker Compose Up]
    Docker running the project's containers for the weather pipeline, Prometheus, and Grafana.

  2. Grafana Dashboard
    [Screenshot: Grafana Dashboard]
    The pipeline metrics (data volumes, durations, API performance) once Grafana is properly configured.

  3. Airflow Login Page
    [Screenshot: Airflow Login]
    Airflow prompts for a username and password to access its web UI.

  4. Airflow DAGs Interface
    [Screenshot: Airflow DAGs]
    The list of DAGs (pipelines) that can be scheduled and monitored via Airflow.


2. Overview

The OpenWeatherMap Data Pipeline Engineering Project is a comprehensive data engineering solution for collecting, processing, and analyzing weather data from the OpenWeatherMap API. It demonstrates a complete ETL pipeline with integrated monitoring, visualization, and multiple deployment options.

```mermaid
flowchart LR
    User([User])
    API[OpenWeatherMap API]
    ETL[ETL Pipeline]
    DB[(Storage)]
    Monitor[Monitoring]
    Insight[Analytics]

    User --> API
    API --> ETL
    ETL --> DB
    DB --> Insight
    ETL <--> Monitor
    Insight --> User

    style API fill:#93c5fd,stroke:#2563eb,stroke-width:2px
    style ETL fill:#fde68a,stroke:#d97706,stroke-width:2px
    style Monitor fill:#d1fae5,stroke:#059669,stroke-width:2px
    style Insight fill:#fbcfe8,stroke:#db2777,stroke-width:2px
```

3. Table of Contents

  1. Key Features
  2. Technology Stack
  3. Project Structure
  4. Installing & Running (The Story)
  5. Processing Pipeline
  6. Data Analysis
  7. Deployment Options
  8. Monitoring
  9. License

4. Key Features

  • Automated Weather Data Collection

    • Multi-city weather data extraction
    • Configurable sampling frequency
    • Resilient retry logic for API calls (see the sketch after this list)
  • Robust Data Processing

    • Data cleaning, outlier handling
    • Derived metric computation
  • Comprehensive Analytics

    • City-to-city comparisons
    • Temperature trend analysis
    • Weather pattern visualizations
  • Enterprise-Grade Infrastructure

    • Docker containerization
    • Kubernetes orchestration
    • Optional Airflow scheduling
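
To make the retry behavior concrete, here is a minimal sketch of a resilient extraction call. The endpoint and its q/appid/units parameters are OpenWeatherMap's documented current-weather API; the function itself and its backoff policy are illustrative assumptions, not the project's actual extract.py:

```python
# Hypothetical sketch of resilient extraction with retry logic.
# The backoff policy (3 attempts, exponential delay) is an assumption.
import os
import time

import requests

API_URL = "https://api.openweathermap.org/data/2.5/weather"

def fetch_city_weather(city: str, retries: int = 3) -> dict:
    """Fetch current weather for one city, retrying on transient failures."""
    params = {"q": city, "appid": os.environ["API_KEY"], "units": "metric"}
    for attempt in range(retries):
        try:
            resp = requests.get(API_URL, params=params, timeout=10)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            time.sleep(2 ** attempt)  # back off 1s, then 2s
```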

5. Technology Stack

  • Python 3.12+
  • Docker / Kubernetes
  • Prometheus / Grafana
  • Apache Airflow (for advanced scheduling)
```mermaid
flowchart TD
    Python[Python 3.12] --> Pandas[Pandas] & Matplotlib[Matplotlib]
    Python --> Docker[Docker]
    Docker --> K8s[Kubernetes]
    Python --> Airflow[Airflow]
    Python --> Prometheus[Prometheus]
    Prometheus --> Grafana[Grafana]
```

6. Project Structure

```
weather_data_pipeline/
├── README.md
├── images/
│   ├── Grafana.png
│   ├── apache_airflow_interface.png
│   ├── apache_airflow_login.png
│   └── docker-compose up.png
├── config/
│   └── config.yaml
├── data/
│   ├── raw/
│   ├── processed/
│   └── output/
├── logs/
├── requirements.txt
├── src/
│   ├── extract.py
│   ├── transform.py
│   ├── load.py
│   ├── analyze.py
│   └── utils.py
├── main.py
├── Dockerfile
├── docker-compose.yml
├── airflow/
│   └── weather_pipeline_dag.py
├── kubernetes/
│   └── deployment.yaml
└── monitoring/
    ├── prometheus.yml
    └── grafana-dashboard.json
```
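
The pipeline's runtime settings live in config/config.yaml. Its exact schema isn't reproduced here, but loading it from Python is straightforward; a minimal sketch, where the cities and units keys are assumptions about the schema:

```python
# Hypothetical sketch: read pipeline settings from config/config.yaml.
# The "cities" and "units" keys are assumed, not the repo's actual schema.
import yaml  # pip install pyyaml

def load_config(path: str = "config/config.yaml") -> dict:
    with open(path) as fh:
        return yaml.safe_load(fh)

if __name__ == "__main__":
    cfg = load_config()
    print(cfg.get("cities", []), cfg.get("units", "metric"))
```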

7. Installing & Running (The Story)

Below is a step-by-step flow illustrating how to install Docker, build and run the pipeline, monitor it with Grafana, and then transition to Airflow and Kubernetes.

Step 1: Install & Verify Docker

  1. On macOS, if Docker Desktop appears stuck, remove its stale vmnetd service first:
    sudo launchctl remove com.docker.vmnetd
  2. Verify Docker commands:
    docker pull hello-world
    docker run hello-world
    docker ps
    docker ps -a
    docker --version
  3. Check existing images & remove any (optional):
    docker images
    docker rmi <IMAGE_ID>
  4. Open Docker Desktop GUI (macOS):
    open -a Docker

Step 2: Build & Run Our Weather Pipeline

  1. Build the Docker image:

    docker build -t weather-pipeline .
  2. Run the container with your API key:

    docker run --env-file .env weather-pipeline
  3. Spin up services via Docker Compose:

    docker compose up

    If you get a port conflict (e.g., for port 9090), try:

    lsof -i :9090
    kill -9 <PID>
    pkill -f prometheus
    docker compose up
  4. Check Docker containers:

    docker ps

    [Screenshot: Docker Compose Up]

    You should see 3 containers:

    • Weather Pipeline (port 8000)
    • Prometheus (port 9090)
    • Grafana (port 3000)

Step 3: Monitor Pipeline with Grafana

Once Docker is up, open Grafana at http://localhost:3000.

  • Username: admin
  • Password: admin (by default, if unchanged)

[Screenshot: Grafana Dashboard]

You’ll see panels for:

  • Pipeline Duration
  • Data Volumes (Records Processed & Data Points Extracted)
  • API Performance

If you see "No data," first confirm that the pipeline's metrics endpoint is reachable (the pipeline container exposes port 8000), then check that prometheus.yml points Prometheus at it; the pipeline's main logs will show whether metrics are being published.

Step 4: Using Airflow Locally

[Screenshot: Airflow Login]

  1. Install & Initialize Airflow:
    pip install apache-airflow
    airflow db init
  2. Create Admin User:
    airflow users create \
      --username admin \
      --password admin \
      --firstname Admin \
      --lastname User \
      --role Admin \
      --email [email protected]
  3. Add the DAG (sketched after this list):
    mkdir -p ~/airflow/dags
    cp airflow/weather_pipeline_dag.py ~/airflow/dags/
  4. Start Airflow (run the webserver and scheduler in separate terminals):
    airflow webserver --port 8080
    airflow scheduler
  5. [Screenshot: Airflow DAGs]
    You should now see a list of DAGs (pipelines). Enable or trigger the relevant ones.
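
For reference, a minimal sketch of the shape such a DAG file can take. The DAG id, schedule, and command below are illustrative assumptions, not necessarily what airflow/weather_pipeline_dag.py actually contains:

```python
# Hypothetical sketch of a weather-pipeline DAG (Airflow 2.x style).
# The dag_id, schedule, and bash command are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 2,                           # retry a failed run twice
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="weather_pipeline",
    default_args=default_args,
    start_date=datetime(2025, 1, 1),
    schedule_interval="@hourly",            # collect fresh data every hour
    catchup=False,
) as dag:
    BashOperator(
        task_id="run_pipeline",
        bash_command="python /opt/weather_data_pipeline/main.py",  # assumed path
    )
```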

Step 5: Kubernetes (Optional)

  1. Start Minikube:
    minikube start
  2. Apply the Weather Pipeline Deployment:
    kubectl apply -f kubernetes/deployment.yaml
  3. (Optional) Create a secret for your API key (if deployment.yaml references it, create the secret before applying the deployment):
    kubectl create secret generic weather-pipeline-secrets \
        --from-literal=API_KEY=your_openweathermap_api_key
  4. Check pods:
    kubectl get pods

Now your pipeline can run in a Kubernetes environment!


8. Processing Pipeline

```mermaid
graph TD
    A[Input Config] --> B[Data Extraction]
    B --> C[Data Transformation]
    C --> D[Data Loading]
    C --> E[Data Analysis]
    E --> F[Visualization]
    F --> G[Results]
```

  1. Extract: Grab weather data from OpenWeatherMap
  2. Transform: Clean, normalize, handle outliers (sketched after this list)
  3. Load: Store processed data in local files or DB
  4. Analyze: Generate city comparisons, identify trends
  5. Visualize: Plot charts & graphs (Matplotlib, etc.)
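
As a concrete illustration of the transform step, a minimal sketch using pandas. The column names and the outlier rule are assumptions about the data, not the project's actual transform.py:

```python
# Hypothetical sketch of the transform stage: drop incomplete rows,
# clip outliers, and derive a simple metric. Column names are assumed.
import pandas as pd

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.dropna(subset=["temperature", "humidity", "wind_speed"]).copy()

    # Clip implausible readings to a sane range rather than dropping rows.
    df["temperature"] = df["temperature"].clip(-60, 60)   # degrees Celsius
    df["humidity"] = df["humidity"].clip(0, 100)          # percent

    # Example derived metric: a rough comfort flag.
    df["is_comfortable"] = df["temperature"].between(18, 26) & (df["humidity"] < 70)
    return df
```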

9. Data Analysis

```mermaid
mindmap
    root((Weather Analysis))
        City Comparisons
            Temperature
            Humidity
            Wind Speed
        Temporal Analysis
            Daily Variation
            Long-term Trend
        Weather Conditions
            Condition Distribution
            Alerts
        Correlation
            Temperature-Humidity
            Wind-Temperature
```

The pipeline can generate:

  • Time-series plots (temperature trends)
  • Comparison charts across multiple cities
  • Correlation analyses (humidity vs. temperature)
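
The last item above, correlation analysis, fits in a few lines of pandas. A minimal sketch, assuming a processed CSV at data/processed/weather.csv with city, temperature, and humidity columns (the path and schema are assumptions):

```python
# Hypothetical sketch: per-city correlation between humidity and temperature.
# File path and column names are assumptions about the processed data.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data/processed/weather.csv")

# Pearson correlation of humidity vs. temperature within each city.
corr = df.groupby("city").apply(lambda g: g["humidity"].corr(g["temperature"]))
print(corr.sort_values())

# Quick visual check across all cities.
df.plot.scatter(x="temperature", y="humidity", alpha=0.4)
plt.title("Humidity vs. Temperature")
plt.savefig("data/output/humidity_vs_temperature.png")
```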

10. Deployment Options

  • Local Docker: docker compose up --build
  • Kubernetes (Minikube): minikube start && kubectl apply -f deployment.yaml
  • Airflow: Local scheduler and UI (port 8080)
  • EC2: GitHub Actions CI/CD for continuous deployment

11. Monitoring

  1. Prometheus collects pipeline metrics (port 9090).
  2. Grafana visualizes metrics (port 3000).
    • Import monitoring/grafana-dashboard.json for a pre-built dashboard.
  3. Alerts can be configured in Prometheus/Grafana to notify on pipeline failures or anomalies.
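
On the pipeline side, publishing these metrics takes only a few lines with the prometheus_client library. A minimal sketch, with illustrative metric names rather than the project's actual ones; the port matches the 8000 exposed by the pipeline container:

```python
# Hypothetical sketch: expose pipeline metrics for Prometheus to scrape.
# Metric names are illustrative; the real pipeline may define different ones.
import time

from prometheus_client import Counter, Histogram, start_http_server

RECORDS_PROCESSED = Counter(
    "pipeline_records_processed_total", "Records processed by the pipeline"
)
RUN_DURATION = Histogram(
    "pipeline_run_duration_seconds", "Duration of one pipeline run"
)

if __name__ == "__main__":
    start_http_server(8000)           # Prometheus scrapes this endpoint
    while True:
        with RUN_DURATION.time():     # observe how long one run takes
            time.sleep(0.2)           # stand-in for real pipeline work
            RECORDS_PROCESSED.inc(10)
        time.sleep(5)                 # wait before the next simulated run
```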

12. License

© 2025 Fahmi Zainal

All rights reserved. This project and its contents are proprietary and 
confidential. Unauthorized copying, distribution, or modification of 
this software, via any medium, is strictly prohibited. For licensing 
inquiries, please contact the project maintainer.
