🌱 AI-Driven Greenhouse Climate Control Optimizing plant growth & energy efficiency using multi-objective optimization, predictive modeling, and IoT sensors. Built on DynaGrow, it enables smart climate adjustments via genetic algorithms & real-time data, reducing energy waste while ensuring optimal growing conditions. 🚀🌿

Fnux8890/Proactive-thesis

🌱 Advanced Greenhouse Climate Control & Optimization System

A high-performance, GPU-accelerated platform for optimizing greenhouse operations through multi-objective evolutionary algorithms, achieving a 132x speedup over a CPU baseline, with projected savings of €15,000-50,000 per greenhouse per year.

🎯 Project Overview

This research project presents a data-driven greenhouse climate control and optimization system that balances plant growth and energy efficiency through simulation and multi-objective optimization. The system addresses computational bottlenecks in horticultural optimization by leveraging GPU acceleration, cutting optimization time per generation from 5.41s to 0.041s while handling extreme data sparsity (91.3% missing values).

Key Achievements

  • 132x Performance Improvement: TensorNSGA3 GPU implementation vs traditional CPU NSGA-III (0.041s vs 5.41s per generation)
  • Superior Solution Quality: 26 Pareto-optimal solutions (GPU) vs 12 (CPU), providing better trade-offs
  • Economic Impact: Potential savings of €15,000-50,000 annually per greenhouse through optimized control strategies
  • Model Accuracy: R² > 0.85 for energy consumption, R² > 0.80 for plant growth predictions
  • Production-Ready: Complete Docker-based pipeline from raw data to optimized control strategies

🚀 Quick Start

Complete End-to-End Pipeline (Recommended)

# Clone the repository
git clone https://github.com/Fnux8890/Proactive-thesis.git
cd Proactive-thesis/DataIngestion

# Run the complete optimization pipeline (2-4 hours with GPU)
./run_full_pipeline_experiment.sh

# Or with custom date range
START_DATE="2014-01-01" END_DATE="2014-12-31" ./run_full_pipeline_experiment.sh

This will execute the entire pipeline including:

  • ✅ Data ingestion from greenhouse sensors (2013-2016 dataset)
  • ✅ Enhanced sparse feature extraction with GPU acceleration
  • ✅ Multi-level temporal feature engineering (223,825 features)
  • ✅ Surrogate model training (LightGBM + LSTM)
  • ✅ CPU vs GPU MOEA optimization comparison
  • ✅ Comprehensive performance analysis and reporting

Alternative Quick Experiments

# Multi-run statistical comparison
./run_multiple_experiments.sh --cpu-runs 5 --gpu-runs 5

# Quick performance test
./quick_performance_test.sh

# Development setup (CPU-only, minimal features)
docker compose up

🏗️ System Architecture

The system implements a sophisticated 6-stage pipeline optimized for handling sparse greenhouse data:

Stage 1: High-Performance Data Ingestion

  • Technology: Rust with async I/O (tokio + sqlx)
  • Performance: ~10,000 rows/second batch insertion
  • Input: CSV/JSON sensor data from multiple greenhouses
  • Output: TimescaleDB hypertable with validated sensor readings

Stages 2-4: Integrated Sparse Pipeline

  • Hybrid Architecture: Rust for CPU-bound operations, Python for GPU computations
  • Sparse Data Handling: Efficiently processes 91.3% missing values
  • Era Detection: PELT, BOCPD, and HMM algorithms for temporal segmentation
  • Feature Extraction:
    • GPU-accelerated statistical features (mean, std, percentiles)
    • Temporal patterns and cross-sensor correlations
    • Sparse-specific metrics (coverage, gap statistics)
  • Performance: ~1M samples/second in hybrid mode
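
The sparse-specific metrics named above (coverage and gap statistics) can be illustrated with a minimal Python sketch. The function name `sparse_metrics`, the returned keys, and the use of `None` to mark a missing reading are assumptions made for illustration; they are not taken from the project's code.

```python
from typing import Optional, Sequence


def sparse_metrics(values: Sequence[Optional[float]]) -> dict:
    """Coverage and gap statistics for one sensor window.

    `None` marks a missing reading (an assumption for this sketch).
    """
    n = len(values)
    if n == 0:
        return {"coverage": 0.0, "n_gaps": 0, "longest_gap": 0, "mean_gap": 0.0}

    gaps = []  # lengths of consecutive runs of missing readings
    run = 0
    for v in values:
        if v is None:
            run += 1
        elif run:
            gaps.append(run)
            run = 0
    if run:  # the window ends inside a gap
        gaps.append(run)

    present = n - sum(gaps)
    return {
        "coverage": present / n,  # fraction of readings that are present
        "n_gaps": len(gaps),
        "longest_gap": max(gaps, default=0),
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# Hypothetical window of ten readings with six missing values.
window = [21.5, None, None, 22.0, None, 22.4, None, None, None, 23.1]
metrics = sparse_metrics(window)
print(metrics)
```

Reporting coverage alongside each statistical feature lets a downstream model tell a mean computed from four readings apart from one computed from four hundred.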

Stage 5: Advanced Model Building

  • Surrogate Models:
    • LightGBM for fast inference
    • LSTM networks for temporal dynamics
  • GPU Training: PyTorch with mixed precision
  • MLflow Integration: Experiment tracking and model versioning
  • Accuracy: R² > 0.85 for energy consumption, R² > 0.80 for plant growth
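
To show how surrogate models of this kind can stand in for a slow simulation during optimization, here is a minimal Python sketch. The names `make_objectives`, `toy_energy`, and `toy_growth`, the setpoint layout, and the toy formulas are all hypothetical; trained LightGBM or LSTM predictors would take the place of the stand-in functions.

```python
from typing import Callable, Sequence, Tuple

Setpoints = Sequence[float]  # e.g. (temperature, co2, light); this layout is an assumption


def make_objectives(
    energy_model: Callable[[Setpoints], float],
    growth_model: Callable[[Setpoints], float],
) -> Callable[[Setpoints], Tuple[float, float]]:
    """Wrap two surrogate predictors as one minimization objective vector."""

    def evaluate(x: Setpoints) -> Tuple[float, float]:
        # Growth should be maximized, so it is negated for a minimizing optimizer.
        return energy_model(x), -growth_model(x)

    return evaluate


# Toy stand-ins for trained models (purely illustrative formulas).
def toy_energy(x: Setpoints) -> float:
    return 2.0 * x[0] + 0.5 * x[2]


def toy_growth(x: Setpoints) -> float:
    return x[0] + 0.1 * x[1]


evaluate = make_objectives(toy_energy, toy_growth)
result = evaluate((20.0, 5.0, 10.0))
print(result)
```

Keeping the optimizer dependent only on a callable makes it possible to swap one surrogate for another without touching the optimization loop.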

Stage 6: Multi-Objective Optimization

  • CPU Implementation: pymoo NSGA-III (baseline)
  • GPU Implementation: Custom TensorNSGA3 with CUDA acceleration
  • Objectives: Energy consumption, plant growth, resource efficiency
  • Performance Gains: 132x speedup with GPU implementation
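
Both implementations ultimately report a set of Pareto-optimal (non-dominated) solutions. The sketch below shows only that non-dominance filter in plain Python; it is not NSGA-III itself, which additionally relies on reference directions and niching. The function names and the candidate values are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the non-dominated subset of `points` using an O(n^2) pairwise check."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates as (energy use, negated growth); both are minimized.
candidates = [(1.0, -0.9), (0.8, -0.7), (1.2, -0.9), (0.8, -0.6), (0.5, -0.4)]
front = pareto_front(candidates)
print(front)
```

A point never dominates itself, because the strict-improvement condition fails, so the filter needs no special case for comparing a point with itself.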

📋 System Requirements

Hardware Requirements

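| Component | Minimum | Recommended | High-Performance |
|-----------|---------|-------------|------------------|
| RAM | 8GB | 16GB | 32GB+ |
| CPU | 4 cores | 8 cores | 16+ cores |
| GPU | None (CPU-only) | GTX 1660 (6GB) | RTX 4070+ (12GB+) |
| Storage | 20GB SSD | 50GB SSD | 100GB+ NVMe SSD |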

Software Requirements

# Essential
Docker >= 20.10
Docker Compose >= 2.0

# For GPU acceleration
NVIDIA Container Toolkit
CUDA >= 11.8
NVIDIA Driver >= 515

# Development tools (optional)
Python 3.9+
Rust 1.70+
PostgreSQL client tools

🔧 Installation & Setup

1. Clone the Repository

git clone https://github.com/Fnux8890/Proactive-thesis.git
cd Proactive-thesis

2. Prepare Data Directory

The pipeline requires greenhouse sensor data in a specific structure:

# Required directory structure
Data/
├── aarslev/
│   ├── temperature_sunradiation_jan_feb_2014.json.csv
│   ├── weather_jan_feb_2014.csv
│   └── winter2014.csv
└── knudjepsen/
    └── [sensor_data_files].csv

Data Format Requirements:

  • Timestamps: ISO format datetime
  • Sensor columns: temperature, humidity, co2, light intensity, heating/ventilation/lamp states
  • Date range: 2013-12-01 to 2016-09-08 (for full dataset)
  • Format: CSV files with headers
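
As an illustration of these format requirements, the following Python sketch parses CSV text with a header row and ISO timestamps, mapping empty cells to `None`. The column names in `SAMPLE` and the helper `parse_rows` are assumptions for this example; the real files may use different headers.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample; the column names are assumptions for this sketch.
SAMPLE = """timestamp,temperature,humidity,co2
2014-01-01T00:00:00,21.5,65.0,
2014-01-01T00:05:00,,64.2,410
"""


def parse_rows(text: str) -> list:
    """Parse CSV text with a header row; empty cells become None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # ISO-format timestamps are parsed with the standard library.
        row = {"timestamp": datetime.fromisoformat(raw["timestamp"])}
        for key, cell in raw.items():
            if key != "timestamp":
                row[key] = float(cell) if cell.strip() else None
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
print(rows[0]["timestamp"], rows[0]["co2"], rows[1]["temperature"])
```

Keeping missing readings as `None` rather than substituting zero preserves the distinction between "no measurement" and "measured zero", which matters for data this sparse.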

3. GPU Setup (Optional but Recommended)

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

# Verify GPU access (recent nvidia/cuda tags include the patch version and base OS)
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

4. Configure Environment

cd DataIngestion
cp .env.example .env
# Edit .env with your specific settings

📖 How to Use the Pipeline

Running Different Pipeline Configurations

1. Full Production Pipeline (Recommended for Research)

cd DataIngestion
./run_full_pipeline_experiment.sh

# Custom date range
START_DATE="2014-01-01" END_DATE="2014-12-31" ./run_full_pipeline_experiment.sh

  • Duration: 2-4 hours with GPU, 8-12 hours CPU-only
  • Output: Complete experiment results with CPU vs GPU comparison
  • Use case: Academic research, performance benchmarking

2. Development Mode (For Testing)

# Minimal features, CPU-only
docker compose up

# With development tools
docker compose --profile dev-tools up

  • Duration: 5-10 minutes per era
  • Output: Basic pipeline validation
  • Use case: Code development, debugging

3. Enhanced Sparse Pipeline (For Sparse Data)

docker compose -f docker-compose.enhanced.yml up

  • Duration: 1-2 hours
  • Output: Full feature extraction with sparse data handling
  • Use case: Real-world greenhouse data with missing values

4. Performance Testing

# Quick benchmark
./quick_performance_test.sh

# Statistical comparison (multiple runs)
./run_multiple_experiments.sh --cpu-runs 5 --gpu-runs 5

Configuration Options

Environment Variables

# Core settings
START_DATE="2013-12-01"          # Data start date
END_DATE="2016-09-08"            # Data end date
BATCH_SIZE="48"                  # Processing batch size
MIN_ERA_ROWS="200"               # Minimum era size

# Feature extraction
FEATURE_SET="comprehensive"       # minimal|efficient|comprehensive
USE_SPARSE_FEATURES="true"        # Enable sparse data handling
N_JOBS="4"                       # CPU parallelism

# GPU configuration
USE_GPU="true"                   # Enable GPU acceleration
CUDA_VISIBLE_DEVICES="0"         # GPU device ID
GPU_MEMORY_LIMIT="12GB"          # VRAM limit

# MOEA optimization
ALGORITHM_TYPE="tensornsga3"     # tensornsga3|nsga3_gpu|nsga3_cpu
POPULATION_SIZE="100"            # Population size
GENERATIONS="300"                # Number of generations
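
A minimal Python sketch of how a pipeline stage might read and validate a subset of these variables follows. The `PipelineConfig` class and its field names are hypothetical; only the variable names, defaults, and allowed values are taken from the listing above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ALGORITHMS = {"tensornsga3", "nsga3_gpu", "nsga3_cpu"}
FEATURE_SETS = {"minimal", "efficient", "comprehensive"}


def _as_bool(text: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return text.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class PipelineConfig:
    start_date: str
    end_date: str
    feature_set: str
    use_gpu: bool
    algorithm_type: str
    population_size: int
    generations: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "PipelineConfig":
        """Read settings from `env` (default: os.environ), falling back to the listed defaults."""
        env = os.environ if env is None else env
        config = cls(
            start_date=env.get("START_DATE", "2013-12-01"),
            end_date=env.get("END_DATE", "2016-09-08"),
            feature_set=env.get("FEATURE_SET", "comprehensive"),
            use_gpu=_as_bool(env.get("USE_GPU", "true")),
            algorithm_type=env.get("ALGORITHM_TYPE", "tensornsga3"),
            population_size=int(env.get("POPULATION_SIZE", "100")),
            generations=int(env.get("GENERATIONS", "300")),
        )
        # Fail early on values outside the documented choices.
        if config.feature_set not in FEATURE_SETS:
            raise ValueError(f"unknown FEATURE_SET: {config.feature_set!r}")
        if config.algorithm_type not in ALGORITHMS:
            raise ValueError(f"unknown ALGORITHM_TYPE: {config.algorithm_type!r}")
        return config


# Passing a plain dict instead of os.environ keeps the example reproducible.
config = PipelineConfig.from_env({"USE_GPU": "false", "GENERATIONS": "50"})
print(config)
```

Validating the enumerated options at startup surfaces a typo such as `tensornsga` immediately instead of hours into a run.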

Docker Compose Files

| File | Purpose | When to Use |
|------|---------|-------------|
| docker-compose.yml | Base configuration | Always loaded |
| docker-compose.override.yml | Development overrides | Automatic in dev |
| docker-compose.enhanced.yml | Enhanced sparse pipeline | Production data |
| docker-compose.prod.yml | Production with monitoring | Cloud deployment |
| docker-compose.full-comparison.yml | Complete CPU vs GPU | Research experiments |

Expected Output Structure

After successful pipeline execution:

DataIngestion/experiments/full_experiment/[timestamp]/
├── experiment_summary.json          # Experiment metadata
├── checkpoints/
│   ├── stage3_features.json         # 223,825 extracted features
│   └── stage4_eras.json             # Temporal segmentation results
├── models/
│   ├── energy_consumption_model.pt  # PyTorch model
│   ├── plant_growth_model.pt        # PyTorch model
│   └── training_summary.json        # Performance metrics
├── moea_cpu/                        # CPU optimization results
│   └── pareto_F.npy                 # Pareto front (12 solutions)
├── moea_gpu/                        # GPU optimization results
│   └── pareto_F.npy                 # Pareto front (26 solutions)
└── evaluation_results/
    └── comprehensive_evaluation_report.json

Monitoring Progress

Real-time Logs

# View all services
docker compose logs -f

# Specific service
docker compose logs -f enhanced_sparse_pipeline

# GPU utilization
nvidia-smi -l 1

Web Interfaces (Production Mode)

Common Workflows

1. Process New Greenhouse Data

# Place data in Data/ directory
# Update .env with appropriate dates
# Run enhanced pipeline
docker compose -f docker-compose.enhanced.yml up

2. Reproduce Paper Results

# Use full dataset dates
START_DATE="2013-12-01" END_DATE="2016-09-08" \
./run_full_pipeline_experiment.sh

3. Benchmark GPU Performance

# Compare algorithms
./run_multiple_experiments.sh --cpu-runs 3 --gpu-runs 3

4. Debug Pipeline Issues

# Run individual stages
docker compose up db  # Start database
docker compose up rust_pipeline  # Test data ingestion
docker compose up enhanced_sparse_pipeline  # Test feature extraction

📊 Key Results & Performance Metrics

Computational Performance

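| Algorithm | Hardware | Time/Generation | Speedup | Solutions Found |
|-----------|----------|-----------------|---------|-----------------|
| NSGA-III (pymoo) | CPU (16 cores) | 5.41s | 1x (baseline) | 12 |
| Custom GPU NSGA-III | RTX 4070 | 0.235s | 22.9x | 18 |
| TensorNSGA3 | RTX 4070 | 0.041s | 132x | 26 |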
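
The speedup column is the ratio of the CPU baseline time to each implementation's time per generation. The short Python sketch below recomputes it from the rounded timings in the table and extrapolates to a full run; the 300-generation figure is the default `GENERATIONS` value from the configuration section, and ratios computed from rounded timings can differ slightly from the reported column, which was presumably derived from unrounded measurements.

```python
# Per-generation times from the table above, in seconds.
timings = {
    "NSGA-III (pymoo), CPU": 5.41,
    "Custom GPU NSGA-III": 0.235,
    "TensorNSGA3": 0.041,
}

baseline = timings["NSGA-III (pymoo), CPU"]
generations = 300  # default GENERATIONS value; adjust to match the actual run

speedups = {}
for name, seconds in timings.items():
    speedups[name] = baseline / seconds
    total_minutes = seconds * generations / 60
    print(f"{name}: {speedups[name]:.1f}x, about {total_minutes:.1f} min for {generations} generations")
```

This covers optimizer time only; it does not include data ingestion, feature extraction, or model training.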

Model Performance

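| Model | Objective | RMSE | R² | MAE |
|-------|-----------|------|-----|-----|
| LightGBM | Energy Consumption | 0.043 | 0.878 | 0.031 |
| LightGBM | Plant Growth | 0.051 | 0.834 | 0.038 |
| LSTM | Energy (Temporal) | 0.039 | 0.891 | 0.028 |
| LSTM | Growth (Temporal) | 0.047 | 0.856 | 0.034 |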
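
For readers unfamiliar with the three reported metrics, this plain-Python sketch computes RMSE, MAE, and R² using their standard definitions. The function name and the sample values are illustrative only and are unrelated to the project's data.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """RMSE, MAE and R^2 (coefficient of determination) for paired values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when the targets are constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


# Made-up values chosen so the result is easy to verify by hand.
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(scores)
```

RMSE and MAE are in the units of the target, so values such as those in the table are only comparable across models when the targets share the same scaling.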

Economic Impact Analysis

The following economic projections are based on theoretical calculations using industry benchmarks and literature values:

  • Potential Annual Savings: €15,000-50,000 per greenhouse*
  • Yield Improvement: 8-15% through optimized control (based on literature)
  • ROI: 6-18 months for GPU hardware investment
  • Carbon Reduction: 20-35% through efficient operations

*Note: Savings estimate assumes:

  • Medium-sized commercial greenhouse (2,000-5,000 m²)
  • Energy consumption of 200-400 kWh/m²/year
  • Danish energy prices of €0.30-0.40/kWh
  • 10-20% energy reduction through optimization (based on literature showing 9% savings achievable)
  • Energy costs representing ~50% of operational expenses
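
The listed assumptions combine multiplicatively: annual savings ≈ area × energy intensity × price × fractional reduction. The Python sketch below evaluates that product at the low and high end of each listed range. The function name is hypothetical, and the formula is an assumed reading of how the assumptions fit together rather than the authors' documented calculation. Taking all extremes at once gives an envelope of roughly €12,000 to €160,000, which contains the quoted €15,000-50,000 range; how the quoted range was narrowed is not stated here.

```python
def annual_savings_eur(area_m2: float, kwh_per_m2_year: float,
                       eur_per_kwh: float, reduction: float) -> float:
    """Energy cost avoided per year: area x intensity x price x fractional reduction."""
    return area_m2 * kwh_per_m2_year * eur_per_kwh * reduction


# Lower and upper end of every assumption listed above.
low = annual_savings_eur(2_000, 200, 0.30, 0.10)
high = annual_savings_eur(5_000, 400, 0.40, 0.20)
print(f"EUR {low:,.0f} to EUR {high:,.0f} per year")
```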

These projections are theoretical and based on optimization potential demonstrated in academic literature. Actual savings will depend on specific greenhouse characteristics, local energy prices, and successful implementation of the optimization strategies.

🚀 Key Features & Innovations

Core Capabilities

✅ Hybrid Rust+Python Architecture: Leverages Rust's performance for data ingestion and CPU-bound operations while utilizing Python's ecosystem for GPU acceleration and ML

✅ Advanced Sparse Data Handling: Efficiently processes greenhouse data with 91.3% missing values through specialized algorithms and sparse-aware feature extraction

✅ GPU-Accelerated MOEA: Custom TensorNSGA3 implementation achieving 132x speedup over traditional CPU approaches

✅ Multi-Level Temporal Analysis: Extracts features at multiple time scales (hours/days, days/weeks, weeks/months) to capture plant growth dynamics

✅ Production-Ready Pipeline: Complete Docker Compose orchestration from raw data to optimized control strategies

✅ Comprehensive Monitoring: Integrated Prometheus + Grafana stack for real-time performance tracking

Technical Innovations

  • Sparse Feature Engineering: 223,825 specialized features designed for high-sparsity time series
  • Hybrid Processing Model: Optimal workload distribution between CPU and GPU resources
  • Surrogate Modeling: LightGBM + LSTM models for fast fitness evaluation in MOEA
  • Era Detection: Advanced changepoint detection (PELT, BOCPD, HMM) for temporal segmentation
  • Economic Optimization: Multi-objective balancing of energy costs, plant growth, and resource efficiency
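
The multi-level temporal analysis described above amounts to aggregating the same series at several window lengths. Here is a minimal Python sketch of that idea for sparse data, with `None` marking a missing reading. The window sizes, feature names, and sample series are assumptions for illustration, not the project's actual feature set.

```python
from statistics import mean
from typing import List, Optional, Sequence


def window_means(values: Sequence[Optional[float]], window: int) -> List[Optional[float]]:
    """Mean of the available readings in each non-overlapping window.

    A window with no readings yields None rather than a filled-in value.
    """
    out = []
    for start in range(0, len(values), window):
        present = [v for v in values[start:start + window] if v is not None]
        out.append(mean(present) if present else None)
    return out


# Hypothetical hourly series; None marks a missing reading.
hourly = [20.0, None, 22.0, None, None, None, 18.0, 20.0]
features = {f"mean_{w}h": window_means(hourly, w) for w in (2, 4, 8)}
print(features)
```

Short windows preserve fast dynamics but are often empty when data is sparse, while longer windows are nearly always populated; extracting both lets a model use whichever scale has data.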

📁 Project Structure

Proactive-thesis/
├── DataIngestion/              # Main pipeline implementation
│   ├── rust_pipeline/          # Stage 1: High-performance data ingestion
│   ├── gpu_feature_extraction/ # Stages 2-4: Hybrid sparse pipeline
│   ├── model_builder/          # Stage 5: ML model training
│   ├── moea_optimizer/         # Stage 6: Multi-objective optimization
│   ├── experiments/            # Experiment results and analysis
│   └── docs/                   # Comprehensive documentation
├── Data/                       # Input greenhouse sensor data
├── Docs/                       # Research documentation
├── Doc-templates/              # Project specification templates
└── Jupyter/                    # Analysis notebooks

🔬 Research Contributions

This project advances the state-of-the-art in greenhouse optimization through:

  1. Computational Acceleration: First comprehensive study of GPU acceleration for horticultural MOEA optimization, achieving breakthrough 132x speedup

  2. Sparse Data Innovation: Novel hybrid pipeline architecture specifically designed for extreme data sparsity common in greenhouse environments

  3. Economic Analysis: Projected €15,000-50,000 annual savings potential through optimized control strategies, based on theoretical calculations using literature values

  4. Open-Source Framework: Complete, reproducible pipeline available for the research community

📖 Documentation

Getting Started

Technical Deep Dives

Development

🀝 Contributing

We welcome contributions! Areas of particular interest:

  • Additional plant species models
  • Alternative MOEA algorithms
  • Cloud deployment optimizations
  • Real-time control integration
  • Additional sparse data handling techniques

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Aarhus University, Department of Electrical and Computer Engineering
  • Danish greenhouse facilities (KnudJepsen, Aarslev) for providing data
  • NVIDIA for GPU computing resources
  • Open-source communities (PyTorch, TimescaleDB, Docker)

📚 Citation

If you use this work in your research, please cite:

@software{advanced_greenhouse_optimization_2025,
  author = {[Author Name]},
  title = {Advanced Greenhouse Climate Control \& Optimization System},
  year = {2025},
  publisher = {GitHub},
  doi = {10.5281/zenodo.15571041},
  url = {https://github.com/Fnux8890/Proactive-thesis}
}

📧 Contact

For questions, collaboration, or support:


🌱 Contributing to sustainable agriculture through advanced computational optimization 🌿
