AgriIntel is a data-driven platform that helps researchers, policymakers, and farmers understand and adapt to climate change impacts on agriculture.

vivek081202/AgriIntel

🌾 AgriIntel: Climate Impact Analytics Platform

💾 MongoDB-First Big Data Architecture

A MongoDB-first Streamlit web application for analyzing and predicting climate change impacts on global agriculture. Built to showcase MongoDB aggregation pipelines, which process large datasets on the database server instead of loading the full dataset into application memory.

🎯 Project Overview

AgriIntel is a data-driven platform that helps researchers, policymakers, and farmers understand and adapt to climate change impacts on agriculture through:

  • 🔥 MongoDB-First Architecture: All data operations use MongoDB queries and aggregations (30+ pipelines)
  • 📊 Big Data Processing: Handle millions of records without memory issues
  • 🤖 Advanced Analytics: Exploratory analysis, risk assessment, and ML forecasting
  • 📈 Predictive Modeling: ML-powered yield forecasting through 2050
  • 🎨 Beautiful UI: Professional design with interactive visualizations
  • 💾 Full CRUD: Complete database management with MongoDB operations
  • ⚡ High Performance: Indexed queries and server-side aggregation pipelines, so pages receive aggregated results rather than the raw dataset

🌟 Why MongoDB-First?

Traditional approach (loading all data into pandas):

import pandas as pd

df = pd.read_csv('huge_file.csv')  # Entire file (500MB+) held in memory ❌
grouped = df.groupby('Country')['Crop_Yield_MT_per_HA'].mean()  # Limited by available RAM ❌

Our approach (MongoDB aggregation):

pipeline = [
    {'$group': {'_id': '$Country', 'avg_yield': {'$avg': '$Crop_Yield_MT_per_HA'}}},
    {'$sort': {'avg_yield': -1}}
]
# handler is assumed to be the MongoDB helper provided by db_connection.py
results = handler.aggregate('climate_agriculture_data', pipeline)  # Runs on the database server ✅

Benefits:

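  • ✅ Scales to collections larger than application memory
  • ✅ Fast queries when filter and sort fields are indexed
  • ✅ Only aggregated results, not raw records, are held in the app
  • ✅ The same pipelines run unchanged on a production MongoDB deployment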
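
The handler.aggregate() call above is not shown in this README. A minimal sketch of what such a wrapper might look like with PyMongo follows. The names build_avg_yield_pipeline and run_pipeline are hypothetical, and the URI, database, and collection names are assumed from the defaults mentioned elsewhere in this README.

```python
from typing import Any, Dict, List

Pipeline = List[Dict[str, Any]]


def build_avg_yield_pipeline(top_n: int = 0) -> Pipeline:
    """Average yield per country, highest first (same stages as above)."""
    pipeline: Pipeline = [
        {"$group": {"_id": "$Country",
                    "avg_yield": {"$avg": "$Crop_Yield_MT_per_HA"}}},
        {"$sort": {"avg_yield": -1}},
    ]
    if top_n > 0:
        # Optional: keep only the first top_n countries.
        pipeline.append({"$limit": top_n})
    return pipeline


def run_pipeline(pipeline: Pipeline,
                 uri: str = "mongodb://localhost:27017/",
                 db_name: str = "agri_intel",
                 collection: str = "climate_agriculture_data") -> List[Dict[str, Any]]:
    """Hypothetical stand-in for handler.aggregate(); needs a running MongoDB."""
    from pymongo import MongoClient  # imported lazily so the builder works without PyMongo

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        # allowDiskUse lets large $group/$sort stages spill to disk on the server.
        cursor = client[db_name][collection].aggregate(pipeline, allowDiskUse=True)
        return list(cursor)
    finally:
        client.close()
```

Only the grouped rows (one per country) come back to the app, which is the point of the comparison above.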

🏗️ Project Architecture

project/
│
├── app.py                       # Main launcher with navigation
├── db_connection.py             # MongoDB connection handler
│
├── pages/
│   ├── 1_📊_EDA.py              # Exploratory Data Analysis
│   ├── 2_🌪️_Extreme_Weather.py  # Extreme Weather Risk Analysis
│   ├── 3_📈_Forecasting.py       # Time-series & ML Prediction
│   ├── 4_🔬_Correlation_Lab.py   # Correlation Analysis
│   ├── 5_🧠_Adaptation.py        # Strategy Simulation
│   ├── 6_🤖_Farmer_Assistant.py  # Yield Prediction Assistant
│   ├── 7_🗂️_Admin_Panel.py       # MongoDB CRUD & Uploads
│   └── 8_💾_MongoDB_Analytics.py # Advanced MongoDB Operations Hub ⭐
│
├── models/
│   └── (trained models saved here)
│
├── data/
│   └── climate_change_impact_on_agriculture_2024.csv
│
├── requirements.txt
└── README.md

⚙️ Tech Stack

Frontend

  • Streamlit - Web framework
  • Plotly - Interactive visualizations
  • Folium - Geographic mapping
  • streamlit-option-menu - Enhanced navigation

Backend

  • Python 3.8+
  • Pandas & NumPy - Data processing
  • scikit-learn - Machine learning
  • Prophet - Time-series forecasting
  • PyMongo - MongoDB driver

Database

  • MongoDB - NoSQL database for climate data storage

🚀 Installation & Setup

Prerequisites

  1. Python 3.8 or higher
  2. MongoDB installed and running locally
    • Download from: https://www.mongodb.com/try/download/community
    • Or install via package manager:
      # macOS
      brew tap mongodb/brew
      brew install mongodb-community
      
      # Ubuntu (after adding MongoDB's official apt repository)
      sudo apt-get install -y mongodb-org
      
      # Windows: Download installer from MongoDB website

Step 1: Clone/Download Project

# If using git
git clone <repository-url>
cd AgriIntel  # git names the directory after the repository

# Or extract downloaded zip file
unzip agri-intel.zip
cd agri-intel

Step 2: Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate

# On macOS/Linux:
source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Start MongoDB

# Start MongoDB service
# On macOS:
brew services start mongodb-community

# On Ubuntu:
sudo systemctl start mongod

# On Windows: Start MongoDB service from Services panel
# Or run: mongod --dbpath <path-to-data-directory>

Verify MongoDB is running:

# Connect using MongoDB Compass or the mongosh shell
mongosh
# Should connect without errors

Step 5: Prepare Dataset

  1. Place your CSV file in the data/ directory:

    data/climate_change_impact_on_agriculture_2024.csv
    
  2. Required CSV columns (minimum):

    • Country
    • Year
    • Crop_Type
    • Crop_Yield_MT_per_HA
    • Average_Temperature_C
    • Total_Rainfall_mm
    • CO2_Emissions_MT

  3. Optional columns for enhanced features:

    • Fertilizer_Use_KG_per_HA
    • Pesticide_Use_KG_per_HA
    • Irrigation_Access_Pct
    • Extreme_Weather_Events
    • Precipitation_Anomaly_mm

🎮 Running the Application

Start the App

streamlit run app.py

The application will open in your browser at http://localhost:8501

First Time Setup

  1. Navigate to Admin Panel (page 7 in the sidebar; listed as module 9 in the Module Guide)
  2. Upload your CSV file to MongoDB
  3. Wait for data import to complete
  4. Explore other modules - data is now available across all pages
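
This README does not show how the upload in step 2 is implemented. One way to import a large CSV without holding every row in memory is to send rows in fixed-size batches with PyMongo's insert_many. The sketch below assumes this approach; batched and upload_rows are hypothetical names, and collection can be any object with an insert_many method, such as a PyMongo collection.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batched(rows: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def upload_rows(collection: Any,
                rows: Iterable[Dict[str, Any]],
                batch_size: int = 1000) -> int:
    """Insert rows batch by batch and return how many were sent."""
    inserted = 0
    for batch in batched(rows, batch_size):
        # ordered=False lets the server continue past a failing document.
        collection.insert_many(batch, ordered=False)
        inserted += len(batch)
    return inserted
```

Rows read with csv.DictReader arrive as strings, so numeric fields such as Year and Crop_Yield_MT_per_HA would need converting before insertion for $avg and range queries to work.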

📚 Module Guide

1. 🏠 Home Dashboard

  • Project overview and mission
  • Key performance indicators (KPIs)
  • Global metrics visualization
  • Quick navigation to all modules

2. 📊 EDA (Exploratory Data Analysis) ⭐ MongoDB-Powered

  • Filters: Country, crop, year range
  • Visualizations: Trends, distributions, geographic maps
  • MongoDB Aggregations:
    • get_time_series_aggregation() - Yearly trends via $group
    • get_country_rankings() - Top countries via aggregation
    • get_filtered_statistics() - Real-time stats via $avg, $stdDevPop
    • Nested grouping for complex regional analysis
  • Export: Download filtered data as CSV

Key MongoDB Pipeline Example:

pipeline = [
    {'$match': {'Country': 'India'}},
    {'$group': {
        '_id': '$Year',
        'avg_yield': {'$avg': '$Crop_Yield_MT_per_HA'},
        'count': {'$sum': 1}
    }},
    {'$sort': {'_id': 1}}
]
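
The EDA filters (country, crop, year range) map naturally onto a $match stage placed before the $group. A minimal sketch of a builder that generalizes the pipeline above is shown here; build_yearly_trend_pipeline is a hypothetical name and is not necessarily how get_time_series_aggregation() is written.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_yearly_trend_pipeline(country: Optional[str] = None,
                                crop: Optional[str] = None,
                                year_range: Optional[Tuple[int, int]] = None
                                ) -> List[Dict[str, Any]]:
    """Yearly average yield, restricted by whichever filters are set."""
    match: Dict[str, Any] = {}
    if country:
        match["Country"] = country
    if crop:
        match["Crop_Type"] = crop
    if year_range:
        start, end = year_range
        match["Year"] = {"$gte": start, "$lte": end}

    pipeline: List[Dict[str, Any]] = []
    if match:
        # Filtering first lets MongoDB use indexes and shrink the $group input.
        pipeline.append({"$match": match})
    pipeline.append({"$group": {
        "_id": "$Year",
        "avg_yield": {"$avg": "$Crop_Yield_MT_per_HA"},
        "count": {"$sum": 1},
    }})
    pipeline.append({"$sort": {"_id": 1}})
    return pipeline
```

Calling build_yearly_trend_pipeline("India") reproduces the three stages of the example above.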

3. 🌪️ Extreme Weather Risk Analysis ⭐ MongoDB-Powered

  • Risk Index Calculation: Weighted climate anomaly scores
  • Geographic Risk Maps: Identify vulnerable regions
  • Trend Analysis: Risk evolution over time
  • MongoDB Aggregations:
    • get_extreme_weather_analysis() - Nested $group for year-to-year temperature variability
    • Temperature/rainfall anomaly buckets via $bucket
    • Multi-stage pipelines with $stdDevPop
  • Automated Insights: Top affected regions

Key MongoDB Pipeline Example:

pipeline = [
    {'$group': {
        '_id': {'country': '$Country', 'year': '$Year'},
        'avg_temp': {'$avg': '$Average_Temperature_C'}
    }},
    {'$group': {
        '_id': '$_id.country',
        # $stdDevPop returns a standard deviation (not a variance), despite the field name
        'temp_variance': {'$stdDevPop': '$avg_temp'},
        'years_tracked': {'$sum': 1}
    }},
    {'$sort': {'temp_variance': -1}}
]
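
To make the two $group stages concrete, here is a pure-Python sketch of the same computation on a handful of in-memory records. It is meant for checking understanding on small samples, not as a replacement for the pipeline; temperature_variability is a hypothetical name, and the sketch assumes every record carries a numeric temperature.

```python
from collections import defaultdict
from statistics import fmean, pstdev
from typing import Any, Dict, Iterable, List


def temperature_variability(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Stage 1: average temperature per (country, year).
    temps_by_country_year = defaultdict(list)
    for record in records:
        key = (record["Country"], record["Year"])
        temps_by_country_year[key].append(record["Average_Temperature_C"])

    # Stage 2: collect the yearly averages per country.
    yearly_means = defaultdict(list)
    for (country, _year), temps in temps_by_country_year.items():
        yearly_means[country].append(fmean(temps))

    # pstdev is the population standard deviation, matching $stdDevPop.
    results = [
        {"_id": country, "temp_variance": pstdev(means), "years_tracked": len(means)}
        for country, means in yearly_means.items()
    ]
    # Stage 3: most variable countries first.
    results.sort(key=lambda row: row["temp_variance"], reverse=True)
    return results
```

The output field keeps the name temp_variance to match the pipeline, although the value is a standard deviation of yearly means.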

4. 📈 Forecasting

  • ML Model: Random Forest Regressor
  • Predictions: Yield forecasts through 2050
  • Confidence Intervals: 95% prediction bands
  • Scenario Analysis: Test climate change scenarios
  • MongoDB: Load training data via aggregation, save predictions
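
This README does not say how the 95% bands are derived. One common approach with a random forest, assumed here purely for illustration, is to take percentiles across the individual trees' predictions (in scikit-learn, each fitted tree in model.estimators_ has its own predict method). The helpers below are hypothetical and operate on a plain list of per-tree predictions.

```python
from typing import List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile of an already sorted sequence."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (pct / 100.0) * (len(sorted_values) - 1)
    lower = int(rank)  # rank is non-negative, so int() floors it
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def prediction_band(tree_predictions: Sequence[float],
                    coverage: float = 0.95) -> Tuple[float, float]:
    """Lower and upper bounds that enclose `coverage` of the tree predictions."""
    ordered: List[float] = sorted(tree_predictions)
    tail = (1.0 - coverage) / 2.0 * 100.0
    return percentile(ordered, tail), percentile(ordered, 100.0 - tail)
```

A band built this way reflects disagreement between trees and tends to be narrower than a true prediction interval, so it is best read as a rough uncertainty indicator.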

5. 🔬 Correlation Lab

  • Pairwise Correlation: Pearson/Spearman methods
  • Correlation Matrix: Full heatmap visualization
  • Multiple Variables: Compare predictors
  • MongoDB: Efficient data sampling and filtering
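
For reference, the two correlation methods named above can be sketched in plain Python. These helpers are illustrative only and are not the project's implementation, which may rely on pandas or another library. Spearman is computed as the Pearson correlation of average ranks.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson measures linear association, while Spearman captures any monotonic relationship, which makes it less sensitive to outliers in yield or rainfall.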

6. 🧠 Adaptation Strategy Simulator

  • Climate Scenarios: IPCC RCP 2.6, 4.5, 8.5
  • Adaptation Measures: Irrigation, fertilizer, technology
  • Comparison: With/without adaptation analysis
  • Economic Impact: Revenue projections
  • MongoDB: Baseline statistics via aggregation

7. 🤖 Farmer Assistant

  • Input Form: Region, crop, climate conditions
  • AI Predictions: Expected yield and risk category
  • Personalized Advice: Adaptive strategies based on inputs
  • Export Reports: Download prediction reports
  • MongoDB: Train ML model from aggregated data

8. 💾 MongoDB Analytics Hub ⭐ PRIMARY SHOWCASE

  • Query Builder: Build and execute MongoDB queries
    • Simple match, range queries, complex conditions, custom JSON
  • Aggregation Pipelines: 10+ pre-built pipelines
    • Yearly trends, country rankings, crop comparisons
    • Climate impact bucketing, multi-stage operations
  • Statistical Analysis: Compute stats in MongoDB
    • Descriptive statistics, variance analysis, percentiles
  • Geospatial Queries: Country-level aggregations
  • Performance Metrics: Query benchmarking and optimization
  • Custom Pipelines: Execute your own JSON pipelines

Featured Pipelines:

  • Yearly yield trends with $group and $avg
  • Country performance with $addToSet and $size
  • Crop comparison with $stdDevPop and coefficient of variation
  • Climate bucketing with $bucket
  • Complex multi-stage with $match, $addFields, $switch
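
As an illustration of the climate bucketing item, a $bucket stage groups documents into ranges of a numeric field. The sketch below builds such a stage over Average_Temperature_C; build_temperature_bucket_pipeline is a hypothetical name and the boundaries in the usage note are arbitrary examples.

```python
from typing import Any, Dict, List, Sequence


def build_temperature_bucket_pipeline(boundaries: Sequence[float],
                                      default_label: str = "out_of_range"
                                      ) -> List[Dict[str, Any]]:
    """Count records and average yield per temperature range."""
    if len(boundaries) < 2 or list(boundaries) != sorted(set(boundaries)):
        # $bucket requires at least two boundaries in ascending order.
        raise ValueError("boundaries must be at least two strictly increasing values")
    return [{
        "$bucket": {
            "groupBy": "$Average_Temperature_C",
            # Each bucket includes its lower bound and excludes its upper bound.
            "boundaries": list(boundaries),
            # Documents outside every range land in this bucket.
            "default": default_label,
            "output": {
                "count": {"$sum": 1},
                "avg_yield": {"$avg": "$Crop_Yield_MT_per_HA"},
            },
        }
    }]
```

For example, build_temperature_bucket_pipeline([0, 10, 20, 30, 40]) yields four 10-degree buckets plus the default bucket for anything colder or hotter.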

9. 🗂️ Admin Panel ⭐ MongoDB CRUD Operations

  • Upload CSV: Bulk insert via insert_dataframe()
  • View Collections: Browse data with query_to_dataframe()
  • Search & Filter: MongoDB regex and range queries
  • Delete Records: Safe deletion with delete_documents()
  • Statistics: Collection stats via get_collection_stats()
  • Index Management: Create indexes for performance
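
Index creation from the Admin Panel presumably ends in PyMongo's create_index. A minimal sketch under that assumption is shown here; the chosen fields follow the filters used in the EDA module, and ensure_indexes and INDEX_SPECS are hypothetical names.

```python
from typing import Any, List, Sequence, Tuple

ASCENDING = 1  # same value as pymongo.ASCENDING

IndexKeys = List[Tuple[str, int]]

# Assumed index choices: a compound index for country/year filters,
# and a single-field index for crop filters.
INDEX_SPECS: List[IndexKeys] = [
    [("Country", ASCENDING), ("Year", ASCENDING)],
    [("Crop_Type", ASCENDING)],
]


def ensure_indexes(collection: Any,
                   specs: Sequence[IndexKeys] = INDEX_SPECS) -> List[str]:
    """Create each index if it does not exist and return the index names."""
    # create_index is a no-op when an identical index already exists.
    return [collection.create_index(keys) for keys in specs]
```

With a real PyMongo collection this would return names such as Country_1_Year_1, which is the driver's default naming scheme.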

🔧 Configuration

MongoDB Connection

Edit db_connection.py to change MongoDB settings:

# Default connection
uri = "mongodb://localhost:27017/"
db_name = "agri_intel"

# Custom connection
uri = "mongodb://username:password@host:port/"
db_name = "your_database_name"
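
Rather than editing db_connection.py for each environment, the settings could be read from environment variables with the defaults above as fallback. This is a sketch of that idea, not the project's current behavior; the variable names AGRIINTEL_MONGO_URI and AGRIINTEL_DB_NAME and both helper functions are hypothetical.

```python
import os
from typing import Mapping, Tuple
from urllib.parse import quote_plus

DEFAULT_URI = "mongodb://localhost:27017/"
DEFAULT_DB_NAME = "agri_intel"


def load_mongo_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (uri, db_name), falling back to the README defaults."""
    uri = env.get("AGRIINTEL_MONGO_URI", DEFAULT_URI)
    db_name = env.get("AGRIINTEL_DB_NAME", DEFAULT_DB_NAME)
    return uri, db_name


def build_uri(username: str, password: str, host: str, port: int = 27017) -> str:
    """Build a URI with credentials, percent-encoding reserved characters."""
    # Characters such as '@', ':' and '/' in credentials must be escaped.
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/"
```

Keeping credentials in the environment also avoids committing a password to the repository.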

Performance Tuning

For large datasets (>100K records):

  1. Limit data loading in load_climate_data():

    df = load_climate_data(limit=50000)  # Adjust as needed

  2. Create indexes in Admin Panel for faster queries

  3. Use filters in EDA to reduce data processing


🐛 Troubleshooting

MongoDB Connection Issues

Error: ServerSelectionTimeoutError

  • Solution: Ensure MongoDB is running. Check with brew services list (macOS) or sudo systemctl status mongod (Linux)
  • Check connection URI in db_connection.py
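
A quick way to tell whether anything is listening on the MongoDB port is a plain TCP check with the standard library. This sketch only confirms that the port accepts connections; it does not perform a MongoDB handshake or validate credentials. is_port_open is a hypothetical helper.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 27017,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

If this returns False for the default host and port, the ServerSelectionTimeoutError is most likely caused by the service not running or listening on a different port.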

Missing Dependencies

Error: ModuleNotFoundError

  • Solution: Reinstall requirements: pip install -r requirements.txt --upgrade

Slow Performance

Issue: App is slow with large dataset

  • Solution:
    • Reduce data limit in cache functions
    • Create MongoDB indexes (Admin Panel)
    • Use date range filters in EDA

Data Upload Fails

Error: Upload to MongoDB fails

  • Solution:
    • Check CSV format and encoding (UTF-8 recommended)
    • Ensure column names match expected format
    • Verify MongoDB connection is active

📊 Sample Workflow

For Researchers

  1. Upload Data → Admin Panel
  2. Explore Patterns → EDA module
  3. Assess Risks → Extreme Weather module
  4. Make Predictions → Forecasting module
  5. Export Results → Download CSVs and reports

For Farmers

  1. Get Prediction → Farmer Assistant
  2. Input Farm Details → Climate and soil conditions
  3. Review Advice → Adaptive strategies
  4. Download Report → Save recommendations

For Policy Makers

  1. Identify Vulnerable Regions → Extreme Weather
  2. Forecast Future Impacts → Forecasting
  3. Analyze Correlations → Correlation Lab
  4. Export Insights → Generate reports

🔒 Data Security

  • Local Storage: With the default configuration, all data is stored in a local MongoDB instance
  • No Cloud Upload: Data stays on your machine unless you point the connection URI at a remote server
  • Backup Recommended: Regular MongoDB backups for production use

🚀 Future Enhancements

Features planned for future versions, including extensions to existing modules:

  • Correlation Lab: Interactive correlation matrix builder
  • Adaptation Strategy Simulator: Test "what-if" scenarios
  • Real-time Data: Integration with live weather APIs
  • Multi-user Support: Authentication and role-based access
  • Mobile App: React Native companion app

📝 License

This project is created for educational and research purposes. Feel free to modify and extend for your needs.


🤝 Contributing

Contributions welcome! Areas for improvement:

  • Additional ML models (LSTM, XGBoost)
  • Enhanced visualizations
  • Real-time data integration
  • Performance optimizations
  • Documentation improvements

📧 Support

For issues or questions:

  1. Check Troubleshooting section
  2. Review MongoDB and Streamlit documentation
  3. Create an issue in the repository

🙏 Acknowledgments

  • Streamlit - Amazing framework for data apps
  • MongoDB - Flexible database solution
  • Plotly - Beautiful interactive charts
  • scikit-learn - Powerful ML library

📌 Quick Reference

Common Commands

# Start app
streamlit run app.py

# Start MongoDB
brew services start mongodb-community  # macOS
sudo systemctl start mongod            # Linux

# Update dependencies
pip install -r requirements.txt --upgrade

# Clear Streamlit cache
# Use "Clear cache" button in app or restart app

Key Files

  • app.py - Main entry point
  • db_connection.py - Database utilities
  • pages/ - Individual modules
  • requirements.txt - Dependencies

Built with ❤️ for Climate-AI Research 2025

🌱 Data-driven insights for sustainable agriculture in a changing climate
