A comprehensive, MongoDB-first Streamlit web application for analyzing and predicting climate change impacts on global agriculture. Built to showcase the power of MongoDB aggregation pipelines for processing large-scale datasets without loading data into memory.
AgriIntel is a data-driven platform that helps researchers, policymakers, and farmers understand and adapt to climate change impacts on agriculture through:
- 🔥 MongoDB-First Architecture: All data operations use MongoDB queries and aggregations (30+ pipelines)
- 📊 Big Data Processing: Handle millions of records without memory issues
- 🤖 Advanced Analytics: Exploratory analysis, risk assessment, and ML forecasting
- 📈 Predictive Modeling: ML-powered yield forecasting through 2050
- 🎨 Beautiful UI: Professional design with interactive visualizations
- 💾 Full CRUD: Complete database management with MongoDB operations
- ⚡ High Performance: Indexed queries, aggregation pipelines, zero data loading
Traditional approach (loading all data to pandas):

```python
df = pd.read_csv('huge_file.csv')                # 500MB+ in memory ❌
grouped = df.groupby('Country')['Yield'].mean()  # Slow, not scalable ❌
```

Our approach (MongoDB aggregation):

```python
pipeline = [
    {'$group': {'_id': '$Country', 'avg_yield': {'$avg': '$Crop_Yield_MT_per_HA'}}},
    {'$sort': {'avg_yield': -1}}
]
results = handler.aggregate('climate_agriculture_data', pipeline)  # Fast, scalable ✅
```

Benefits:
- ✅ Process billions of records
- ✅ Millisecond query performance
- ✅ Zero memory overhead
- ✅ Production-ready scalability
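The `handler.aggregate()` call above comes from the project's connection layer in `db_connection.py`. A minimal sketch of what such a handler can look like, assuming the class name `MongoDBHandler` and the default connection settings shown later in this README (the real module may differ):

```python
# Sketch only: a thin wrapper of the kind db_connection.py provides.
# The class name and defaults below are assumptions for illustration.
from pymongo import MongoClient


class MongoDBHandler:
    def __init__(self, uri="mongodb://localhost:27017/", db_name="agri_intel"):
        self.client = MongoClient(uri)
        self.db = self.client[db_name]

    def aggregate(self, collection_name, pipeline):
        # The pipeline runs inside MongoDB; only the (small) result set
        # is returned to Python, so raw records never load into memory.
        return list(self.db[collection_name].aggregate(pipeline))
```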
```
project/
│
├── app.py                           # Main launcher with navigation
├── db_connection.py                 # MongoDB connection handler
│
├── pages/
│   ├── 1_📊_EDA.py                  # Exploratory Data Analysis
│   ├── 2_🌪️_Extreme_Weather.py      # Extreme Weather Risk Analysis
│   ├── 3_📈_Forecasting.py          # Time-series & ML Prediction
│   ├── 4_🔬_Correlation_Lab.py      # Correlation Analysis
│   ├── 5_🧠_Adaptation.py           # Strategy Simulation
│   ├── 6_🤖_Farmer_Assistant.py     # Yield Prediction Assistant
│   ├── 7_🗂️_Admin_Panel.py          # MongoDB CRUD & Uploads
│   └── 8_💾_MongoDB_Analytics.py    # Advanced MongoDB Operations Hub ⭐
│
├── models/
│   └── (trained models saved here)
│
├── data/
│   └── climate_change_impact_on_agriculture_2024.csv
│
├── requirements.txt
└── README.md
```
- Streamlit - Web framework
- Plotly - Interactive visualizations
- Folium - Geographic mapping
- streamlit-option-menu - Enhanced navigation
- Python 3.8+
- Pandas & NumPy - Data processing
- scikit-learn - Machine learning
- Prophet - Time-series forecasting
- PyMongo - MongoDB driver
- MongoDB - NoSQL database for climate data storage
- Python 3.8 or higher
- MongoDB installed and running locally
- Download from: https://www.mongodb.com/try/download/community
- Or install via package manager:
```bash
# macOS
brew tap mongodb/brew
brew install mongodb-community

# Ubuntu
sudo apt-get install mongodb

# Windows: download the installer from the MongoDB website
```
```bash
# If using git
git clone <repository-url>
cd agri-intel

# Or extract downloaded zip file
unzip agri-intel.zip
cd agri-intel
```

```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```

```bash
pip install -r requirements.txt
```

```bash
# Start MongoDB service
# On macOS:
brew services start mongodb-community
# On Ubuntu:
sudo systemctl start mongod
# On Windows: start the MongoDB service from the Services panel
# Or run: mongod --dbpath <path-to-data-directory>
```

Verify MongoDB is running:

```bash
# Connect using MongoDB Compass or the mongo shell
mongosh
# Should connect without errors
```
- Place your CSV file in the `data/` directory: `data/climate_change_impact_on_agriculture_2024.csv`
- Required CSV columns (minimum): `Country`, `Year`, `Crop_Type`, `Crop_Yield_MT_per_HA`, `Average_Temperature_C`, `Total_Rainfall_mm`, `CO2_Emissions_MT`
- Optional columns for enhanced features: `Fertilizer_Use_KG_per_HA`, `Pesticide_Use_KG_per_HA`, `Irrigation_Access_Pct`, `Extreme_Weather_Events`, `Precipitation_Anomaly_mm`
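Optionally, you can sanity-check the file for the minimum columns before uploading (a small pandas sketch; not required by the app):

```python
# Quick check (optional): confirm the CSV contains the minimum columns
# listed above before uploading it through the Admin Panel.
import pandas as pd

REQUIRED_COLUMNS = [
    'Country', 'Year', 'Crop_Type', 'Crop_Yield_MT_per_HA',
    'Average_Temperature_C', 'Total_Rainfall_mm', 'CO2_Emissions_MT',
]

df = pd.read_csv('data/climate_change_impact_on_agriculture_2024.csv')
missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
print('Missing columns:', missing if missing else 'none')
```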
```bash
streamlit run app.py
```

The application will open in your browser at http://localhost:8501
- Navigate to Admin Panel (page 7)
- Upload your CSV file to MongoDB
- Wait for data import to complete
- Explore other modules - data is now available across all pages
🏠 Home:
- Project overview and mission
- Key performance indicators (KPIs)
- Global metrics visualization
- Quick navigation to all modules
📊 EDA (Exploratory Data Analysis):
- Filters: Country, crop, year range
- Visualizations: Trends, distributions, geographic maps
- MongoDB Aggregations:
  - `get_time_series_aggregation()` - Yearly trends via `$group`
  - `get_country_rankings()` - Top countries via aggregation
  - `get_filtered_statistics()` - Real-time stats via `$avg`, `$stdDevPop`
  - Nested grouping for complex regional analysis
- Export: Download filtered data as CSV
Key MongoDB Pipeline Example:
```python
pipeline = [
    {'$match': {'Country': 'India'}},
    {'$group': {
        '_id': '$Year',
        'avg_yield': {'$avg': '$Crop_Yield_MT_per_HA'},
        'count': {'$sum': 1}
    }},
    {'$sort': {'_id': 1}}
]
```

🌪️ Extreme Weather Risk Analysis:
- Risk Index Calculation: Weighted climate anomaly scores
- Geographic Risk Maps: Identify vulnerable regions
- Trend Analysis: Risk evolution over time
- MongoDB Aggregations:
  - `get_extreme_weather_analysis()` - Nested `$group` for variance calculation
  - Temperature/rainfall anomaly buckets via `$bucket`
  - Multi-stage pipelines with `$stdDevPop`
- Automated Insights: Top affected regions
Key MongoDB Pipeline Example:
```python
pipeline = [
    {'$group': {
        '_id': {'country': '$Country', 'year': '$Year'},
        'avg_temp': {'$avg': '$Average_Temperature_C'}
    }},
    {'$group': {
        '_id': '$_id.country',
        'temp_variance': {'$stdDevPop': '$avg_temp'},
        'years_tracked': {'$sum': 1}
    }},
    {'$sort': {'temp_variance': -1}}
]
```

📈 Forecasting:
- ML Model: Random Forest Regressor
- Predictions: Yield forecasts through 2050
- Confidence Intervals: 95% prediction bands
- Scenario Analysis: Test climate change scenarios
- MongoDB: Load training data via aggregation, save predictions
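A minimal sketch of that flow, assuming the `MongoDBHandler` sketch from earlier; the features and hyperparameters here are illustrative, not the app's exact model:

```python
# Sketch: train a yield model on country-year aggregates pulled from MongoDB.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from db_connection import MongoDBHandler  # assumed class name

handler = MongoDBHandler()
pipeline = [
    {'$group': {
        '_id': {'country': '$Country', 'year': '$Year'},
        'avg_yield': {'$avg': '$Crop_Yield_MT_per_HA'},
        'avg_temp': {'$avg': '$Average_Temperature_C'},
        'avg_rain': {'$avg': '$Total_Rainfall_mm'},
    }},
]
rows = handler.aggregate('climate_agriculture_data', pipeline)
df = pd.DataFrame(
    [{'year': r['_id']['year'], 'avg_temp': r['avg_temp'],
      'avg_rain': r['avg_rain'], 'avg_yield': r['avg_yield']} for r in rows]
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(df[['year', 'avg_temp', 'avg_rain']], df['avg_yield'])

# Score a hypothetical 2050 scenario (illustrative values)
scenario = pd.DataFrame([[2050, 29.5, 820.0]],
                        columns=['year', 'avg_temp', 'avg_rain'])
print(model.predict(scenario))
```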
🔬 Correlation Lab:
- Pairwise Correlation: Pearson/Spearman methods
- Correlation Matrix: Full heatmap visualization
- Multiple Variables: Compare predictors
- MongoDB: Efficient data sampling and filtering
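One way the sampling side can work is a server-side `$sample` followed by a local correlation matrix (a sketch; the column selection and the `MongoDBHandler` name are assumptions):

```python
# Sketch: sample documents server-side, then correlate locally on the sample.
import pandas as pd
from db_connection import MongoDBHandler  # assumed class name

handler = MongoDBHandler()
pipeline = [
    {'$sample': {'size': 5000}},       # random server-side sample
    {'$project': {
        '_id': 0,
        'Crop_Yield_MT_per_HA': 1,
        'Average_Temperature_C': 1,
        'Total_Rainfall_mm': 1,
        'CO2_Emissions_MT': 1,
    }},
]
sample = pd.DataFrame(handler.aggregate('climate_agriculture_data', pipeline))
print(sample.corr(method='pearson'))   # or method='spearman'
```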
🧠 Adaptation (Strategy Simulation):
- Climate Scenarios: IPCC RCP 2.6, 4.5, 8.5
- Adaptation Measures: Irrigation, fertilizer, technology
- Comparison: With/without adaptation analysis
- Economic Impact: Revenue projections
- MongoDB: Baseline statistics via aggregation
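The baseline statistics can come from a single `$group` over the whole collection, which the simulator can then adjust per scenario (a sketch under the same `MongoDBHandler` assumption):

```python
# Sketch: dataset-wide baselines that a scenario simulation can adjust.
from db_connection import MongoDBHandler  # assumed class name

handler = MongoDBHandler()
pipeline = [
    {'$group': {
        '_id': None,                                        # whole collection
        'baseline_yield': {'$avg': '$Crop_Yield_MT_per_HA'},
        'baseline_temp': {'$avg': '$Average_Temperature_C'},
        'baseline_rainfall': {'$avg': '$Total_Rainfall_mm'},
        'yield_stddev': {'$stdDevPop': '$Crop_Yield_MT_per_HA'},
    }},
]
baseline = handler.aggregate('climate_agriculture_data', pipeline)[0]
print(baseline)
```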
🤖 Farmer Assistant:
- Input Form: Region, crop, climate conditions
- AI Predictions: Expected yield and risk category
- Personalized Advice: Adaptive strategies based on inputs
- Export Reports: Download prediction reports
- MongoDB: Train ML model from aggregated data
💾 MongoDB Analytics Hub:
- Query Builder: Build and execute MongoDB queries (example below)
  - Simple match, range queries, complex conditions, custom JSON
- Aggregation Pipelines: 10+ pre-built pipelines
  - Yearly trends, country rankings, crop comparisons
  - Climate impact bucketing, multi-stage operations
- Statistical Analysis: Compute stats in MongoDB
  - Descriptive statistics, variance analysis, percentiles
- Geospatial Queries: Country-level aggregations
- Performance Metrics: Query benchmarking and optimization
- Custom Pipelines: Execute your own JSON pipelines
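A sketch of the kind of range query the Query Builder issues (referenced in the list above); the collection and field names match the dataset, while `MongoDBHandler` and its `db` attribute are assumptions carried over from the earlier sketch:

```python
# Sketch: a match + range query with a field projection.
from db_connection import MongoDBHandler  # assumed class name

handler = MongoDBHandler()
cursor = handler.db['climate_agriculture_data'].find(
    {'Country': 'India', 'Year': {'$gte': 2010, '$lte': 2020}},
    {'_id': 0, 'Year': 1, 'Crop_Type': 1, 'Crop_Yield_MT_per_HA': 1},
)
for doc in cursor:
    print(doc)
```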
Featured Pipelines:
- Yearly yield trends with `$group` and `$avg`
- Country performance with `$addToSet` and `$size`
- Crop comparison with `$stdDevPop` and coefficient of variation
- Climate bucketing with `$bucket` (sketched below)
- Complex multi-stage with `$match`, `$addFields`, `$switch`
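As referenced in the climate-bucketing item above, a `$bucket` pipeline of that kind might look like this (the temperature boundaries are illustrative):

```python
# Sketch: bucket records into temperature bands and summarize yield per band.
from db_connection import MongoDBHandler  # assumed class name

handler = MongoDBHandler()
pipeline = [
    {'$bucket': {
        'groupBy': '$Average_Temperature_C',
        'boundaries': [0, 10, 20, 30, 40],   # °C bands (illustrative)
        'default': 'out_of_range',
        'output': {
            'avg_yield': {'$avg': '$Crop_Yield_MT_per_HA'},
            'count': {'$sum': 1},
        },
    }},
]
temperature_bands = handler.aggregate('climate_agriculture_data', pipeline)
```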
🗂️ Admin Panel:
- Upload CSV: Bulk insert via `insert_dataframe()`
- View Collections: Browse data with `query_to_dataframe()`
- Search & Filter: MongoDB regex and range queries
- Delete Records: Safe deletion with `delete_documents()`
- Statistics: Collection stats via `get_collection_stats()`
- Index Management: Create indexes for performance
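Hypothetical usage of the helpers named above; the actual signatures live in `db_connection.py` and may differ from this sketch:

```python
# Sketch: how the Admin Panel helpers listed above might be called.
import pandas as pd
from db_connection import MongoDBHandler  # assumed class name

handler = MongoDBHandler()

# Bulk insert a CSV
df = pd.read_csv('data/climate_change_impact_on_agriculture_2024.csv')
handler.insert_dataframe('climate_agriculture_data', df)

# Browse a filtered slice as a DataFrame
recent = handler.query_to_dataframe('climate_agriculture_data',
                                    {'Year': {'$gte': 2015}})

# Safe, targeted deletion (illustrative filter)
handler.delete_documents('climate_agriculture_data', {'Country': 'ExampleCountry'})

# Collection statistics
stats = handler.get_collection_stats('climate_agriculture_data')
```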
Edit `db_connection.py` to change MongoDB settings:
```python
# Default connection
uri = "mongodb://localhost:27017/"
db_name = "agri_intel"

# Custom connection
uri = "mongodb://username:password@host:port/"
db_name = "your_database_name"
```

For large datasets (>100K records):
- Limit data loading in `load_climate_data()`: `df = load_climate_data(limit=50000)  # Adjust as needed`
- Create indexes in Admin Panel for faster queries (see the sketch below)
- Use filters in EDA to reduce data processing
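Indexes can also be created directly with PyMongo outside the Admin Panel (a sketch; the field choices mirror the filters used throughout the app):

```python
# Sketch: create indexes that support the app's most common filters.
from pymongo import MongoClient

coll = MongoClient('mongodb://localhost:27017/')['agri_intel']['climate_agriculture_data']
coll.create_index([('Country', 1), ('Year', 1)])   # compound: country + year-range queries
coll.create_index('Crop_Type')                     # single-field filter
```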
Troubleshooting:

Error: `ServerSelectionTimeoutError`
- Solution: Ensure MongoDB is running: `brew services list` or `sudo systemctl status mongod`
- Check connection URI in `db_connection.py`
Error: `ModuleNotFoundError`
- Solution: Reinstall requirements: `pip install -r requirements.txt --upgrade`
Issue: App is slow with large dataset
- Solution:
- Reduce data limit in cache functions
- Create MongoDB indexes (Admin Panel)
- Use date range filters in EDA
Error: Upload to MongoDB fails
- Solution:
- Check CSV format and encoding (UTF-8 recommended)
- Ensure column names match expected format
- Verify MongoDB connection is active
- Upload Data → Admin Panel
- Explore Patterns → EDA module
- Assess Risks → Extreme Weather module
- Make Predictions → Forecasting module
- Export Results → Download CSVs and reports
- Get Prediction → Farmer Assistant
- Input Farm Details → Climate and soil conditions
- Review Advice → Adaptive strategies
- Download Report → Save recommendations
- Identify Vulnerable Regions → Extreme Weather
- Forecast Future Impacts → Forecasting
- Analyze Correlations → EDA
- Export Insights → Generate reports
- Local Storage: All data stored in local MongoDB instance
- No Cloud Upload: Data never leaves your machine
- Backup Recommended: Regular MongoDB backups for production use
Modules planned for future versions:
- Correlation Lab: Interactive correlation matrix builder
- Adaptation Strategy Simulator: Test "what-if" scenarios
- Real-time Data: Integration with live weather APIs
- Multi-user Support: Authentication and role-based access
- Mobile App: React Native companion app
This project is created for educational and research purposes. Feel free to modify and extend for your needs.
Contributions welcome! Areas for improvement:
- Additional ML models (LSTM, XGBoost)
- Enhanced visualizations
- Real-time data integration
- Performance optimizations
- Documentation improvements
For issues or questions:
- Check Troubleshooting section
- Review MongoDB and Streamlit documentation
- Create an issue in the repository
- Streamlit - Amazing framework for data apps
- MongoDB - Flexible database solution
- Plotly - Beautiful interactive charts
- scikit-learn - Powerful ML library
```bash
# Start app
streamlit run app.py

# Start MongoDB
brew services start mongodb-community   # macOS
sudo systemctl start mongod             # Linux

# Update dependencies
pip install -r requirements.txt --upgrade

# Clear Streamlit cache
# Use the "Clear cache" button in the app, or restart the app
```

Key files:
- `app.py` - Main entry point
- `db_connection.py` - Database utilities
- `pages/` - Individual modules
- `requirements.txt` - Dependencies
Built with ❤️ for Climate-AI Research 2025
🌱 Data-driven insights for sustainable agriculture in a changing climate