A machine learning project that predicts agricultural yields based on weather and environmental data using XGBoost regression. This project demonstrates the application of gradient boosting algorithms for agricultural forecasting, which is crucial for food security and agricultural planning.
This project implements a complete machine learning pipeline for predicting agricultural yields using weather data. It features:
- Data-driven approach: Uses real-world weather and agricultural data
- XGBoost implementation: Leverages gradient boosting for accurate predictions
- Model persistence: Save and load trained models for future predictions
- Performance monitoring: Comprehensive evaluation metrics and resource monitoring
- Production-ready: Command-line interface with configurable parameters
-
Training Pipeline (
XGBoost.py
):- Automated data preprocessing and feature selection
- Hyperparameter configuration with early stopping
- Model evaluation with multiple metrics (RMSE, MAE, R², MSE)
- Feature importance visualization
- Resource usage monitoring (memory, CPU, training time)
- Optional grid search for hyperparameter optimization
-
Prediction Pipeline (
predict_xgboost.py
):- Load pre-trained models for inference
- Batch prediction on new datasets
- Feature validation and reordering
- Export predictions to CSV
pandas
scikit-learn
xgboost
matplotlib
psutil
Install dependencies:
pip install -r requirements.txt
Basic training with default parameters:
python XGBoost.py --data weather_agriculture_combined_cleand\ \(1\).csv
Advanced training with custom parameters:
python XGBoost.py --data your_dataset.csv --test_size 0.25 --early_stop 15
python predict_xgboost.py --model xgboost_model.json --data new_data.csv --output predictions.csv
The model provides comprehensive performance metrics:
- RMSE: Root Mean Square Error for prediction accuracy
- MAE: Mean Absolute Error for average prediction deviation
- R²: Coefficient of determination for explained variance
- MSE: Mean Square Error for loss quantification
- Inference Time: Speed of model predictions
XGBoost/
├── XGBoost.py # Main training script
├── predict_xgboost.py # Prediction script
├── weather_agriculture_combined_cleand (1).csv # Dataset (~276KB)
├── xgboost_model.json # Trained model (saved after training)
├── requirements.txt # Python dependencies
└── README.md # Project documentation
The project uses a cleaned weather and agriculture combined dataset containing:
- Size: ~276KB of processed data
- Source: Official weather and agricultural sources
- Features: Numeric weather and environmental variables
- Target: Agricultural yield values (
value
column)
- Gradient Boosting: XGBoost regression with optimized parameters
- Early Stopping: Prevents overfitting during training
- Feature Selection: Automatic numeric feature detection
- Model Persistence: JSON format for cross-platform compatibility
- Resource Monitoring: Track memory usage and training time
- Visualization: Feature importance plots for model interpretation
- Learning Rate: 0.1
- Max Depth: 6
- Subsample: 0.8
- Column Sample by Tree: 0.8
- Objective: Squared error regression
This project demonstrates practical machine learning applications in:
- Agricultural Planning: Yield forecasting for crop management
- Food Security: Predicting harvest outcomes for supply chain planning
- Risk Assessment: Understanding weather impact on agricultural production
- Decision Support: Data-driven insights for farmers and policymakers
The trained model provides:
- Feature importance rankings to identify key weather factors
- Quantitative performance metrics for model reliability
- Resource usage statistics for deployment planning
- Visualization tools for model interpretation
Potential improvements and extensions:
- Multi-target prediction for different crop types
- Time series forecasting with seasonal patterns
- Integration with real-time weather APIs
- Web interface for interactive predictions
- Ensemble methods combining multiple algorithms
This project is open source and available for educational and research purposes.
This project showcases practical machine learning implementation for agricultural prediction, demonstrating proficiency in data science, model development, and production-ready code structure.