AutoML is a tool for automating the end-to-end process of applying machine learning to real-world problems. It handles model selection, hyperparameter tuning, and model export, so that users without deep machine learning expertise can train and download models.
Check out the live demo of AutoML and experience the power of automated machine learning firsthand!
See AutoML in action: This demonstration shows how to analyze data, train models, and get AI-powered insights in minutes!
- Data Visualization and Analysis: Interactive visualizations to understand your data
  - Correlation heatmaps
  - Distribution plots
  - Feature importance charts
  - Pair plots for relationship analysis
- Automated Data Cleaning and Preprocessing: Handle missing values, outliers, and feature engineering
  - Automatic detection and handling of missing values
  - Outlier detection and treatment
  - Feature scaling and normalization
  - Categorical encoding (One-Hot, Label, Target encoding)
- Multiple ML Model Selection: Choose from a variety of models or let AutoML select the best one
  - Classification models: Logistic Regression, Random Forest, XGBoost, SVC, Decision Tree, KNN, Gradient Boosting, AdaBoost, Gaussian Naive Bayes, QDA, LDA
  - Regression models: Linear Regression, Random Forest, XGBoost, SVR, Decision Tree, KNN, ElasticNet, Gradient Boosting, AdaBoost, Bayesian Ridge, Ridge, Lasso
- Hyperparameter Tuning: Optimize model performance with advanced tuning techniques
  - Hyperparameter tuning support for 20+ models
  - Support for 10+ hyperparameter tuning techniques
- Model Performance Evaluation: Comprehensive metrics and visualizations
  - Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC, Confusion Matrix
  - Regression: MAE, MSE, RMSE, R², Residual Plots
- AI-powered Data Insights: Leverage Google's Gemini for intelligent data analysis
  - Natural language explanations of model decisions
  - Automated feature importance interpretation
  - Data quality assessment
  - Trend identification and anomaly detection
- LLM Fine-Tuning and Download: Access and utilize pre-trained language models
  - Download fine-tuned LLMs for specific domains
  - Customize existing models for your specific use case
  - Access to various model sizes (small, medium, large)
  - Seamless integration with your data processing pipeline
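The regression metrics listed under Model Performance Evaluation can be computed as in the following minimal sketch. It uses only the standard library and toy values; the function name `regression_metrics` is hypothetical and is not taken from the AutoML code base.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variance
    ss_res = sum(e * e for e in errors)                 # unexplained variance
    r2 = 1.0 - ss_res / ss_tot  # undefined when y_true is constant
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Toy targets and predictions, for illustration only.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
```

In practice, `sklearn.metrics` provides equivalent functions such as `mean_absolute_error`, `mean_squared_error`, and `r2_score`.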
- Python 3.8 or higher
- Google API key for Gemini (data insights and DataFrame cleaning)
- Groq API key for LLM-based analysis of test results
- LangSmith API key for monitoring LLM calls
- Clone the repository:
git clone https://github.com/kashh56/AutoML
cd AutoML
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up your environment variables:
# Create a .env file with your Google API key as well as other keys
echo "GOOGLE_API_KEY=your_api_key_here" > .env
- Start the application:
streamlit run app.py
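Before starting the app, it can help to confirm that the required keys are actually set. The sketch below is one way to do that with the standard library. `GOOGLE_API_KEY` comes from the setup step above; the names `GROQ_API_KEY` and `LANGSMITH_API_KEY` are assumptions, so check the project's code for the exact variable names it reads.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file, skipping comments and blanks."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required, env):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


# Only GOOGLE_API_KEY is named in the setup step; the other two are assumed.
REQUIRED = ["GOOGLE_API_KEY", "GROQ_API_KEY", "LANGSMITH_API_KEY"]

# Values from the shell environment override those in the .env file.
env = {**load_env_file(), **os.environ}
print("Missing keys:", missing_keys(REQUIRED, env))
```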
- Upload Data: Upload your CSV file
  - Supported format: CSV
  - Automatic data type detection
  - Preview of the first few rows
- Explore Data: Visualize and understand your dataset
  - Summary statistics
  - Correlation analysis
  - Distribution visualization
  - Missing value analysis
- Preprocess: Clean and transform your data
  - Handle missing values (imputation strategies)
  - Remove or transform outliers
  - Feature scaling options
  - Encoding categorical variables
- Train Models: Select models and tune hyperparameters
  - Choose target variable and features
  - Select machine learning algorithms
  - Configure hyperparameter search space
  - Set evaluation metrics
- Evaluate: Compare model performance
  - Performance metrics visualization
  - Feature importance analysis
  - Model comparison dashboard
  - Cross-validation results
- Deploy: Export your model
  - Download trained model as pickle file
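Once a model has been downloaded as a pickle file, it can be loaded back into Python for predictions. The sketch below shows the round trip in memory, with a small scikit-learn model standing in for one trained in the app; the stand-in model and its data are assumptions made for illustration.

```python
import pickle

from sklearn.linear_model import LinearRegression

# Stand-in for a model trained in the app: fits y = 2x + 1 on toy data.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])

# The app offers the pickle as a download; here it is serialized in memory.
payload = pickle.dumps(model)

# Only unpickle files you trust: loading a pickle can run arbitrary code.
restored = pickle.loads(payload)
prediction = restored.predict([[3.0]])[0]
print(round(prediction, 2))
```

With a downloaded file, replace `pickle.loads(payload)` with `pickle.load(f)` on a file opened in binary mode (`"rb"`). Loading generally requires the same library versions that were used to train the model.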
Auto-ML/
├── app.py                  # Main Streamlit application
├── requirements.txt        # Project dependencies
├── .env                    # Environment variables (API keys)
├── README.md               # Project documentation
├── models/                 # Saved model files
├── logs/                   # Application logs
└── src/                    # Source code
    ├── __init__.py         # Package initialization
    ├── preprocessing/      # Data preprocessing modules
    │   ├── __init__.py
    │   └── ...             # Data cleaning, transformation
    ├── training/           # Model training modules
    │   ├── __init__.py
    │   └── ...             # Model training, evaluation
    ├── ui/                 # User interface components
    │   ├── __init__.py
    │   └── ...             # Streamlit UI elements
    └── utils/              # Utility functions
        ├── __init__.py
        └── ...             # Helper functions
Purpose: Collects raw data from multiple sources (CSV, databases, APIs).
- Reads structured/unstructured data
- Handles missing values and duplicates
- Converts raw data into a clean DataFrame
Purpose: Transforms raw data into a machine-learning-ready format.
- Cleans Data: Handles NaNs, outliers, and standardizes columns
- Encodes Categorical Features: One-hot encoding, label encoding
- Scales Numerical Data: MinMaxScaler, StandardScaler
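The preprocessing steps above (imputation, categorical encoding, scaling) can be sketched with scikit-learn as follows. This is a minimal illustration on a toy DataFrame, not the project's actual preprocessing module; the column names and imputation strategies are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one missing value per column (illustrative data).
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0, 31.0],
        "city": ["Pune", "Delhi", np.nan, "Pune"],
    }
)

# Numeric columns: fill NaNs with the median, then standardize.
numeric = Pipeline(
    [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
)

# Categorical columns: fill NaNs with the mode, then one-hot encode.
categorical = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocess = ColumnTransformer(
    [("num", numeric, ["age"]), ("cat", categorical, ["city"])]
)

features = preprocess.fit_transform(df)
print(features.shape)
```

Swapping `StandardScaler` for `MinMaxScaler` changes only the `"scale"` step; the rest of the pipeline stays the same.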
Purpose: Automates model selection and training.
- Multiple Algorithms: Trains XGBoost, RandomForest, Deep Learning models
- Hyperparameter Optimization: Finds the best config for each model
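One way to combine both steps is to run a small grid search per candidate model and keep the best scorer. The sketch below does this with scikit-learn on synthetic data. The candidate models and search spaces are hypothetical, and scikit-learn estimators stand in for XGBoost and deep learning models to keep the example dependency-light.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for an uploaded dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Each candidate pairs an estimator with a small, hypothetical search space.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [3, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 3-fold cross-validated search over the grid for this model.
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Pick the model whose best configuration scored highest.
best_name = max(results, key=lambda n: results[n][0])
print(best_name, results[best_name])
```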
Purpose: Makes the model available for real-world usage.
- Exports the model (Pickle, ONNX, TensorFlow SavedModel)
- Makes the trained model available for download
AutoML implements a robust feedback and fallback system to ensure reliability:
- Data Cleaning Validation: The system validates all cleaning operations and provides feedback on the changes made
  - Automatic detection of cleaning effectiveness
  - Detailed logs of transformations applied to the data
- LLM Fallback Mechanism: For AI-powered insights and data analysis
  - Primary attempt uses advanced LLMs (Google Gemini/Groq)
  - Automatic fallback to rule-based algorithms if the LLM fails
  - Graceful degradation to ensure core functionality remains available
  - Error logging and reporting for continuous improvement
  - LangSmith integration for monitoring and tracking all LLM calls
- Error Feedback Loop: Intelligent error handling during data cleaning
  - Automatically captures errors that occur during data cleaning operations
  - Sends error context to the LLM to generate refined cleaning code
  - Re-executes the improved cleaning process
  - Iterative refinement ensures robust data preparation even with challenging datasets
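The fallback and error feedback mechanisms described above can be sketched together as a retry loop that degrades to a rule-based cleaner. This is a minimal illustration of the pattern, not the project's implementation: `clean_with_feedback`, `rule_based_clean`, and the two `refine` stand-ins are hypothetical names, and the stand-ins replace real Gemini/Groq calls.

```python
import logging

logger = logging.getLogger("automl.cleaning")


def rule_based_clean(rows):
    """Fallback cleaner: drop non-string entries and trim whitespace."""
    return [r.strip() for r in rows if isinstance(r, str)]


def clean_with_feedback(code, refine, rows, max_attempts=3):
    """Run generated cleaning code, feeding errors back to `refine`.

    Returns (cleaned_rows, source), where source is "llm" or "fallback".
    """
    for attempt in range(1, max_attempts + 1):
        namespace = {}
        try:
            exec(code, namespace)  # the code is expected to define clean(rows)
            return namespace["clean"](rows), "llm"
        except Exception as exc:
            logger.warning("Cleaning attempt %d failed: %r", attempt, exc)
            if attempt == max_attempts:
                break
            try:
                # Send the failing code and the error back for refinement.
                code = refine(code, repr(exc))
            except Exception as llm_exc:
                logger.warning("Refinement call failed: %r", llm_exc)
                break
    # Attempts exhausted or the LLM was unreachable: degrade gracefully.
    return rule_based_clean(rows), "fallback"


BROKEN = "def clean(rows):\n    return [r.strip() for r in rows]\n"
FIXED = "def clean(rows):\n    return [r.strip() for r in rows if r is not None]\n"


def fake_refine(code, error_message):
    """Stand-in for the LLM: always returns a corrected version."""
    return FIXED


def unavailable_refine(code, error_message):
    """Stand-in for an LLM outage."""
    raise TimeoutError("LLM service unreachable")


rows = [" a ", None, "b "]
print(clean_with_feedback(BROKEN, fake_refine, rows))
print(clean_with_feedback(BROKEN, unavailable_refine, rows))
```

Executing LLM-generated code with `exec` is shown here only to keep the sketch short. A real system would want to sandbox or otherwise restrict what that code can do.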
We welcome contributions!
- Fork the repository
- Create a feature branch
- Install development dependencies:
pip install -r requirements-dev.txt
- Make your changes
- Run tests:
pytest
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Streamlit for the interactive web framework
- Scikit-learn for machine learning algorithms
- Pandas for data manipulation
- Plotly for interactive visualizations
- Google Gemini for AI-powered insights
- XGBoost for gradient boosting
- Seaborn for statistical visualizations
- LangChain for large language model integration
- LangSmith for LLM call tracking and monitoring
- Groq for fast LLM inference
Made with ❤️ by Akash Anandani