This repository implements a comprehensive Post-Training Quantization (PTQ) and model export pipeline for PyTorch models. Designed with a universal, modular architecture, it provides an end-to-end solution from model fine-tuning to quantized deployment.
Current Implementation: The pipeline is demonstrated with MobileNetV3, serving as a reference implementation that showcases the framework's capabilities for computer vision tasks.
Future Vision: The architecture is model-agnostic by design, with plans to extend support to additional model families and enhance quantization strategies. This project bridges the gap between research experimentation and production deployment, offering researchers and engineers a robust foundation for model optimization workflows.
Note: See the TODO list for details on planned features.
Create a single, config-driven pipeline that works across any PyTorch model architecture, eliminating the need for model-specific optimization code.
Bridge the gap between research and deployment with an end-to-end workflow: fine-tuning → quantization (PTQ-focused) → export → deployment analysis.
git clone <repository-url>
cd ModelLite
conda create -n model_lite python=3.12
conda activate model_lite
pip install -r requirements.txt
Run this command to install PyTorch 2.7.1+cu118:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Note: cu118 is used only because of the limited support for my GPU, a GeForce MX350.
# Default mode.
python main.py
# Debug mode.
python main.py run=debug
# Edit configuration.
python main.py data.batch_size=32 training.num_epochs=10
Later this will be updated...
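To make the dotted overrides in the commands above (for example `data.batch_size=32`) more concrete, here is a minimal plain-Python sketch of how a `key.path=value` override can be merged into a nested configuration. It is an illustration only, not Hydra's implementation and not code from this repository: `apply_overrides`, `parse_value`, and the default values in `base` are hypothetical. Config-group selections such as `run=debug` work differently in Hydra (they pick a YAML file) and are not modeled here.

```python
from copy import deepcopy


def parse_value(raw):
    """Best-effort conversion of an override string to int or float."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # fall back to the raw string


def apply_overrides(config, overrides):
    """Return a copy of `config` with `a.b=value` overrides applied."""
    result = deepcopy(config)  # leave the caller's config untouched
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dicts, creating levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults; the real values live in hydra_configs/.
base = {"data": {"batch_size": 64}, "training": {"num_epochs": 5}}
merged = apply_overrides(base, ["data.batch_size=32", "training.num_epochs=10"])
print(merged)
```

In the actual pipeline Hydra and OmegaConf handle this merging, including type checking and interpolation, so no such helper is needed.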
- Model-Agnostic Framework - Works seamlessly with any PyTorch model architecture
- Production-Ready Quantization - Battle-tested techniques with comprehensive validation
- Research-Friendly Design - Modular architecture for easy experimentation and extension
- Multi-Strategy Quantization - Support for FX Graph, Static, and Dynamic quantization
- Zero-Boilerplate Integration - Minimal code changes required for existing models
- Configurable Precision - Flexible quantization presets for different deployment scenarios
- Universal Model Export - One-click export to ONNX, TensorRT, and other inference engines
- Embedded Deployment Analysis - Comprehensive compatibility checking for edge devices
- Performance Benchmarking - Detailed speed, accuracy, and memory usage analysis
- End-to-End Pipeline - Unified workflow from fine-tuning to quantization and deployment
- Modular & Configurable - Clean separation with Hydra-based configuration management
- Production Monitoring - Built-in validation, debugging, and performance tracking
- Smart Deployment Analysis - Automatic compatibility checking for Raspberry Pi, Jetson, and other edge devices
- Quantization Health Checks - Comprehensive validation to ensure quantization effectiveness
- Performance Trade-off Analysis - Clear insights into accuracy vs. speed vs. size trade-offs
- One-Click Workflows - From trained model to deployed artifact in a single command
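The quantization features listed above all rest on the same arithmetic: mapping floating-point values to 8-bit integers through a scale and a zero point. The sketch below shows that affine mapping in plain Python, assuming a signed int8 range and min/max calibration. It is not the repository's implementation (which relies on PyTorch's quantization tooling), and the names `choose_qparams`, `quantize`, and `dequantize` are hypothetical.

```python
def choose_qparams(values, qmin=-128, qmax=127):
    """Pick scale and zero point from the observed min/max (calibration)."""
    lo = min(min(values), 0.0)  # the range must contain 0 so it maps exactly
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers, clamping to the representable range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]


weights = [-1.0, 0.0, 0.5, 2.0]
scale, zero_point = choose_qparams(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q, restored)
```

The difference between `weights` and `restored` is the quantization error; the accuracy vs. size trade-off mentioned above comes down to how much of this error the model tolerates.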
torch>=1.8.0
torchvision>=0.9.0
hydra-core>=1.3.0
omegaconf>=2.3.0
tqdm>=4.64.0
torch==2.0.1
torchvision==0.15.2
hydra-core==1.3.2
omegaconf==2.3.0
tqdm==4.65.0
numpy==1.24.3
Pillow==9.5.0
tensorboard==2.13.0
matplotlib==3.7.1
- Improve save_model (by timestamp).
- Implement logging.
- Implement Exception Handling.
- Validate Hydra Config.
- Add support for more models.
- Add support for more datasets.
- Implement more benchmarks to evaluate the model.
- Compile the model.
- Implement device controller.
- Implement model pruning.
- Implement model distillation.
- Introduce an AutoML-like framework.
- Add more quantization presets.
- Check cached processing results in load_dataset.
- Implement parallel data loading.
- Check model status in each process.
- Beautify terminal outputs.
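As one possible direction for the first item in the list above (saving models by timestamp), the sketch below builds a timestamped file path so that repeated runs do not overwrite each other. This is an assumption about how the feature could look, not the current `save_model` behavior: the helper name `timestamped_model_path`, the `.pth` suffix, and the file-name pattern are hypothetical. Only the `saved_models` directory name comes from the project layout.

```python
from datetime import datetime
from pathlib import Path


def timestamped_model_path(model_name, root="saved_models", now=None, suffix=".pth"):
    """Build a path such as saved_models/<model_name>_<YYYYmmdd_HHMMSS>.pth."""
    now = now or datetime.now()  # injectable for reproducible tests
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(root) / f"{model_name}_{stamp}{suffix}"


path = timestamped_model_path("mobilenet")
path.parent.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
print(path)
```

A model checkpoint could then be written to `path`, for instance with `torch.save(model.state_dict(), path)`.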
MobileNet
├── analysis
│   ├── benchmark.py
│   ├── complexity.py
│   ├── deployment.py
│   ├── __init__.py
│   ├── quantization_test.py
│   └── utils.py
├── dataloader.py
├── Dataset
├── debug.py
├── enumeration.py
├── environment.yml
├── export
│   ├── decorator.py
│   ├── engine.py
│   └── __init__.py
├── hydra_configs
│   ├── config.yaml
│   ├── data
│   │   └── cifar10.yaml
│   ├── evaluation
│   │   └── default.yaml
│   ├── export
│   │   ├── common.yaml
│   │   └── engine
│   │       └── onnx.yaml
│   ├── model
│   │   └── mobilenet.yaml
│   ├── quantization
│   │   └── fx_graph.yaml
│   ├── run
│   │   ├── debug.yaml
│   │   └── default.yaml
│   └── training
│       ├── default.yaml
│       ├── optimizer
│       │   └── adam.yaml
│       └── scheduler
│           └── cosine.yaml
├── LICENSE
├── logs
├── main.py
├── model
│   ├── modify.py
│   ├── train.py
│   └── utils.py
├── outputs
├── quantization
│   ├── __init__.py
│   ├── qconfig_presets.py
│   ├── quantize_functions.py
│   └── quantize.py
├── README.md
├── requirements_dev.txt
├── requirements.txt
├── saved_models
└── schemas
    ├── context_classes.py
    ├── data_classes.py
    ├── functions.py
    ├── __init__.py
    ├── presets.py
    └── validation.py
21 directories, 43 files