Olive-ai 0.5.1
Examples
New examples have been added in this release.
Passes (optimization techniques)
- QNNPreprocess: Add the configs that were added in the onnxruntime nightly package.
- GptqQuantizer: PTQ quantization using Hugging Face Optimum, exporting the model with onnxruntime optimized kernels.
- OnnxMatMul4Quantizer: Add MatMul RTN/HQQ/GPTQ quantization configs (see the sketch after this list).
- Move all passes that need to create an inference session so that they run on the target:
  - IncQuantization
  - OptimumMerging
  - OrtTransformersOptimization
  - VitisAIQuantization
  - OrtPerfTuning
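
As a minimal sketch of how such passes appear in the "passes" section of a workflow config: the pass type names below come from this release, but the option names (fuse_layernorm, bits, group_size, algorithm) are illustrative assumptions rather than the exact schema.

```json
"passes": {
    "qnn_preprocess": {
        "type": "QNNPreprocess",
        "config": {
            "fuse_layernorm": true
        }
    },
    "gptq": {
        "type": "GptqQuantizer",
        "config": {
            "bits": 4,
            "group_size": 128
        }
    },
    "matmul4": {
        "type": "OnnxMatMul4Quantizer",
        "config": {
            "algorithm": "RTN"
        }
    }
}
```

The passes listed above that create an inference session now run on the engine's target system rather than the host.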
Engine
- Support packaging AzureML output.
- Remove execution_providers from the engine config; execution providers are now specified per accelerator in the system config. A typical config now looks like:

  ```json
  "systems": {
      "local_system": {
          "type": "LocalSystem",
          "config": {
              "accelerators": [
                  {
                      "device": "gpu",
                      "execution_providers": [
                          "CUDAExecutionProvider"
                      ]
                  }
              ]
          }
      }
  },
  "engine": {
      "host": "local_system",
      "target": "local_system"
  }
  ```
Workflows
- Delay Python pass module loading and provide the --package-config option so that advanced users can write their own pass modules and corresponding dependencies. An illustrative sketch follows.
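
A rough sketch of a custom package config, assuming a schema that maps pass names to a module path and named pip dependency groups (the MyCustomPass, my_package, and my-extra names here are hypothetical):

```json
{
    "passes": {
        "MyCustomPass": {
            "module_path": "my_package.my_pass.MyCustomPass",
            "extra_dependencies": ["my-extra"]
        }
    },
    "extra_dependencies": {
        "my-extra": ["scipy"]
    }
}
```

It would then be supplied at run time, e.g. python -m olive.workflows.run --config workflow.json --package-config package_config.json.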
Fix
- Cannot load MLflow model because from_pretrained_args is missing.
- LoRA: Pass save_embedding_layers=False when saving the PEFT model; otherwise it defaults to "auto", which checks whether the vocab size changed.
- Update the model_rank file for the zipfile packaging type. The model path is now relative to the output zip file.
- Fix shutil.which returning None on Windows when a full Python path is passed.