
Local

Models in this category


  • deepseek-r1-distill-llama-8b

    This model is an optimized version of DeepSeek-R1-Distill-Llama-8B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of ...
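
These ONNX variants are typically driven with the onnxruntime-genai package rather than a raw ONNX Runtime session. Below is a minimal generation sketch, assuming a locally downloaded model folder (the path is hypothetical) and the ~0.5-era onnxruntime-genai Python API, which has shifted across releases:

```python
# Minimal local-generation sketch with onnxruntime-genai.
# Assumes: `pip install onnxruntime-genai` (or the -cuda build) and a
# model folder downloaded from this catalog; the path below is
# hypothetical. The API shown follows ~0.5 and may differ in other releases.
import onnxruntime_genai as og

model = og.Model("./deepseek-r1-distill-llama-8b")  # hypothetical path
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Why is the sky blue?"))

# Decode token by token until the generator reports completion.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```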

  • deepseek-r1-distill-llama-8b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R...
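
RTN (round-to-nearest) quantization, named on most entries in this catalog, snaps each weight to the nearest point of a uniform integer grid without any calibration data. A minimal numpy illustration of a symmetric per-row int4 variant follows; the published models may use different block sizes, zero-points, or bit widths:

```python
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4):
    """Symmetric per-row round-to-nearest (RTN) quantization sketch.

    Each row of w is scaled onto the signed integer grid for `bits`,
    rounded to the nearest grid point, and clipped. Returns the integer
    codes plus the per-row scales needed to dequantize.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for int4
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def rtn_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = rtn_quantize(w)
print("max abs error:", np.abs(w - rtn_dequantize(q, s)).max())
```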

  • deepseek-r1-distill-llama-8b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dis...

  • deepseek-r1-distill-llama-8b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dis...

  • deepseek-r1-distill-qwen-1.5b

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to e...

  • deepseek-r1-distill-qwen-1.5b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-...

  • deepseek-r1-distill-qwen-1.5b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Di...

  • deepseek-r1-distill-qwen-1.5b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Di...

  • deepseek-r1-distill-qwen-1.5b-qnn-npu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of th...
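
The qnn-npu entries use QuaRot and GPTQ rather than RTN. Schematically, following the QuaRot and GPTQ papers (not necessarily the exact pipeline used for these models): QuaRot wraps the weights in orthogonal rotations $Q$ with $QQ^\top = I$, which leaves the network's output unchanged while spreading out the activation outliers that would otherwise dominate the quantization range; GPTQ then selects quantized weights $\hat{W}$ from the quantization grid $\mathcal{Q}$ that minimize the layer-wise reconstruction error on calibration activations $X$:

$$\hat{W} \;=\; \arg\min_{\hat{W} \in \mathcal{Q}} \bigl\lVert W X - \hat{W} X \bigr\rVert_F^2$$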

  • deepseek-r1-distill-qwen-14b

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of ...

  • deepseek-r1-distill-qwen-14b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R...

  • deepseek-r1-distill-qwen-14b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dis...

  • deepseek-r1-distill-qwen-14b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dis...

  • deepseek-r1-distill-qwen-14b-qnn-npu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the...

  • deepseek-r1-distill-qwen-7b

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited ...

  • deepseek-r1-distill-qwen-7b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1...

  • deepseek-r1-distill-qwen-7b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dist...

  • deepseek-r1-distill-qwen-7b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dist...

  • deepseek-r1-distill-qwen-7b-qnn-npu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the ...

  • mistralai-Mistral-7B-Instruct-v0-2-cuda-gpu

    This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Mistral...

  • mistralai-Mistral-7B-Instruct-v0-2-generic-cpu

    This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Mistral-7B-In...

  • mistralai-Mistral-7B-Instruct-v0-2-generic-gpu

    This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Mistral-7B-I...
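
The cuda-gpu / generic-gpu / generic-cpu suffixes indicate which ONNX Runtime execution provider each variant is tuned for. The sketch below illustrates provider selection with the plain onnxruntime API; the generic-gpu-to-DirectML pairing is an assumption rather than something stated on this page, and generative models are normally driven through onnxruntime-genai instead of a raw InferenceSession:

```python
import onnxruntime as ort

# Rough variant-to-execution-provider mapping (the generic-gpu ->
# DirectML pairing is an assumption, not stated on this page).
# ONNX Runtime falls back left-to-right when a provider is unavailable.
providers_by_variant = {
    "cuda-gpu":    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "generic-gpu": ["DmlExecutionProvider", "CPUExecutionProvider"],
    "generic-cpu": ["CPUExecutionProvider"],
}

# "model.onnx" is a hypothetical path inside a downloaded variant folder.
session = ort.InferenceSession(
    "model.onnx",
    providers=providers_by_variant["generic-cpu"],
)
print(session.get_providers())  # shows which providers were actually enabled
```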

  • Phi-3-mini-128k-instruct-cuda-gpu

    This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-128...

  • Phi-3-mini-128k-instruct-generic-cpu

    This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-128K-Ins...

  • Phi-3-mini-128k-instruct-generic-gpu

    This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-128K-Ins...

  • Phi-3-mini-4k-instruct-cuda-gpu

    This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-4K-In...

  • Phi-3-mini-4k-instruct-generic-cpu

    This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-4K-Instruc...

  • Phi-3-mini-4k-instruct-generic-gpu

    This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-4K-Instruc...

  • Phi-3.5-mini-instruct-cuda-gpu

    This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3.5-mini-inst...

  • Phi-3.5-mini-instruct-generic-cpu

    This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3.5-mini-instruct ...

  • Phi-3.5-mini-instruct-generic-gpu

    This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3.5-mini-instruct ...

  • Phi-4-cuda-gpu

    This model is an optimized version of Phi-4 to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4 for local inference on CUDA...

  • Phi-4-generic-cpu

    This model is an optimized version of Phi-4 to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4 for local inference on CPUs.

  • Phi-4-generic-gpu

    This model is an optimized version of Phi-4 to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4 for local inference on GPUs.

  • Phi-4-mini-instruct-cuda-gpu

    This model is an optimized version of Phi-4-mini-instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-instruct...

  • Phi-4-mini-instruct-generic-cpu

    This model is an optimized version of Phi-4-mini-instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-instruct for ...

  • Phi-4-mini-instruct-generic-gpu

    This model is an optimized version of Phi-4-mini-instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-instruct for ...

  • Phi-4-mini-reasoning-cuda-gpu

    This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-reasoni...

  • Phi-4-mini-reasoning-generic-cpu

    This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-reasoning fo...

  • Phi-4-mini-reasoning-generic-gpu

    This model is an optimized version of Phi-4-mini-reasoning to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-reasoning fo...

  • Phi-4-mini-reasoning-qnn-npu

    This model is an optimized version of Phi-4-mini-reasoning to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-m...

  • Phi-4-reasoning-cuda-gpu

    This model is an optimized version of Phi-4-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-reasoning for loc...

  • Phi-4-reasoning-generic-cpu

    This model is an optimized version of Phi-4-reasoning to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-reasoning for local in...

  • Phi-4-reasoning-generic-gpu

    This model is an optimized version of Phi-4-reasoning to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-reasoning for local in...

  • qwen2.5-0.5b-instruct

    This model is an optimized version of Qwen2.5-0.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these t...

  • qwen2.5-0.5b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-0....

  • qwen2.5-0.5b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-0.5B-In...

  • qwen2.5-0.5b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-0.5B-In...

  • qwen2.5-1.5b-instruct

    This model is an optimized version of Qwen2.5-1.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these t...

  • qwen2.5-1.5b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-1....

  • qwen2.5-1.5b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-1.5B-In...

  • qwen2.5-1.5b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-1.5B-In...

  • qwen2.5-14b-instruct

    This model is an optimized version of Qwen2.5-14B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these ta...

  • qwen2.5-14b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-14B...

  • qwen2.5-14b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-14B-Inst...

  • qwen2.5-14b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-14B-Inst...

  • qwen2.5-3b-instruct

    This model is an optimized version of Qwen2.5-3B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these tar...

  • qwen2.5-3b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-3B-I...

  • qwen2.5-3b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-3B-Instru...

  • qwen2.5-3b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-3B-Instru...

  • qwen2.5-7b-instruct

    This model is an optimized version of Qwen2.5-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these tar...

  • qwen2.5-7b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-7B-I...

  • qwen2.5-7b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-7B-Instru...

  • qwen2.5-7b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-7B-Instru...

  • qwen2.5-coder-0.5b-instruct

    This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of t...

  • qwen2.5-coder-0.5b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen...

  • qwen2.5-coder-0.5b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-C...

  • qwen2.5-coder-0.5b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-C...

  • qwen2.5-coder-1.5b-instruct

    This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of t...

  • qwen2.5-coder-1.5b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen...

  • qwen2.5-coder-1.5b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-C...

  • qwen2.5-coder-1.5b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-C...

  • qwen2.5-coder-14b-instruct

    This model is an optimized version of Qwen2.5-Coder-14B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of th...

  • qwen2.5-coder-14b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2...

  • qwen2.5-coder-14b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Co...

  • qwen2.5-coder-14b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Co...

  • qwen2.5-coder-3b-instruct

    This model is an optimized version of Qwen2.5-Coder-3B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of the...

  • qwen2.5-coder-3b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2....

  • qwen2.5-coder-3b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Cod...

  • qwen2.5-coder-3b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Cod...

  • qwen2.5-coder-7b-instruct

    This model is an optimized version of Qwen2.5-Coder-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of the...

  • qwen2.5-coder-7b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2....

  • qwen2.5-coder-7b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Cod...

  • qwen2.5-coder-7b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Cod...