
Local

Models in this category


  • deepseek-r1-distill-llama-8b

    This model is an optimized version of DeepSeek-R1-Distill-Llama-8B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of ...
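
These ONNX variants are typically driven with the onnxruntime-genai package rather than a raw ONNX Runtime session. Below is a minimal generation sketch, assuming a locally downloaded model folder (the path is hypothetical) and the ~0.5-era onnxruntime-genai Python API, which has shifted across releases:

```python
# Minimal local-generation sketch with onnxruntime-genai.
# Assumes: `pip install onnxruntime-genai` (or the -cuda build) and a
# model folder downloaded from this catalog; the path below is
# hypothetical. The API shown follows ~0.5 and may differ in other releases.
import onnxruntime_genai as og

model = og.Model("./deepseek-r1-distill-llama-8b")  # hypothetical path
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Why is the sky blue?"))

# Decode token by token until the generator reports completion.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```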

  • deepseek-r1-distill-llama-8b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R...
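
RTN (round-to-nearest) quantization, named on most entries in this catalog, snaps each weight to the nearest point of a uniform integer grid without any calibration data. A minimal numpy illustration of a symmetric per-row int4 variant follows; the published models may use different block sizes, zero-points, or bit widths:

```python
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4):
    """Symmetric per-row round-to-nearest (RTN) quantization sketch.

    Each row of w is scaled onto the signed integer grid for `bits`,
    rounded to the nearest grid point, and clipped. Returns the integer
    codes plus the per-row scales needed to dequantize.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for int4
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def rtn_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = rtn_quantize(w)
print("max abs error:", np.abs(w - rtn_dequantize(q, s)).max())
```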

  • deepseek-r1-distill-llama-8b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dis...

  • deepseek-r1-distill-llama-8b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dis...

  • deepseek-r1-distill-qwen-1.5b

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to e...

  • deepseek-r1-distill-qwen-1.5b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-...

  • deepseek-r1-distill-qwen-1.5b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Di...

  • deepseek-r1-distill-qwen-1.5b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Di...

  • deepseek-r1-distill-qwen-1.5b-qnn-npu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of th...
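
The qnn-npu entries use QuaRot and GPTQ rather than RTN. Schematically, following the QuaRot and GPTQ papers (not necessarily the exact pipeline used for these models): QuaRot wraps the weights in orthogonal rotations $Q$ with $QQ^\top = I$, which leaves the network's output unchanged while spreading out the activation outliers that would otherwise dominate the quantization range; GPTQ then selects quantized weights $\hat{W}$ from the quantization grid $\mathcal{Q}$ that minimize the layer-wise reconstruction error on calibration activations $X$:

$$\hat{W} \;=\; \arg\min_{\hat{W} \in \mathcal{Q}} \bigl\lVert W X - \hat{W} X \bigr\rVert_F^2$$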

  • deepseek-r1-distill-qwen-14b

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of ...

  • deepseek-r1-distill-qwen-14b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R...

  • deepseek-r1-distill-qwen-14b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dis...

  • deepseek-r1-distill-qwen-14b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dis...

  • deepseek-r1-distill-qwen-14b-qnn-npu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the...

  • deepseek-r1-distill-qwen-7b

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited ...

  • deepseek-r1-distill-qwen-7b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1...

  • deepseek-r1-distill-qwen-7b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dist...

  • deepseek-r1-distill-qwen-7b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dist...

  • deepseek-r1-distill-qwen-7b-qnn-npu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the ...

  • mistralai-Mistral-7B-Instruct-v0-2-cuda-gpu

    This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Mistral...

  • mistralai-Mistral-7B-Instruct-v0-2-generic-cpu

    This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Mistral-7B-In...

  • mistralai-Mistral-7B-Instruct-v0-2-generic-gpu

    This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Mistral-7B-I...
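
The cuda-gpu / generic-gpu / generic-cpu suffixes indicate which ONNX Runtime execution provider each variant is tuned for. The sketch below illustrates provider selection with the plain onnxruntime API; the generic-gpu-to-DirectML pairing is an assumption rather than something stated on this page, and generative models are normally driven through onnxruntime-genai instead of a raw InferenceSession:

```python
import onnxruntime as ort

# Rough variant-to-execution-provider mapping (the generic-gpu ->
# DirectML pairing is an assumption, not stated on this page).
# ONNX Runtime falls back left-to-right when a provider is unavailable.
providers_by_variant = {
    "cuda-gpu":    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "generic-gpu": ["DmlExecutionProvider", "CPUExecutionProvider"],
    "generic-cpu": ["CPUExecutionProvider"],
}

# "model.onnx" is a hypothetical path inside a downloaded variant folder.
session = ort.InferenceSession(
    "model.onnx",
    providers=providers_by_variant["generic-cpu"],
)
print(session.get_providers())  # shows which providers were actually enabled
```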

  • Phi-3-mini-128k-instruct-cuda-gpu

    This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-128...

  • Phi-3-mini-128k-instruct-generic-cpu

    This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-128K-Ins...

  • Phi-3-mini-128k-instruct-generic-gpu

    This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-128K-Ins...

  • Phi-3-mini-4k-instruct-cuda-gpu

    This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-4K-In...

  • Phi-3-mini-4k-instruct-generic-cpu

    This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-4K-Instruc...

  • Phi-3-mini-4k-instruct-generic-gpu

    This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3-Mini-4K-Instruc...

  • Phi-3.5-mini-instruct-cuda-gpu

    This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3.5-mini-inst...

  • Phi-3.5-mini-instruct-generic-cpu

    This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3.5-mini-instruct ...

  • Phi-3.5-mini-instruct-generic-gpu

    This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-3.5-mini-instruct ...

  • Phi-4-cuda-gpu

    This model is an optimized version of Phi-4 to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4 for local inference on CUDA...

  • Phi-4-generic-cpu

    This model is an optimized version of Phi-4 to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4 for local inference on CPUs.

  • Phi-4-generic-gpu

    This model is an optimized version of Phi-4 to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4 for local inference on GPUs.

  • Phi-4-mini-instruct-cuda-gpu

    This model is an optimized version of Phi-4-mini-instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-instruct...

  • Phi-4-mini-instruct-generic-cpu

    This model is an optimized version of Phi-4-mini-instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-instruct for ...

  • Phi-4-mini-instruct-generic-gpu

    This model is an optimized version of Phi-4-mini-instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-instruct for ...

  • Phi-4-mini-reasoning-cuda-gpu

    This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-reasoni...

  • Phi-4-mini-reasoning-generic-cpu

    This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-reasoning fo...

  • Phi-4-mini-reasoning-generic-gpu

    This model is an optimized version of Phi-4-mini-reasoning to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-mini-reasoning fo...

  • Phi-4-mini-reasoning-qnn-npu

    This model is an optimized version of Phi-4-mini-reasoning to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-m...

  • Phi-4-reasoning-cuda-gpu

    This model is an optimized version of Phi-4-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-reasoning for loc...

  • Phi-4-reasoning-generic-cpu

    This model is an optimized version of Phi-4-reasoning to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-reasoning for local in...

  • Phi-4-reasoning-generic-gpu

    This model is an optimized version of Phi-4-reasoning to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the Phi-4-reasoning for local in...

  • qwen2.5-0.5b-instruct

    This model is an optimized version of Qwen2.5-0.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these t...

  • qwen2.5-0.5b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-0....

  • qwen2.5-0.5b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-0.5B-In...

  • qwen2.5-0.5b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-0.5B-In...

  • qwen2.5-1.5b-instruct

    This model is an optimized version of Qwen2.5-1.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these t...

  • qwen2.5-1.5b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-1....

  • qwen2.5-1.5b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-1.5B-In...

  • qwen2.5-1.5b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-1.5B-In...

  • qwen2.5-14b-instruct

    This model is an optimized version of Qwen2.5-14B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these ta...

  • qwen2.5-14b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-14B...

  • qwen2.5-14b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-14B-Inst...

  • qwen2.5-14b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-14B-Inst...

  • qwen2.5-3b-instruct

    This model is an optimized version of Qwen2.5-3B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these tar...

  • qwen2.5-3b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-3B-I...

  • qwen2.5-3b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-3B-Instru...

  • qwen2.5-3b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-3B-Instru...

  • qwen2.5-7b-instruct

    This model is an optimized version of Qwen2.5-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these tar...

  • qwen2.5-7b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-7B-I...

  • qwen2.5-7b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-7B-Instru...

  • qwen2.5-7b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-7B-Instru...

  • qwen2.5-coder-0.5b-instruct

    This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of t...

  • qwen2.5-coder-0.5b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen...

  • qwen2.5-coder-0.5b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-C...

  • qwen2.5-coder-0.5b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-C...

  • qwen2.5-coder-1.5b-instruct

    This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of t...

  • qwen2.5-coder-1.5b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen...

  • qwen2.5-coder-1.5b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-C...

  • qwen2.5-coder-1.5b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-C...

  • qwen2.5-coder-14b-instruct

    This model is an optimized version of Qwen2.5-Coder-14B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of th...

  • qwen2.5-coder-14b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2...

  • qwen2.5-coder-14b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Co...

  • qwen2.5-coder-14b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Co...

  • qwen2.5-coder-3b-instruct

    This model is an optimized version of Qwen2.5-Coder-3B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of the...

  • qwen2.5-coder-3b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2....

  • qwen2.5-coder-3b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Cod...

  • qwen2.5-coder-3b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Cod...

  • qwen2.5-coder-7b-instruct

    This model is an optimized version of Qwen2.5-Coder-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of the...

  • qwen2.5-coder-7b-instruct-cuda-gpu

    This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2....

  • qwen2.5-coder-7b-instruct-generic-cpu

    This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Cod...

  • qwen2.5-coder-7b-instruct-generic-gpu

    This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: apache-2.0

  • Model Description: This is a conversion of the Qwen2.5-Cod...