models: Local documentation
This model is an optimized version of DeepSeek-R1-Distill-Llama-8B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.
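These ONNX builds can be driven from any ONNX Runtime host. As a concrete illustration, here is a minimal sketch using the onnxruntime-genai Python package (API details may vary slightly by package version); the local model folder path and the prompt are placeholders, and a plain-text prompt stands in for each model's real chat template.

```python
# Minimal sketch: running one of these ONNX models locally with
# onnxruntime-genai (pip install onnxruntime-genai).
# The model folder path and prompt are placeholders.
import onnxruntime_genai as og

model = og.Model("./deepseek-r1-distill-llama-8b-generic-cpu")  # hypothetical local path
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Chat formatting varies per model family; a plain prompt is used
# here for simplicity rather than the model's real chat template.
prompt = "Explain RTN quantization in one paragraph."
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(input_tokens)
while not generator.is_done():
    generator.generate_next_token()
    # Decode and print each new token as it is produced.
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```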

deepseek-r1-distill-llama-8b-cuda-gpu
This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Llama-8B for local inference on CUDA GPUs.
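RTN (round-to-nearest) is data-free weight quantization: each weight tensor is scaled onto a low-bit integer grid and rounded, with no calibration pass. A toy numpy sketch of symmetric per-tensor RTN follows; the published models may use per-channel or block-wise variants, so this is illustrative only.

```python
# Illustrative round-to-nearest (RTN) weight quantization:
# symmetric, 4-bit, per-tensor. Not the exact recipe of these models.
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4):
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit
    scale = np.abs(w).max() / qmax               # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def rtn_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = rtn_quantize(w)
print("max abs error:", np.abs(w - rtn_dequantize(q, s)).max())
```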

deepseek-r1-distill-llama-8b-generic-cpu
This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Llama-8B for local inference on CPUs.

deepseek-r1-distill-llama-8b-generic-gpu
This model is an optimized version of DeepSeek-R1-Distill-Llama-8B to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Llama-8B for local inference on GPUs.

This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

deepseek-r1-distill-qwen-1.5b-cuda-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-1.5B for local inference on CUDA GPUs.

deepseek-r1-distill-qwen-1.5b-generic-cpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-1.5B for local inference on CPUs.

deepseek-r1-distill-qwen-1.5b-generic-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-1.5B for local inference on GPUs.

deepseek-r1-distill-qwen-1.5b-qnn-npu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-1.5B for local inference on QNN NPUs.
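For the QNN NPU builds, QuaRot and GPTQ work together: QuaRot inserts orthogonal (Hadamard) rotations that leave each layer's output unchanged but suppress weight and activation outliers, and GPTQ then quantizes the rotated weights column by column with second-order error compensation. Below is a toy sketch of the rotation identity only (GPTQ itself is omitted; dimensions and values are illustrative).

```python
# Conceptual sketch of the QuaRot idea: multiply by an orthogonal
# Hadamard matrix before quantizing. Since H is orthogonal,
# (x @ H) @ (H.T @ W) == x @ W, but the rotated tensors have fewer
# outliers and so quantize to low bit-widths with less error.
import numpy as np
from scipy.linalg import hadamard

d = 64
H = hadamard(d).astype(np.float32) / np.sqrt(d)   # orthogonal: H @ H.T = I

x = np.random.randn(2, d).astype(np.float32)      # activations
W = np.random.randn(d, 16).astype(np.float32)     # layer weights
W[0, :] += 50.0                                   # simulate an outlier channel

x_rot, W_rot = x @ H, H.T @ W                     # exact reparameterization
print(np.allclose(x_rot @ W_rot, x @ W, atol=1e-2))   # True: output unchanged
print(np.abs(W).max(), np.abs(W_rot).max())       # rotation spreads the outlier
```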

This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

deepseek-r1-distill-qwen-14b-cuda-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-14B for local inference on CUDA GPUs.

deepseek-r1-distill-qwen-14b-generic-cpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-14B for local inference on CPUs.

deepseek-r1-distill-qwen-14b-generic-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-14B for local inference on GPUs.

deepseek-r1-distill-qwen-14b-qnn-npu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-14B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-14B for local inference on QNN NPUs.

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

deepseek-r1-distill-qwen-7b-cuda-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local inference on CUDA GPUs.

deepseek-r1-distill-qwen-7b-generic-cpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local inference on CPUs.

deepseek-r1-distill-qwen-7b-generic-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local inference on GPUs.

deepseek-r1-distill-qwen-7b-qnn-npu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Distill-Qwen-7B for local inference on QNN NPUs.

mistralai-Mistral-7B-Instruct-v0-2-cuda-gpu
This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Mistral-7B-Instruct-v0.2 for local inference on CUDA GPUs.
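The cuda-gpu / generic-cpu / generic-gpu split corresponds to which ONNX Runtime execution provider drives the graph. A minimal sketch with plain onnxruntime follows; the .onnx path is a placeholder, and generation-ready models are normally run through onnxruntime-genai as sketched earlier.

```python
# Sketch: mapping the variant names onto ONNX Runtime execution providers.
import onnxruntime as ort

# Preference order: try CUDA first, fall back to CPU if unavailable.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
# For the qnn-npu variants, "QNNExecutionProvider" plays the same role
# on Qualcomm NPUs (requires the QNN build of onnxruntime).
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())  # shows which providers were actually enabled
```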

mistralai-Mistral-7B-Instruct-v0-2-generic-cpu
This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Mistral-7B-Instruct-v0.2 for local inference on CPUs.

mistralai-Mistral-7B-Instruct-v0-2-generic-gpu
This model is an optimized version of Mistral-7B-Instruct-v0.2 to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Mistral-7B-Instruct-v0.2 for local inference on GPUs.

Phi-3-mini-128k-instruct-cuda-gpu
This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3-Mini-128K-Instruct for local inference on CUDA GPUs.

Phi-3-mini-128k-instruct-generic-cpu
This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3-Mini-128K-Instruct for local inference on CPUs.

Phi-3-mini-128k-instruct-generic-gpu
This model is an optimized version of Phi-3-Mini-128K-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3-Mini-128K-Instruct for local inference on GPUs.

Phi-3-mini-4k-instruct-cuda-gpu
This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3-Mini-4K-Instruct for local inference on CUDA GPUs.

Phi-3-mini-4k-instruct-generic-cpu
This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3-Mini-4K-Instruct for local inference on CPUs.

Phi-3-mini-4k-instruct-generic-gpu
This model is an optimized version of Phi-3-Mini-4K-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3-Mini-4K-Instruct for local inference on GPUs.

Phi-3.5-mini-instruct-cuda-gpu
This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3.5-mini-instruct for local inference on CUDA GPUs.

Phi-3.5-mini-instruct-generic-cpu
This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3.5-mini-instruct for local inference on CPUs.

Phi-3.5-mini-instruct-generic-gpu
This model is an optimized version of Phi-3.5-mini-instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-3.5-mini-instruct for local inference on GPUs.

Phi-4-cuda-gpu
This model is an optimized version of Phi-4 to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4 for local inference on CUDA GPUs.

Phi-4-generic-cpu
This model is an optimized version of Phi-4 to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4 for local inference on CPUs.

Phi-4-generic-gpu
This model is an optimized version of Phi-4 to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4 for local inference on GPUs.

Phi-4-mini-instruct-cuda-gpu
This model is an optimized version of Phi-4-mini-instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-mini-instruct for local inference on CUDA GPUs.

Phi-4-mini-instruct-generic-cpu
This model is an optimized version of Phi-4-mini-instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-mini-instruct for local inference on CPUs.

Phi-4-mini-instruct-generic-gpu
This model is an optimized version of Phi-4-mini-instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-mini-instruct for local inference on GPUs.

Phi-4-mini-reasoning-cuda-gpu
This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-mini-reasoning for local inference on CUDA GPUs.

Phi-4-mini-reasoning-generic-cpu
This model is an optimized version of Phi-4-mini-reasoning to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-mini-reasoning for local inference on CPUs.

Phi-4-mini-reasoning-generic-gpu
This model is an optimized version of Phi-4-mini-reasoning to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-mini-reasoning for local inference on GPUs.

Phi-4-mini-reasoning-qnn-npu
This model is an optimized version of Phi-4-mini-reasoning to enable local inference on QNN NPUs. This model uses QuaRot and GPTQ quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-mini-reasoning for local inference on QNN NPUs.

Phi-4-reasoning-cuda-gpu
This model is an optimized version of Phi-4-reasoning to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-reasoning for local inference on CUDA GPUs.

Phi-4-reasoning-generic-cpu
This model is an optimized version of Phi-4-reasoning to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-reasoning for local inference on CPUs.

Phi-4-reasoning-generic-gpu
This model is an optimized version of Phi-4-reasoning to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the Phi-4-reasoning for local inference on GPUs.

This model is an optimized version of Qwen2.5-0.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-0.5b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-0.5B-Instruct for local inference on CUDA GPUs.

qwen2.5-0.5b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-0.5B-Instruct for local inference on CPUs.

qwen2.5-0.5b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-0.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-0.5B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-1.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-1.5b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-1.5B-Instruct for local inference on CUDA GPUs.

qwen2.5-1.5b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-1.5B-Instruct for local inference on CPUs.

qwen2.5-1.5b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-1.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-1.5B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-14B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-14b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-14B-Instruct for local inference on CUDA GPUs.

qwen2.5-14b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-14B-Instruct for local inference on CPUs.

qwen2.5-14b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-14B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-14B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-3B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-3b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-3B-Instruct for local inference on CUDA GPUs.

qwen2.5-3b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-3B-Instruct for local inference on CPUs.

qwen2.5-3b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-3B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-3B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-7b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-7B-Instruct for local inference on CUDA GPUs.

qwen2.5-7b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-7B-Instruct for local inference on CPUs.

qwen2.5-7b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-7B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-7B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-coder-0.5b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-0.5B-Instruct for local inference on CUDA GPUs.

qwen2.5-coder-0.5b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-0.5B-Instruct for local inference on CPUs.

qwen2.5-coder-0.5b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-Coder-0.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-0.5B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-coder-1.5b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-1.5B-Instruct for local inference on CUDA GPUs.

qwen2.5-coder-1.5b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-1.5B-Instruct for local inference on CPUs.

qwen2.5-coder-1.5b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-Coder-1.5B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-1.5B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-Coder-14B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-coder-14b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-14B-Instruct for local inference on CUDA GPUs.

qwen2.5-coder-14b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-14B-Instruct for local inference on CPUs.

qwen2.5-coder-14b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-Coder-14B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-14B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-Coder-3B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-coder-3b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-3B-Instruct for local inference on CUDA GPUs.

qwen2.5-coder-3b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-3B-Instruct for local inference on CPUs.

qwen2.5-coder-3b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-Coder-3B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-3B-Instruct for local inference on GPUs.

This model is an optimized version of Qwen2.5-Coder-7B-Instruct for local inference. Optimized models are published here in ONNX format to run on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

qwen2.5-coder-7b-instruct-cuda-gpu
This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-7B-Instruct for local inference on CUDA GPUs.

qwen2.5-coder-7b-instruct-generic-cpu
This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-7B-Instruct for local inference on CPUs.

qwen2.5-coder-7b-instruct-generic-gpu
This model is an optimized version of Qwen2.5-Coder-7B-Instruct to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: apache-2.0
- Model Description: This is a conversion of the Qwen2.5-Coder-7B-Instruct for local inference on GPUs.