Optimize functionality using torch.compile
#1485 · dsikka opened on May 28, 2025
FAQ: Check out the FAQ Page
#1945 · dsikka opened on Oct 17, 2025
Add Additional Model Mappings for AWQ and SmoothQuant
#1442 · dsikka opened on May 16, 2025

[Bug]: High Host Memory Usage During GPTQ Quantization of Qwen3-14B (W4A16)

#2039 · tghfly opened on Nov 17, 2025

HOW To: Quantization: Qwen/Qwen3-VL-235B-A22B-Instruct AWQ

#2038 · tianruochen opened on Nov 17, 2025

[upstream] Expecting future huggingface/transformer incompatibility

#2036 · mratsim opened on Nov 14, 2025

[Bug]: SequentialPipeline fails on ERNIE-4.5-VL (remote code) with FX trace TypeError: to(device=MetaDeviceAttribute)

#2033 · Firworksyt opened on Nov 14, 2025

[RFC]: KL-divergence evaluation tool

#2031 · mratsim opened on Nov 13, 2025

How can I quantize a hybrid model to FP8 (like DeepSeek-V2, but with Llama attention)?

#2026 · youngze0016 opened on Nov 12, 2025

[Performance]: Qwen3-VL-8B.w4a16 no better in vllm throughput than original

#2025 · LouisAI-DL opened on Nov 12, 2025

[Bug]: Qwen3 example works on transformers v4.57.1 but fails on transformers main

#2022 · brian-dellabetta opened on Nov 11, 2025

[Bug]: InternVL3-8B-hf gptq quantize failed

#2019 · BigFaceBoy opened on Nov 11, 2025

FP8 BLOCK Quantization with Block Size 64 Causes Qwen3-235B MoE Model Inference Failure in vLLM

#2010 · wangwenmingaa opened on Nov 10, 2025

[Help Wanted] Tokenizer warning messages

good first issue

#2007 · kylesayrs opened on Nov 10, 2025

[RFC]: MR-GPTQ (GPTQ+NVFP4)

#2006 · mratsim opened on Nov 9, 2025