Description
Please describe the feature you want
Candle-vllm, mistral.rs, or candle-based primitives for model handling would provide tighter coupling and possibly better performance. At present, candle-vllm can sustain ~55 T/s on a q8_0 Qwen3-Coder even on compute-capability-7 (Volta-generation) hardware with a 512k context (fairly stable into the ~400k range, due to how it handles ISQ and attention), whereas llama.cpp manages a fraction of that throughput and seems to forget what it was doing earlier in large context windows.
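
To make "candle-based primitives for model handling" concrete, here is a minimal sketch of loading a GGUF-quantized model directly with candle, following the pattern in candle's own quantized examples. The file path and the choice of the `quantized_llama` loader are illustrative assumptions, not a confirmed integration plan for this project:

```rust
// Minimal sketch: load a GGUF-quantized model with candle primitives.
use candle_core::quantized::gguf_file;
use candle_core::Device;
use candle_transformers::models::quantized_llama::ModelWeights;

fn main() -> anyhow::Result<()> {
    // Use the first CUDA device; fall back to CPU if unavailable.
    let device = Device::cuda_if_available(0)?;

    // Hypothetical local path to a q8_0 GGUF file.
    let path = "models/qwen3-coder-q8_0.gguf";
    let mut file = std::fs::File::open(path)?;

    // Parse the GGUF container, then materialize the weights on the device.
    let content = gguf_file::Content::read(&mut file)?;
    let model = ModelWeights::from_gguf(content, &mut file, &device)?;

    // From here, `model.forward(&token_ids, position)` drives token-by-token
    // decoding, as in candle's quantized examples.
    let _ = model;
    Ok(())
}
```

Embedding this kind of loader in-process is what would allow the tighter coupling described above, instead of shelling out to an external llama.cpp server.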
Additional context
Add any other context or screenshots about the feature request here.
Please reply with a 👍 if you want this feature.