# Summary
- Requires: vllm-project/compressed-tensors#509
- Add script to generate an mxfp4 quantized model
- This feature is currently experimental, as support has not yet landed in or been tested with vLLM
# Testing:
Sample Model:
- nm-testing/Meta-Llama-3-8B-Instruct-MXFP4
Sample Generation (Transformers):
```text
========== SAMPLE GENERATION ==============
<|begin_of_text|>Hello my name is Sophia and I am a 3rd year student at the University of California, Berkeley. I am a double major in Linguistics and Psychology, with a minor in Education. I am very interested in the way that language and culture interact, and I believe that education is the key to creating a more just and equitable society.
I am a native speaker of English, and I have also studied Spanish, French, and Mandarin Chinese. I am very interested in the way that language can be used to bring
==========================================
```
Sample Config:
```json
"quantization_config": {
  "config_groups": {
    "group_0": {
      "format": "mxfp4-pack-quantized",
      "input_activations": {
        "actorder": null,
        "block_structure": null,
        "dynamic": true,
        "group_size": 32,
        "num_bits": 4,
        "observer": null,
        "observer_kwargs": {},
        "scale_dtype": "torch.uint8",
        "strategy": "group",
        "symmetric": true,
        "type": "float",
        "zp_dtype": null
      },
      "output_activations": null,
      "targets": [
        "Linear"
      ],
      "weights": {
        "actorder": null,
        "block_structure": null,
        "dynamic": false,
        "group_size": 32,
        "num_bits": 4,
        "observer": "minmax",
        "observer_kwargs": {},
        "scale_dtype": "torch.uint8",
        "strategy": "group",
        "symmetric": true,
        "type": "float",
        "zp_dtype": null
      }
    }
  },
  "format": "mxfp4-pack-quantized"
}
```
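The config fields line up with the OCP Microscaling (MX) recipe: each group of 32 values (`group_size: 32`) shares one power-of-two scale, and each element is a 4-bit float (`num_bits: 4`, `type: float`, i.e. FP4 E2M1). The following is a minimal pure-Python sketch of that recipe, not the compressed-tensors implementation. It assumes the `torch.uint8` scale holds an E8M0 biased exponent (bias 127) as in the OCP MX spec, and that `mxfp4-pack-quantized` stores two 4-bit codes per byte with the first code in the low nibble. It also breaks rounding ties toward the smaller magnitude instead of using the spec's round-half-to-even. The helper names (`quantize_group`, `pack_codes`, `unpack_codes`, `dequantize_group`) are hypothetical.

```python
import math

# FP4 (E2M1) magnitudes, indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest FP4 value: 6.0 = 1.5 * 2**2
E8M0_BIAS = 127    # assumed: the uint8 scale stores a biased power-of-two exponent
GROUP_SIZE = 32    # "group_size": 32 in the config


def quantize_group(values):
    """Return (uint8 scale, list of 4-bit codes) for one group of 32 floats."""
    assert len(values) == GROUP_SIZE
    max_abs = max(abs(v) for v in values)
    # Shared exponent: floor(log2(max_abs)) minus FP4's max exponent (MX-style).
    shared_exp = 0 if max_abs == 0.0 else math.floor(math.log2(max_abs)) - E2M1_EMAX
    # Keep the biased exponent in 0..254 (255 is reserved for NaN in E8M0).
    shared_exp = max(-E8M0_BIAS, min(shared_exp, 127))
    scale = 2.0 ** shared_exp
    codes = []
    for v in values:
        x = abs(v) / scale
        # Nearest FP4 magnitude (x > 6.0 saturates to 6.0); ties go to the
        # smaller magnitude, a simplification of round-half-to-even.
        idx = min(range(len(E2M1_VALUES)), key=lambda i: abs(E2M1_VALUES[i] - x))
        codes.append((8 if v < 0 else 0) | idx)
    return shared_exp + E8M0_BIAS, codes


def pack_codes(codes):
    """Pack pairs of 4-bit codes into bytes (assumed: first code in the low nibble)."""
    assert len(codes) % 2 == 0
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_codes(packed):
    """Inverse of pack_codes: split each byte into (low nibble, high nibble)."""
    codes = []
    for b in packed:
        codes.extend((b & 0x0F, b >> 4))
    return codes


def dequantize_group(scale_u8, codes):
    """Map 4-bit codes back to floats using the shared power-of-two scale."""
    scale = 2.0 ** (scale_u8 - E8M0_BIAS)
    return [(-1.0 if c & 0x8 else 1.0) * E2M1_VALUES[c & 0x7] * scale for c in codes]


# Usage: round-trip one group of synthetic weights.
weights = [0.01 * i - 0.15 for i in range(GROUP_SIZE)]
scale_u8, codes = quantize_group(weights)
packed = pack_codes(codes)  # 32 codes -> 16 bytes
restored = dequantize_group(scale_u8, unpack_codes(packed))
```

Since `input_activations` is marked `dynamic: true` with no observer, the same per-group computation would presumably run on activations at inference time, whereas the weight scales are computed ahead of time (`dynamic: false`, `observer: minmax`).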
---------
Signed-off-by: Dipika Sikka <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This folder highlights features that are still a work in progress, or that are supported in LLM Compressor and/or Compressed-Tensors but lack full support in downstream libraries such as vLLM.