Feature request
Possible quantized model to use: https://huggingface.co/neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8
May be able to fit on 8x A100 (40 or 80GB) on GCP.
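If this goes the local-hosting route, one plausible option is serving the checkpoint with vLLM, which supports these w8a8 (compressed-tensors) models. A minimal sketch, assuming vLLM is installed on a node with 8 GPUs; the exact memory flags would need tuning:

```shell
# Sketch: serve the w8a8 quantized checkpoint across 8 GPUs with vLLM.
vllm serve neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8 \
  --tensor-parallel-size 8 \
  --max-model-len 8192  # cap the context length to reduce KV-cache memory
```

This exposes an OpenAI-compatible API on port 8000 by default, which would make it easy to slot in as an LLM judge backend.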
Motivation / references
Llama 3.1 405B is one of the best-performing open-weight models, and it should be feasible to host locally once quantized. Possible use cases include serving as an LLM judge.
Your contribution
N/A