Is there any way to quantize a model on multiple nodes? #1382

Open
shuxiaobo opened this issue Apr 25, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@shuxiaobo

Is your feature request related to a problem? Please describe.
When I quantize a 200B-parameter model on 8×A100 GPUs, the entire process is very slow, taking about 24 hours.

Describe the solution you'd like
If there were a way to quantize a model across multiple nodes, it would accelerate the process.

@shuxiaobo shuxiaobo added the enhancement New feature or request label Apr 25, 2025
@brian-dellabetta
Collaborator

Hi @shuxiaobo, most compression algorithms have to run layer-by-layer, so that each layer's quantization error can be taken into account when calibrating the following layers. We have discussed data-parallel quantization, which would leverage several GPUs for each layer; it is highly desired but will require some time and thought to implement, so at the moment it is not being actively developed.

Which recipes are you trying with the 200B parameter model?
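The sequential dependency described above is why the process is hard to parallelize across nodes: each layer must be calibrated on activations produced by the already-quantized layers before it. A minimal sketch of that loop, using NumPy and hypothetical helper names (this is an illustration of the pattern, not llm-compressor's actual implementation):

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    # Uniform symmetric quantization of a weight matrix (illustrative only).
    scale = np.abs(w).max() / (2 ** (num_bits - 1) - 1)
    q = np.round(w / scale)
    return q * scale  # dequantized weights, carrying the quantization error

def quantize_model_sequentially(layers, calib_x):
    """Quantize layers one at a time: each layer sees calibration
    activations produced by the ALREADY-quantized preceding layers,
    so later layers can partially compensate for earlier error.
    This serial dependency is what prevents naive layer-level parallelism."""
    x = calib_x
    quantized = []
    for w in layers:
        wq = quantize_weights(w)  # quantize this layer's weights
        quantized.append(wq)
        x = x @ wq                # propagate calibration data through the
    return quantized              # quantized layer, not the original one

# Example: three random linear layers and a small calibration batch
rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 16)) for _ in range(3)]
calib = rng.standard_normal((4, 16))
qlayers = quantize_model_sequentially(layers, calib)
```

Data-parallel quantization, as discussed in the reply, would instead split the calibration batch (or the per-layer work such as Hessian accumulation) across GPUs *within* each layer step, keeping the layer order serial.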
