Is your feature request related to a problem? Please describe.
When I quantize a 200B model on 8xA100, the entire process is very slow, taking about 24 hours.
Describe the solution you'd like
If there were a way to quantize the model across multiple nodes, the process could be accelerated.
Hi @shuxiaobo, most compression algorithms have to run layer by layer, so that each layer's quantization error can be taken into account by the layers that follow it. We have discussed data-parallel quantization, which would leverage several GPUs for each layer; it is highly desired but will require some time and thought to implement, so it is not being actively developed at the moment.
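To make the layer-by-layer dependency concrete, here is a minimal sketch of sequential quantization. This is not llm-compressor's actual implementation: `fake_quantize` and `quantize_sequentially` are hypothetical helpers, and simple round-to-nearest stands in for a real recipe such as GPTQ. The key point is that each layer's calibration inputs depend on the already-quantized layers before it.

```python
import torch

def fake_quantize(weight: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Illustrative per-tensor, symmetric round-to-nearest quantization.
    Real recipes (e.g. GPTQ) solve a layer-wise reconstruction problem instead."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = weight.abs().max() / qmax
    return torch.round(weight / scale).clamp(-qmax, qmax) * scale

def quantize_sequentially(layers, calib_inputs):
    """Quantize layers one at a time, re-running the calibration batch through
    each quantized layer so that its quantization error is reflected in the
    activations the *next* layer calibrates against. This data dependency is
    what makes the process hard to parallelize across layers (and nodes)."""
    x = calib_inputs
    for layer in layers:
        layer.weight.data = fake_quantize(layer.weight.data)  # quantize in place
        with torch.no_grad():
            x = layer(x)  # activations now carry this layer's quantization error
    return layers

# Toy usage: a stack of linear layers and a random calibration batch.
layers = [torch.nn.Linear(64, 64) for _ in range(4)]
quantize_sequentially(layers, torch.randn(8, 64))
```

Data parallelism within a single layer (splitting the per-layer work across GPUs) is possible because it does not break this chain, which is why that is the direction we have discussed.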
Which recipes are you trying with the 200B parameter model?