The default parallelism mode in Transformers is pipeline parallelism, which is noticeably slower than tensor parallelism. As far as I know, Transformers now supports tensor parallelism, so I hope this framework can also support it to make quantization faster.
Thanks!
Hi @Arcmoon-Hu, as far as I know tensor parallelism is mainly a concern at inference time, where it is supported by vLLM -- see docs here -- but it is not generally used or needed during compression.
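For reference, vLLM exposes tensor parallelism at inference through the `tensor_parallel_size` argument on its `LLM` class. A minimal sketch (the model name and GPU count below are placeholders, and this requires a multi-GPU machine to actually run):

```python
from vllm import LLM, SamplingParams

# Shard the model's weights across 2 GPUs with tensor parallelism.
# Model name and tensor_parallel_size are illustrative placeholders.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```

Each GPU holds a shard of every layer's weight matrices, so all GPUs work on every token in parallel, which is where the speedup over pipeline parallelism comes from.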