I tried to quantize GLM-4-32B-0414: https://huggingface.co/THUDM/GLM-4-32B-0414
Recipe:
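Roughly the following (a simplified sketch assuming the llm-compressor `AWQModifier` one-shot flow; the dataset, scheme, sequence length, and output directory are placeholders rather than the exact values):

```python
# Simplified AWQ one-shot recipe sketch (llm-compressor style);
# dataset, scheme, and sequence length below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "THUDM/GLM-4-32B-0414"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 4-bit weight-only AWQ on every Linear layer except the output head
recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"])]

oneshot(
    model=model,
    dataset="open_platypus",        # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=128,    # see the slides referenced below
)

model.save_pretrained("GLM-4-32B-0414-AWQ", save_compressed=True)
tokenizer.save_pretrained("GLM-4-32B-0414-AWQ")
```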
I used 128 calibration samples, as suggested in these slides ("Calibration set"): https://minjiazhang.github.io/courses/fall24-resource/slides/awq.pdf
However, memory usage grew by roughly 1~5 GB per calibration sample, eventually exceeding 900 GB before I gave up on AWQ. Even with a swapfile, most of the time went to kernel swap-in/swap-out and I/O, compute was very slow, and once the CPU-side part finally got through it frustratingly crashed with a CUDA OOM.
Screenshot:

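For anyone trying to reproduce the per-sample growth, a quick way to quantify it is to log the process RSS once per calibration sample; a hypothetical psutil-based helper:

```python
# Hypothetical RSS logger to quantify the per-sample host-memory growth above.
import os

import psutil

_proc = psutil.Process(os.getpid())

def log_rss(tag: str) -> None:
    """Print the current resident set size of this process in GiB."""
    rss_gib = _proc.memory_info().rss / 2**30
    print(f"[mem] {tag}: {rss_gib:.1f} GiB")

# e.g. call log_rss(f"after sample {i}") inside the calibration loop
```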
Side-note: couldn't the calibration be made multi-threaded?