GPTQ-for-Llama broken on AMD #3754
Comments
The rentry instructions are severely outdated and a GPTQ-for-LLaMa wheel is currently only included for compatibility with older NVIDIA GPUs. If AutoGPTQ works for AMD, it should be preferred. I don't know much about AMD, but I have created and pinned an issue where hopefully people can share setup information: #3759
Thanks for the feedback. I tried AutoGPTQ and it seems to work. However, if I install the wheel from https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.2/auto_gptq-0.4.2+rocm5.4.2-cp310-cp310-linux_x86_64.whl it is much slower than GPTQ-for-LLaMa (there is a warning that ExLlama is missing). When I build it from source it is as fast as expected, but the output is gibberish again. I never had this issue before. Could this still be a problem with my GPTQ setup, or could this be an unrelated problem? Thanks for creating the thread, I think this will be very helpful.
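For reference, a minimal sketch of the two install paths mentioned above. The wheel URL is the one from this comment; the `ROCM_VERSION` variable for the source build is an assumption based on AutoGPTQ's build instructions at the time, so adjust to your setup:

```bash
# Option 1: prebuilt ROCm wheel from the AutoGPTQ releases page
# (worked here, but slower because the ExLlama kernel is not included)
pip install https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.2/auto_gptq-0.4.2+rocm5.4.2-cp310-cp310-linux_x86_64.whl

# Option 2: build from source against the local ROCm toolchain
# (faster in this report, but produced gibberish output until ROCm was reinstalled)
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
ROCM_VERSION=5.4.2 pip install -v .
```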
Gibberish output is usually a sign of using a model with
I don't think that causes the problem. I used the main version of https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ, and according to its model card that should not apply here. As Triton is not currently supported on AMD (as far as I know), I am not able to test it with Triton.
It is getting worse xD
[1] 58417 segmentation fault (core dumped)  python server.py
Was able to solve the problem by reinstalling everything (including a complete reinstall of ROCm).
Describe the bug
The update of the requirements.txt and the import of gptq_for_llama in the GPTQ_loader module seem to break the AMD installation. When running the installation as described in the README.md, the GPTQ-for-Llama test fails:
The reason seems to be line 53 in the requirements.txt. When removing this line, the GPTQ-for-Llama test works, but loading the model fails because of the reworked imports in GPTQ_loader. When reverting to the old import, it works again.
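A hedged sketch of the workaround described above. The line number (53) is taken from this report, and the file path modules/GPTQ_loader.py and the git-based revert are assumptions about the repository layout, so verify before running:

```bash
# Drop the GPTQ-for-LLaMa wheel from requirements.txt
# (reported above as line 53) and reinstall the requirements
sed -i '53d' requirements.txt
pip install -r requirements.txt

# Restore the loader module to the import style used before the rework,
# e.g. by checking the file out from the last commit before the change
git log --oneline -- modules/GPTQ_loader.py        # find the earlier commit
git checkout <commit-before-rework> -- modules/GPTQ_loader.py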
Is there an existing issue for this?
Reproduction
Install the text-generation-webui on an AMD device as described in the README.md with the ROCm installation instructions from https://rentry.org/eq3hg
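A rough sketch of the reproduction path, assuming a Python 3.10 virtualenv and the ROCm 5.4.2 PyTorch wheels mentioned elsewhere in this thread (the rentry page may describe different steps):

```bash
# Fresh checkout and clean environment
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
python -m venv venv && source venv/bin/activate

# ROCm build of PyTorch (the index URL assumes the ROCm 5.4.2 wheels
# referenced earlier in this thread)
pip install torch --index-url https://download.pytorch.org/whl/rocm5.4.2

# Installing the remaining requirements pulls in the GPTQ-for-LLaMa wheel
# that triggers the failure described above
pip install -r requirements.txt
python server.py
```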
Screenshot
No response
Logs
System Info