
GPTQ-for-Llama broken on AMD #3754

Closed
lufixSch opened this issue Aug 30, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@lufixSch

Describe the bug

The update of requirements.txt and the new import of gptq_for_llama in the GPTQ_loader module seem to break the AMD installation.

When running the installation as described in the README.md, the GPTQ-for-LLaMa kernel test fails:

$ CUDA_VISIBLE_DEVICES=0 python test_kernel.py
Traceback (most recent call last):
  File "/media/Linux DATA/AI/LLM/WebUI/repositories/GPTQ-for-LLaMa/test_kernel.py", line 4, in <module>
    import quant_cuda
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

The reason seems to be line 53 of requirements.txt:

https://github.com/jllllll/GPTQ-for-LLaMa-CUDA/releases/download/0.1.0/gptq_for_llama-0.1.0+cu117-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

When this line is removed, the GPTQ-for-LLaMa test passes, but loading the model then fails because of the reworked imports in GPTQ_loader.
Reverting to the old imports makes it work again:

sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa")))  # make the local clone importable

try:
    import llama_inference_offload
except ImportError:
    logger.error("Failed to load GPTQ-for-LLaMa")
    logger.error(
        "See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md"
    )
    sys.exit(-1)

# depending on the GPTQ-for-LLaMa version, find_layers lives in modelutils or utils
try:
    from modelutils import find_layers
except ImportError:
    from utils import find_layers

# the CUDA branch exposes make_quant in quant; if that import fails,
# assume the Triton branch and import the quant module directly
try:
    from quant import make_quant

    is_triton = False
except ImportError:
    import quant

    is_triton = True
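
For reference, a quick way to confirm the mismatch is to check which backend the installed PyTorch was built against; a +cu117 wheel cannot load against a ROCm build. This is only a diagnostic sketch, not part of the webui:

import torch

print("CUDA build:", torch.version.cuda)          # typically None on a ROCm build of PyTorch
print("ROCm build:", torch.version.hip)           # typically None on a CUDA build
print("GPU available:", torch.cuda.is_available())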

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Install text-generation-webui on an AMD device as described in the README.md, using the ROCm installation instructions from https://rentry.org/eq3hg.

Screenshot

No response

Logs

Traceback (most recent call last):
  File "/media/Linux DATA/AI/LLM/WebUI/repositories/GPTQ-for-LLaMa/test_kernel.py", line 4, in <module>
    import quant_cuda
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

System Info

Operating System: Manjaro Linux 
KDE Plasma Version: 5.27.7
KDE Frameworks Version: 5.109.0
Qt Version: 5.15.10
Kernel Version: 6.1.49-1-MANJARO (64-bit)
Graphics Platform: X11
Processors: 20 × 13th Gen Intel® Core™ i5-13500
Memory: 31.1 GiB of RAM
Graphics Processor: AMD Radeon RX 6750 XT
Manufacturer: Micro-Star International Co., Ltd.
Product Name: MS-7D98
System Version: 1.0
lufixSch added the bug label on Aug 30, 2023
@lufixSch
Author

Just noticed that the output of the model is now total gibberish. I am not sure if this is related or not.

[Screenshot: Screenshot_20230830_140323]

@oobabooga
Owner

The rentry instructions are severely outdated and a GPTQ-for-LLaMa wheel is currently only included for compatibility with older NVIDIA GPUs. If AutoGPTQ works for AMD, it should be preferred.

I don't know much about AMD, but I have created and pinned an issue where hopefully people can share setup information: #3759

@lufixSch
Author

Thanks for the feedback. I tried AutoGPTQ and it seems to work.

However, if I install the wheel from https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.2/auto_gptq-0.4.2+rocm5.4.2-cp310-cp310-linux_x86_64.whl, it is much slower than GPTQ-for-LLaMa (there is a warning that ExLlama is missing).

When I build it from source, it is as fast as expected, but the output is gibberish again.

I never had this issue before. Could this still be a problem with my GPTQ setup, or is it an unrelated problem?

Thanks for creating the thread, I think it will be very helpful.
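
One way to narrow this down is to check whether the from-source build actually compiled AutoGPTQ's fused kernels, which is what the missing-ExLlama warning points at. A sketch; the module names below are assumptions based on the AutoGPTQ 0.4.x source tree and may differ between versions:

from importlib.util import find_spec

# probe for the compiled extension modules (names are assumptions, see above)
for name in ("autogptq_cuda_64", "autogptq_cuda_256", "exllama_kernels"):
    print(name, "found" if find_spec(name) else "missing")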

@oobabooga
Owner

Gibberish output is usually a sign of using a model with desc_act=True (also called "act order") and groupsize > 0 while not checking the triton option. Last time I checked, act order + groupsize requires triton.
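
To verify which settings a downloaded model actually uses, one can read its quantize_config.json; a sketch with a hypothetical local path, using the field names shipped in TheBloke's GPTQ repos:

import json
from pathlib import Path

# path is hypothetical; point it at the downloaded model folder
cfg = json.loads(Path("models/Llama-2-13B-chat-GPTQ/quantize_config.json").read_text())
print("desc_act:", cfg.get("desc_act"))
print("group_size:", cfg.get("group_size"))
print("bits:", cfg.get("bits"))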

@lufixSch
Author

I don't think that causes the problem. I used the main version of https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ, and the documentation says desc_act=False.

As Triton is currently not supported on AMD (as far as I know), I am not able to test with the triton option checked.

@lufixSch
Author

lufixSch commented Sep 2, 2023

It is getting worse xD
After reinstalling ROCm, deleting my venv, and reinstalling all Python dependencies, I am now unable to get results with any model.
As before, I am able to load models (GPTQ or Transformers), but as soon as I try to generate text, the whole program crashes with a segmentation fault:

[1]    58417 segmentation fault (core dumped)  python server.py
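
One way to get at least a Python-side stack trace for such crashes (a sketch, not something the webui enables by default) is to turn on the standard-library faulthandler at the very top of server.py:

import faulthandler

# print the Python stack of each thread if a native extension segfaults
faulthandler.enable()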

@lufixSch
Author

lufixSch commented Sep 9, 2023

I was able to solve the problem by reinstalling everything (including a complete reinstall of ROCm).
I have no idea what caused it, but I will close this anyway. Thanks for the help!

@lufixSch lufixSch closed this as completed Sep 9, 2023