ImportError("Loading an AWQ quantized model requires auto-awq library (pip install autoawq) #918

Open
amd-vivekag opened this issue Feb 13, 2025 · 2 comments


amd-vivekag commented Feb 13, 2025

Steps to reproduce:

python run.py -t <testname> -b llvm-cpu -d local-task -c x86_64-linux-gnu  --mode=cl-onnx-iree --cleanup=3 --get-metadata -v

Tests failing:

hf_Midnight-Miqu-70B-v1.5-4bit
hf_Meta-Llama-3.1-8B-Instruct-AWQ-INT4

The following ImportError is raised:

ImportError("Loading an AWQ quantized model requires auto-awq library (`pip install autoawq`)

The error changes to the following once the package is installed:

RuntimeError: GPU is required to run AWQ quantized model. You can use IPEX version AWQ if you have an Intel CPU
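
For isolation, both errors can be reproduced outside the test harness with a plain transformers load. A minimal sketch, assuming the harness loads the checkpoint through AutoModelForCausalLM; the model id below is inferred from the failing test name and may not be what run.py actually uses:

# Minimal repro sketch. Assumptions: the harness goes through transformers'
# AutoModelForCausalLM, and the model id is inferred from the test name
# hf_Meta-Llama-3.1-8B-Instruct-AWQ-INT4.
from transformers import AutoModelForCausalLM

# Raises the ImportError above when autoawq is missing, and the
# RuntimeError above when transformers sees no usable GPU.
model = AutoModelForCausalLM.from_pretrained(
    "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"
)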


amd-vivekag commented Mar 3, 2025

Per the code, we don't have a way to run this test case on an AMD CPU or GPU. We need to discuss further how to handle these cases:

else:
    if not torch.cuda.is_available():
        raise RuntimeError(
            "GPU is required to run AWQ quantized model. You can use IPEX version AWQ if you have an Intel CPU"
        )
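
As a side note (a general PyTorch fact, not specific to this code): ROCm builds of PyTorch surface the HIP backend through the torch.cuda namespace, so torch.cuda.is_available() returns True on a supported AMD GPU and the check above passes. A quick probe to tell the two builds apart:

import torch

# True on a working CUDA *or* ROCm build; ROCm is aliased under torch.cuda.
print(torch.cuda.is_available())

# Exactly one of these is non-None, identifying the build.
print(torch.version.hip)   # e.g. "6.1...." on ROCm builds, None on CUDA builds
print(torch.version.cuda)  # e.g. "12.4" on CUDA builds, None on ROCm builds

This is why installing the ROCm build (below) was enough to get past this check.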

After installing the PyTorch library with ROCm support, the test was at least able to start downloading and loading the shards and other files. It now fails with the following error (while generating IR, it seems):

  File "venv.env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 100, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
triton.compiler.errors.CompilationError: at 21:16:
    group_size,
    BLOCK_SIZE_M: tl.constexpr,
    BLOCK_SIZE_N: tl.constexpr,
    BLOCK_SIZE_K: tl.constexpr,
    SPLIT_K: tl.constexpr,
):
    pid = tl.program_id(axis=0)
    pid_z = tl.program_id(1)

    # NOTE: This doesn't work in TRITON_INTERPRET=1 mode.  Use below instead.
    # num_pid_n = (N + BLOCK_SIZE_N - 1) // BLOCK_SIZE_N
    num_pid_n = tl.cdiv(N, BLOCK_SIZE_N)
                ^

triton.compiler.errors.CompilationError: at 10:11:
def cdiv(x, div):
    """
    Computes the ceiling division of :code:`x` by :code:`div`

    :param x: the input number
    :type x: Block
    :param div: the divisor
    :type div: Block
    """
    return (x + div - 1) // div
           ^
IncompatibleTypeErrorImpl('invalid operands of type pointer<int64> and triton.language.int32')
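
The IncompatibleTypeErrorImpl suggests that N reaches this Triton kernel as a pointer, i.e. a tensor was passed where a plain Python int was expected, so tl.cdiv is asked to divide pointer<int64> by an int32. A minimal sketch of the same class of failure; this is entirely hypothetical (the kernel and argument names are made up, and it is not the actual AWQ kernel), and launching it requires a GPU:

import torch
import triton
import triton.language as tl

@triton.jit
def _demo_kernel(N, BLOCK: tl.constexpr):
    # When the launch passes a tensor for N, Triton treats the argument
    # as a pointer, and tl.cdiv(pointer<int64>, int32) fails to compile
    # with the same IncompatibleTypeErrorImpl as above.
    num_blocks = tl.cdiv(N, BLOCK)

# Compiles fine: N is a scalar Python int.
_demo_kernel[(1,)](128, BLOCK=32)

# CompilationError: N becomes pointer<int64> inside the kernel.
_demo_kernel[(1,)](torch.tensor(128, device="cuda"), BLOCK=32)

If that is what is happening here, the fix would be on the caller's side (pass the raw int rather than a tensor), but that caller sits inside the AWQ/Triton integration, not in our test code.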

amd-vivekag commented

@zjgarvey, can you please suggest what the next step should be for this issue?
