ImportError("Loading an AWQ quantized model requires auto-awq library (pip install autoawq) #918

Open
amd-vivekag opened this issue Feb 13, 2025 · 2 comments


amd-vivekag commented Feb 13, 2025

Steps to reproduce:

python run.py -t <testname> -b llvm-cpu -d local-task -c x86_64-linux-gnu  --mode=cl-onnx-iree --cleanup=3 --get-metadata -v

Tests failing:

hf_Midnight-Miqu-70B-v1.5-4bit
hf_Meta-Llama-3.1-8B-Instruct-AWQ-INT4

The following ImportError is raised:

ImportError("Loading an AWQ quantized model requires auto-awq library (`pip install autoawq`)

The error changes to the following once the package is installed:

RuntimeError: GPU is required to run AWQ quantized model. You can use IPEX version AWQ if you have an Intel CPU
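
For isolation, both errors can be reproduced outside the test harness with a plain transformers load. A minimal sketch, assuming the harness loads the checkpoint through AutoModelForCausalLM; the model id below is inferred from the failing test name and may not be what run.py actually uses:

# Minimal repro sketch. Assumptions: the harness goes through transformers'
# AutoModelForCausalLM, and the model id is inferred from the test name
# hf_Meta-Llama-3.1-8B-Instruct-AWQ-INT4.
from transformers import AutoModelForCausalLM

# Raises the ImportError above when autoawq is missing, and the
# RuntimeError above when transformers sees no usable GPU.
model = AutoModelForCausalLM.from_pretrained(
    "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"
)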


amd-vivekag commented Mar 3, 2025

Per the code, we don't have a way to run this test case on an AMD CPU or GPU. We need to discuss further how to handle these cases:

else:
    if not torch.cuda.is_available():
        raise RuntimeError(
            "GPU is required to run AWQ quantized model. You can use IPEX version AWQ if you have an Intel CPU"
        )
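
As a side note (a general PyTorch fact, not specific to this code): ROCm builds of PyTorch surface the HIP backend through the torch.cuda namespace, so torch.cuda.is_available() returns True on a supported AMD GPU and the check above passes. A quick probe to tell the two builds apart:

import torch

# True on a working CUDA *or* ROCm build; ROCm is aliased under torch.cuda.
print(torch.cuda.is_available())

# Exactly one of these is non-None, identifying the build.
print(torch.version.hip)   # e.g. "6.1...." on ROCm builds, None on CUDA builds
print(torch.version.cuda)  # e.g. "12.4" on CUDA builds, None on ROCm builds

This is why installing the ROCm build (below) was enough to get past this check.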

After installing the PyTorch library with ROCm support, the test was at least able to start downloading and loading the shards and other files. It now fails with the following error (while generating IR, it seems):

  File "venv.env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 100, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
triton.compiler.errors.CompilationError: at 21:16:
    group_size,
    BLOCK_SIZE_M: tl.constexpr,
    BLOCK_SIZE_N: tl.constexpr,
    BLOCK_SIZE_K: tl.constexpr,
    SPLIT_K: tl.constexpr,
):
    pid = tl.program_id(axis=0)
    pid_z = tl.program_id(1)

    # NOTE: This doesn't work in TRITON_INTERPRET=1 mode.  Use below instead.
    # num_pid_n = (N + BLOCK_SIZE_N - 1) // BLOCK_SIZE_N
    num_pid_n = tl.cdiv(N, BLOCK_SIZE_N)
                ^

triton.compiler.errors.CompilationError: at 10:11:
def cdiv(x, div):
    """
    Computes the ceiling division of :code:`x` by :code:`div`

    :param x: the input number
    :type x: Block
    :param div: the divisor
    :type div: Block
    """
    return (x + div - 1) // div
           ^
IncompatibleTypeErrorImpl('invalid operands of type pointer<int64> and triton.language.int32')
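
The IncompatibleTypeErrorImpl suggests that N reaches this Triton kernel as a pointer, i.e. a tensor was passed where a plain Python int was expected, so tl.cdiv is asked to divide pointer<int64> by an int32. A minimal sketch of the same class of failure; this is entirely hypothetical (the kernel and argument names are made up, and it is not the actual AWQ kernel), and launching it requires a GPU:

import torch
import triton
import triton.language as tl

@triton.jit
def _demo_kernel(N, BLOCK: tl.constexpr):
    # When the launch passes a tensor for N, Triton treats the argument
    # as a pointer, and tl.cdiv(pointer<int64>, int32) fails to compile
    # with the same IncompatibleTypeErrorImpl as above.
    num_blocks = tl.cdiv(N, BLOCK)

# Compiles fine: N is a scalar Python int.
_demo_kernel[(1,)](128, BLOCK=32)

# CompilationError: N becomes pointer<int64> inside the kernel.
_demo_kernel[(1,)](torch.tensor(128, device="cuda"), BLOCK=32)

If that is what is happening here, the fix would be on the caller's side (pass the raw int rather than a tensor), but that caller sits inside the AWQ/Triton integration, not in our test code.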

amd-vivekag commented

@zjgarvey, can you please suggest what the next step should be for this issue?
