[BUG] RuntimeError: Numpy is not available #1403

Open · davidray222 opened this issue Mar 9, 2025 · 5 comments
Labels
bug (Something isn't working)

Comments


davidray222 commented Mar 9, 2025

Describe the bug

INFO  Packing model...
INFO  Packing Kernel: Auto-selection: adding candidate `TorchQuantLinear`
INFO  Kernel: candidates -> `[TorchQuantLinear]`
INFO  Kernel: selected -> `TorchQuantLinear`.
Packing model.layers.0.mlp.gate_proj    [5 of 224] █---------------------------------------------------------------| 0:00:00 / 0:00:00 [5/224] 2.2%
Traceback (most recent call last):
  File "/mnt/8tb_raid/david_model/GPTQModel/examples/quantization/quant_deepseek_autoround.py", line 79, in <module>
    main()
  File "/mnt/8tb_raid/david_model/GPTQModel/examples/quantization/quant_deepseek_autoround.py", line 43, in main
    model.quantize(examples)
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/models/base.py", line 421, in quantize
    return module_looper.loop(
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/looper/module_looper.py", line 441, in loop
    reverse_p.finalize(model=self.gptq_model, **kwargs)
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/looper/gptq_processor.py", line 200, in finalize
    model.qlinear_kernel = pack_model(
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/utils/model.py", line 592, in pack_model
    for _ in executor.map(wrapper, names):
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/utils/model.py", line 590, in wrapper
    pack_module(name, qModules, quant_result, modules)
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/utils/model.py", line 529, in pack_module
    qModules[name].pack(linear=layers[name], scales=scale, zeros=zero, g_idx=g_idx)
  File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/nn_modules/qlinear/__init__.py", line 469, in pack
    int_weight = int_weight.numpy().astype(self.pack_np_math_dtype)
RuntimeError: Numpy is not available

I used the code from "/GPTQModel/examples/quantization/basic_usage_autoround.py" to quantize deepseek-ai/DeepSeek-R1-Distill-Llama-8B and Qwen/QwQ-32B, but I encountered the same issue in both cases.
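
The failure is the `Tensor.numpy()` conversion inside `pack()` (gptqmodel/nn_modules/qlinear/__init__.py line 469 in the traceback). A minimal sketch that isolates just that conversion; the shape, values, and the `np.int32` stand-in for `self.pack_np_math_dtype` are illustrative only:

import numpy as np
import torch

# stand-in for the int_weight tensor built inside TorchQuantLinear.pack();
# shape, values, and dtype are illustrative, not the real quantized weights
int_weight = torch.randint(0, 16, (8, 8), dtype=torch.int32)

# the conversion that raises "RuntimeError: Numpy is not available"
# when the installed torch build cannot hand tensors to the installed NumPy
arr = int_weight.numpy().astype(np.int32)
print(arr.dtype, arr.shape)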

GPU Info

NVIDIA A6000

Software Info

CUDA Version: 12.8


Name: gptqmodel
Version: 2.0.1.dev0
---
Name: torch
Version: 2.2.0
---
Name: transformers
Version: 4.49.0
---
Name: accelerate
Version: 1.3.0
---
Name: triton
Version: 2.2.0
---
Name: numpy
Version: 2.2.3
(gptqmodel) david@asus-ESC4000-E11:/mnt/8tb_raid/david_model/GPTQModel$ pip list
Package                  Version
------------------------ -----------
accelerate               1.3.0
aiohappyeyeballs         2.5.0
aiohttp                  3.11.13
aiosignal                1.3.2
async-timeout            5.0.1
attrs                    25.1.0
certifi                  2025.1.31
charset-normalizer       3.4.1
datasets                 3.3.2
device-smi               0.4.1
dill                     0.3.8
filelock                 3.17.0
frozenlist               1.5.0
fsspec                   2024.12.0
gptqmodel                2.0.1.dev0
hf_transfer              0.1.9
huggingface-hub          0.29.2
idna                     3.10
Jinja2                   3.1.6
logbar                   0.0.3
MarkupSafe               3.0.2
mpmath                   1.3.0
multidict                6.1.0
multiprocess             0.70.16
networkx                 3.4.2
numpy                    2.2.3
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-cusparselt-cu12   0.6.2
nvidia-nccl-cu12         2.19.3
nvidia-nvjitlink-cu12    12.8.93
nvidia-nvtx-cu12         12.1.105
packaging                24.2
pandas                   2.2.3
pillow                   11.1.0
pip                      25.0
propcache                0.3.0
protobuf                 6.30.0
psutil                   7.0.0
pyarrow                  19.0.1
python-dateutil          2.9.0.post0
pytz                     2025.1
PyYAML                   6.0.2
regex                    2024.11.6
requests                 2.32.3
safetensors              0.5.3
setuptools               75.8.0
six                      1.17.0
sympy                    1.13.1
threadpoolctl            3.5.0
tokenicer                0.0.4
tokenizers               0.21.0
torch                    2.2.0
tqdm                     4.67.1
transformers             4.49.0
triton                   2.2.0
typing_extensions        4.12.2
tzdata                   2025.1
urllib3                  2.3.0
wheel                    0.45.1
xxhash                   3.5.0
yarl                     1.18.3

my code:

# Copyright 2024-2025 ModelCloud.ai
# Copyright 2024-2025 [email protected]
# Contact: [email protected], x.com/qubitium
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
from gptqmodel import GPTQModel
from gptqmodel.quantization.config import AutoRoundQuantizeConfig  # noqa: E402
from transformers import AutoTokenizer

pretrained_model_id = "Qwen/QwQ-32B" # "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
quantized_model_id = "./autoround/Qwen-QwQ-32B-4bit-32g"

def main():
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id, use_fast=True)
    examples = [
        tokenizer(
            "gptqmodel is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
        )
    ]

    quantize_config = AutoRoundQuantizeConfig(
        bits=4,
        group_size=32
    )

    model = GPTQModel.load(
        pretrained_model_id,
        quantize_config=quantize_config,
    )

    model.quantize(examples)

    model.save(quantized_model_id)

    tokenizer.save_pretrained(quantized_model_id)

    del model

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = GPTQModel.from_quantized(
        quantized_model_id,
        device=device,
    )

    input_ids = torch.ones((1, 1), dtype=torch.long, device=device)
    outputs = model(input_ids=input_ids)
    print(f"output logits {outputs.logits.shape}: \n", outputs.logits)
    # inference with model.generate
    print(
        tokenizer.decode(
            model.generate(
                **tokenizer("gptqmodel is", return_tensors="pt").to(model.device)
            )[0]
        )
    )


if __name__ == "__main__":
    import logging

    logging.basicConfig(
        format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
        level=logging.INFO,
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    main()

Thank you!

davidray222 added the bug label Mar 9, 2025
Qubitium (Collaborator) commented Mar 9, 2025

@davidray222 Thanks for the report. I think your torch version 2.2 may be too old, or maybe I broke something in main. Will check soon!

Qubitium (Collaborator) commented Mar 9, 2025

@davidray222 I think you have a broken NumPy package install. Try the following:

import torch

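# sanity check: tensor -> NumPy conversion; on a healthy install this prints [1 2 3]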
t = torch.tensor([1, 2, 3], dtype=torch.int32)
print(t.numpy())

Run the above Python code in your env. If you get the same error, I suggest you uninstall and re-install numpy.
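
Note: torch 2.2.0 also predates NumPy 2.x, and torch builds that old cannot convert tensors when NumPy >= 2 is installed; that mismatch raises this exact error even when both packages import fine on their own, and it matches the torch 2.2.0 + numpy 2.2.3 pairing in your pip list (an inference from the versions, not something I have verified here). If the reinstall does not help, pinning `numpy<2` or upgrading torch may resolve it.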

Qubitium (Collaborator) commented Mar 9, 2025

@davidray222 Also, please use release version 2.0 if possible. There may be other bugs in the main/devel branch; some changes on main have not been CI-validated.

davidray222 (Author) commented

@Qubitium
I successfully quantized DeepSeek-R1-Distill-Llama-8B into a 4-bit model, but I encountered an issue where, no matter what input I provide, the output is always "!!!!!!!!". Could this be due to a mistake in my process, or is it related to an issue with my environment or version compatibility?

Image

I updated my software versions:

CUDA Version: 12.8


Name: gptqmodel
Version: 2.0.0
---
Name: torch
Version: 2.4.0
---
Name: transformers
Version: 4.49.0
---
Name: accelerate
Version: 1.3.0
---
Name: triton
Version: 3.0.0
---
Name: numpy
Version: 2.2.2

My steps:

1. I used the code from "/GPTQModel/examples/quantization/basic_usage_autoround.py" to quantize deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
2. I got these files:

Image

3. I use this code (with `export GPTQMODEL_USE_MODELSCOPE=True` set in the shell):

from gptqmodel import GPTQModel

# load the locally quantized DeepSeek-R1-Distill-Llama-8B model
model = GPTQModel.load("/mnt/8tb_raid/david_model/GPTQModel/examples/quantization/autoround/DeepSeek-R1-Distill-Llama-8B-4bit-32g/")
result = model.generate("hello")[0] # tokens
print(model.tokenizer.decode(result)) # string output

The output I get is always:

Image


I would like to ask if I made a mistake somewhere. Thank you!!

EverlynAsiko commented

I am getting a similar error:


ImportError                               Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py in _get_module(self, module_name)
   1862     try:
-> 1863         return importlib.import_module("." + module_name, self.__name__)
   1864     except Exception as e:

47 frames

ImportError: cannot import name '_center' from 'numpy._core.umath' (/usr/local/lib/python3.11/dist-packages/numpy/_core/umath.py)

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
cannot import name '_center' from 'numpy._core.umath' (/usr/local/lib/python3.11/dist-packages/numpy/_core/umath.py)

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py in _get_module(self, module_name)
   1863         return importlib.import_module("." + module_name, self.__name__)
   1864     except Exception as e:
-> 1865         raise RuntimeError(
   1866             f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1867             f" traceback):\n{e}"

RuntimeError: Failed to import transformers.models.auto.tokenization_auto because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
cannot import name '_center' from 'numpy._core.umath' (/usr/local/lib/python3.11/dist-packages/numpy/_core/umath.py)

from running:

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

My library versions:

Name: gptqmodel
Version: 2.0.0
---
Name: torch
Version: 2.6.0+cu124
---
Name: transformers
Version: 4.49.0
---
Name: accelerate
Version: 1.3.0
---
Name: triton
Version: 3.2.0
---
Name: numpy
Version: 2.2.4

When I downgrade numpy to 2.2.2, I get:


ImportError                               Traceback (most recent call last)
in <cell line: 0>()
      1 from datasets import load_dataset
----> 2 from gptqmodel import GPTQModel, QuantizeConfig

3 frames

/usr/local/lib/python3.11/dist-packages/gptqmodel/utils/model.py in <module>
     47 from ..adapter.adapter import Adapter
     48 from ..looper.named_module import NamedModule
---> 49 from ..models._const import (
     50     CPU,
     51     DEVICE,

ImportError: cannot import name 'SUPPORTED_MODELS' from 'gptqmodel.models._const' (/usr/local/lib/python3.11/dist-packages/gptqmodel/models/_const.py)
