marlin g_idx issue when using dict device_map #1499

Open
wenhuach21 opened this issue Apr 3, 2025 · 2 comments

@wenhuach21

wenhuach21 commented Apr 3, 2025

I am integrating your Marlin kernel into AutoRound. In my initial tests, the Marlin kernel works well in most scenarios. However, when loading the model with a dictionary-based device_map, an exception is thrown if g_idx is not set.

I am unsure how to easily reproduce this issue, but you can refer to my test in this AutoRound PR for more details.

https://github.com/intel/auto-round/blob/32d15c7217c24017f7e180eef3f5d54a39df799d/test_cuda/test_auto_round_format.py#L62
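For context, a dictionary-based device_map places modules on devices explicitly and can mix GPU, CPU, and disk. A hypothetical map of that shape (not the exact one from the linked test; module names follow the usual Llama layout in transformers) looks like:

```python
# Hypothetical dictionary device_map mixing GPU and CPU placement.
device_map = {
    "model.embed_tokens": 0,  # GPU 0
    "model.layers": 0,        # GPU 0
    "model.norm": "cpu",      # offloaded to CPU
    "lm_head": "cpu",         # offloaded to CPU
}
```

With a map like this, accelerate attaches offload hooks to the CPU-placed modules, which is where the shape check in the traceback below fires.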

```
Traceback (most recent call last):
  File "/home/wenhuach/auto-round/test_cuda/test_auto_round_format.py", line 93, in test_device_map
    outputs = model.generate(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/generation/utils.py", line 2326, in generate
    result = self._sample(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/generation/utils.py", line 3286, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 853, in forward
    outputs = self.model(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 601, in forward
    layer_outputs = decoder_layer(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 343, in forward
    hidden_states, self_attn_weights = self.self_attn(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 277, in forward
    query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 171, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 361, in pre_forward
    set_module_tensor_to_device(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 292, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([4096]) in "g_idx" (which has shape torch.Size([0])), this looks incorrect.
```

@Qubitium
Collaborator

Qubitium commented Apr 3, 2025

@wenhuach21 So the error only happens when a dictionary device_map is used? Was this with a single- or multi-GPU device_map? I know the loading code is handled by accelerate, and the stack trace shows that.

The stack trace suggests the Marlin kernel initialized g_idx with the wrong shape. Normally the kernel layer is initialized with in/out_features plus bits/group_size, and based on those four values the init should create a correctly sized g_idx buffer. It should not be 0. It is almost as if the kernel init was broken, or was passed the wrong args before loading.
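That sizing expectation can be sketched as follows (hypothetical helper, assuming the g_idx buffer is sized from in_features when act-order is enabled and left empty otherwise, as in GPTQ-style kernels):

```python
import torch

def init_g_idx(in_features: int, desc_act: bool) -> torch.Tensor:
    # Hypothetical sketch: a full activation-order index when desc_act
    # is enabled, otherwise an empty placeholder buffer.
    if desc_act:
        return torch.zeros(in_features, dtype=torch.int32)
    return torch.empty(0, dtype=torch.int32)
```

Under this sketch, a 4096-wide layer with desc_act enabled gets a g_idx of shape [4096], never [0].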

@wenhuach21
Author

> @wenhuach21 So the error only happens when a dictionary device_map is used? Was this with a single- or multi-GPU device_map? I know the loading code is handled by accelerate, and the stack trace shows that.
>
> The stack trace suggests the Marlin kernel initialized g_idx with the wrong shape. Normally the kernel layer is initialized with in/out_features plus bits/group_size, and based on those four values the init should create a correctly sized g_idx buffer. It should not be 0. It is almost as if the kernel init was broken, or was passed the wrong args before loading.

After some quick debugging, I believe the issue involves both Hugging Face Accelerate and your implementation. In the Transformers library, models are first created with empty (meta-device) weights. If the device_map mixes in devices such as CPU or disk, some parameters are offloaded to the CPU. The offloading mechanism records the initial shape of g_idx at that point. However, your post_init method later changes the shape of g_idx when dsc_order is set to False, and the offloading mechanism is unaware of that change.
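A minimal sketch of the resulting mismatch, using a simplified stand-in for the shape check that accelerate's set_module_tensor_to_device performs (hypothetical helper name; the real logic lives in accelerate/utils/modeling.py):

```python
import torch

def set_module_tensor(dest: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
    # Simplified sketch of accelerate's shape check before restoring
    # an offloaded tensor into a module buffer (not the real code).
    if dest.shape != src.shape:
        raise ValueError(
            f"Trying to set a tensor of shape {tuple(src.shape)} "
            f"in a buffer of shape {tuple(dest.shape)}, this looks incorrect."
        )
    return src.clone()

# The failing situation: post_init shrank the module's g_idx buffer to
# an empty [0] tensor, while the offload hook still restores the
# original [4096] tensor recorded at load time.
module_g_idx = torch.empty(0, dtype=torch.int32)    # shape after post_init
saved_g_idx = torch.zeros(4096, dtype=torch.int32)  # shape accelerate saved

try:
    set_module_tensor(module_g_idx, saved_g_idx)
except ValueError as e:
    print("reproduced:", e)
```

This is why the error only surfaces with mixed-device dictionary maps: without CPU/disk offload, no hook ever tries to restore the stale-shaped tensor.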
