marlin g_idx issue when using dict device_map #1499

Open
wenhuach21 opened this issue Apr 3, 2025 · 2 comments

@wenhuach21

wenhuach21 commented Apr 3, 2025

I am integrating your Marlin kernel into AutoRound. In my initial tests, the Marlin kernel works well in most scenarios. However, when loading the model with a dictionary-based device_map, an exception is thrown if g_idx is not set.

I am unsure how to easily reproduce this issue, but you can refer to my test in this AutoRound PR for more details.

https://github.com/intel/auto-round/blob/32d15c7217c24017f7e180eef3f5d54a39df799d/test_cuda/test_auto_round_format.py#L62
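For context, a dictionary-based device_map places modules on devices explicitly and can mix GPU, CPU, and disk. A hypothetical map of that shape (not the exact one from the linked test; module names follow the usual Llama layout in transformers) looks like:

```python
# Hypothetical dictionary device_map mixing GPU and CPU placement.
device_map = {
    "model.embed_tokens": 0,  # GPU 0
    "model.layers": 0,        # GPU 0
    "model.norm": "cpu",      # offloaded to CPU
    "lm_head": "cpu",         # offloaded to CPU
}
```

With a map like this, accelerate attaches offload hooks to the CPU-placed modules, which is where the shape check in the traceback below fires.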

```
Traceback (most recent call last):
  File "/home/wenhuach/auto-round/test_cuda/test_auto_round_format.py", line 93, in test_device_map
    outputs = model.generate(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/generation/utils.py", line 2326, in generate
    result = self._sample(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/generation/utils.py", line 3286, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 853, in forward
    outputs = self.model(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 601, in forward
    layer_outputs = decoder_layer(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 343, in forward
    hidden_states, self_attn_weights = self.self_attn(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 277, in forward
    query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 171, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 361, in pre_forward
    set_module_tensor_to_device(
  File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 292, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([4096]) in "g_idx" (which has shape torch.Size([0])), this looks incorrect.
```

@Qubitium
Collaborator

Qubitium commented Apr 3, 2025

@wenhuach21 So the error only happens when a dictionary device_map is used? Was this with a single- or multi-GPU device_map? I know the loading code is handled by accelerate, and the stack trace shows that.

The stack trace suggests the Marlin kernel initialized g_idx with the wrong shape. Normally the kernel layer is initialized with in/out_features plus bits/group_size, and based on those four values the init should create a correctly sized g_idx buffer. It should not be 0. It is almost as if the kernel init was broken, or was passed the wrong args before loading.
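That sizing expectation can be sketched as follows (hypothetical helper, assuming the g_idx buffer is sized from in_features when act-order is enabled and left empty otherwise, as in GPTQ-style kernels):

```python
import torch

def init_g_idx(in_features: int, desc_act: bool) -> torch.Tensor:
    # Hypothetical sketch: a full activation-order index when desc_act
    # is enabled, otherwise an empty placeholder buffer.
    if desc_act:
        return torch.zeros(in_features, dtype=torch.int32)
    return torch.empty(0, dtype=torch.int32)
```

Under this sketch, a 4096-wide layer with desc_act enabled gets a g_idx of shape [4096], never [0].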

@wenhuach21
Author

> @wenhuach21 So the error only happens when a dictionary device_map is used? Was this with a single- or multi-GPU device_map? I know the loading code is handled by accelerate, and the stack trace shows that.
>
> The stack trace suggests the Marlin kernel initialized g_idx with the wrong shape. Normally the kernel layer is initialized with in/out_features plus bits/group_size, and based on those four values the init should create a correctly sized g_idx buffer. It should not be 0. It is almost as if the kernel init was broken, or was passed the wrong args before loading.

After some quick debugging, I believe the issue involves both Hugging Face Accelerate and your implementation. In the Transformers library, models are first created with empty (meta-device) weights. If the device_map mixes in devices such as CPU or disk, some parameters are offloaded to the CPU. The offloading mechanism records the initial shape of g_idx at that point. However, your post_init method later changes the shape of g_idx when dsc_order is set to False, and the offloading mechanism is unaware of that change.
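A minimal sketch of the resulting mismatch, using a simplified stand-in for the shape check that accelerate's set_module_tensor_to_device performs (hypothetical helper name; the real logic lives in accelerate/utils/modeling.py):

```python
import torch

def set_module_tensor(dest: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
    # Simplified sketch of accelerate's shape check before restoring
    # an offloaded tensor into a module buffer (not the real code).
    if dest.shape != src.shape:
        raise ValueError(
            f"Trying to set a tensor of shape {tuple(src.shape)} "
            f"in a buffer of shape {tuple(dest.shape)}, this looks incorrect."
        )
    return src.clone()

# The failing situation: post_init shrank the module's g_idx buffer to
# an empty [0] tensor, while the offload hook still restores the
# original [4096] tensor recorded at load time.
module_g_idx = torch.empty(0, dtype=torch.int32)    # shape after post_init
saved_g_idx = torch.zeros(4096, dtype=torch.int32)  # shape accelerate saved

try:
    set_module_tensor(module_g_idx, saved_g_idx)
except ValueError as e:
    print("reproduced:", e)
```

This is why the error only surfaces with mixed-device dictionary maps: without CPU/disk offload, no hook ever tries to restore the stale-shaped tensor.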
