I am integrating your Marlin kernel into AutoRound. In my initial tests, the Marlin kernel functions well in most scenarios. However, when loading the model with a dictionary-based device map, if the g_idx is not set, an exception is thrown.
I am unsure how to easily reproduce this issue, but you can refer to my test in this AutoRound PR for more details:
https://github.com/intel/auto-round/blob/32d15c7217c24017f7e180eef3f5d54a39df799d/test_cuda/test_auto_round_format.py#L62
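For context, the failure shows up when loading with a mixed-device dictionary device_map, roughly like the sketch below; the model id and module split here are placeholders, not the exact map used in the AutoRound test.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/autoround-marlin-model"  # placeholder, not the actual test model

# Mixed CPU/GPU dictionary map; offloading part of the model is what exposes the bug.
device_map = {
    "model.embed_tokens": 0,
    "model.layers": "cpu",
    "model.norm": 0,
    "lm_head": 0,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)  # raises the ValueError below
```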
Traceback (most recent call last):
File "/home/wenhuach/auto-round/test_cuda/test_auto_round_format.py", line 93, in test_device_map
outputs = model.generate(
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/generation/utils.py", line 2326, in generate
result = self._sample(
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/generation/utils.py", line 3286, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 853, in forward
outputs = self.model(
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 601, in forward
layer_outputs = decoder_layer(
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 343, in forward
hidden_states, self_attn_weights = self.self_attn(
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 277, in forward
query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 171, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/hooks.py", line 361, in pre_forward
set_module_tensor_to_device(
File "/home/wenhuach/anaconda3/envs/autoround/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 292, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([4096]) in "g_idx" (which has shape torch.Size([0])), this looks incorrect.
@wenhuach21 So the error only happens when a dictionary device_map is used? Was this on a single- or multi-GPU device_map? I know the loading code is handled by accelerate, and the stack trace shows that.
The stack trace shows the Marlin kernel has initialized the wrong shape for g_idx. Normally the kernel layer is initialized with in/out_features plus bits/group_size, and based on those four values the kernel init should create the correct g_idx buffer. It should not be 0. It is almost as if the kernel init was broken or was passed the wrong args pre-loading.
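For reference, this is roughly the shape g_idx should end up with after init. The class below is only an illustrative sketch of a GPTQ-style layer, not the actual GPTQModel Marlin kernel code, and the parameter names are assumptions.

```python
import torch


class QuantLinearSketch(torch.nn.Module):
    """Illustrative sketch only -- not the real Marlin kernel layer."""

    def __init__(self, bits: int, group_size: int, in_features: int, out_features: int):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # g_idx maps each input channel to its quantization group, so its length
        # should equal in_features (4096 for the failing q_proj), never 0.
        self.register_buffer(
            "g_idx",
            torch.tensor([i // group_size for i in range(in_features)], dtype=torch.int32),
        )


layer = QuantLinearSketch(bits=4, group_size=128, in_features=4096, out_features=4096)
print(layer.g_idx.shape)  # torch.Size([4096])
```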
After some quick debugging, I believe the issue comes from how Hugging Face Accelerate interacts with your implementation. In the Transformers library, models are initially created with empty (meta-device) weights. If the device_map includes mixed devices such as CPU or disk, some parameters are offloaded to the CPU. During this process, offloading apparently relies on the initial shape of g_idx. However, in your post_init method the shape of g_idx is modified when dsc_order is set to False, and the offloading mechanism is unaware of that change.
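A minimal sketch of the resulting conflict, assuming post_init re-registers g_idx as an empty buffer while accelerate still holds the offloaded copy with the original shape (the snippet only reproduces accelerate's shape check, it is not the actual loading path):

```python
import torch
from accelerate.utils import set_module_tensor_to_device

layer = torch.nn.Module()
# Shape after a hypothetical post_init that empties g_idx when act-order is disabled.
layer.register_buffer("g_idx", torch.empty(0, dtype=torch.int32))

# The offloaded copy that accelerate tries to move back in pre_forward
# still has the shape recorded at load time.
offloaded_g_idx = torch.zeros(4096, dtype=torch.int32)

# Raises: ValueError: Trying to set a tensor of shape torch.Size([4096]) in "g_idx"
# (which has shape torch.Size([0])), this looks incorrect.
set_module_tensor_to_device(layer, "g_idx", "cpu", value=offloaded_g_idx)
```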