Open
Description
When applying MultiPruner to qwen2.5-0.5b, attn_channel_group_size must be set to at least 448, because the model has num_attention_heads: 14, num_key_value_heads: 2, and head_size: 64. With 14 × 64 = 896 attention channels in total, this leaves only two groups, which is very coarse. Could having so few groups significantly affect the pruning results? Are there alternative approaches for models with this kind of head layout?
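For reference, here is a minimal sketch of how I arrive at the 448 figure. It assumes the constraint comes from grouped-query attention, i.e. that a channel group has to cover all query heads sharing one key/value head; that is my reading of the situation, not something I have confirmed in the MultiPruner code. The function name `min_attn_group_size` is hypothetical and only used for illustration.

```python
def min_attn_group_size(num_attention_heads: int,
                        num_key_value_heads: int,
                        head_size: int) -> tuple[int, int]:
    """Return (smallest group size, resulting number of groups).

    Assumption (not confirmed against MultiPruner): one group must span
    every query head that shares a single key/value head.
    """
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("query heads must divide evenly across KV heads")

    # Query heads that share one key/value head (7 for qwen2.5-0.5b).
    q_heads_per_kv = num_attention_heads // num_key_value_heads

    # Channels covered by one such bundle of query heads.
    group_size = q_heads_per_kv * head_size

    # Total attention channels divided by the group size.
    num_groups = (num_attention_heads * head_size) // group_size
    return group_size, num_groups


# Values taken from the qwen2.5-0.5b config quoted above.
group_size, num_groups = min_attn_group_size(14, 2, 64)
print(group_size, num_groups)  # 448 2
```

Under this assumption the number of groups always equals num_key_value_heads, so any model with very few key/value heads would run into the same coarse granularity.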