🚀 The feature, motivation and pitch
Currently, LigerMLP modules fuse only the SwiGLU/GeGLU elementwise computations and leave the matmuls untouched. These elementwise operations (including the multiplier support from #936) could easily be fused into the matmul epilogues. We can investigate the performance of this approach and decide whether to adopt it.
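To make "fusing into the matmul epilogue" concrete, here is a minimal, dependency-free sketch in plain Python. It is not Liger-Kernel code: `unfused_swiglu` and `fused_swiglu` are hypothetical names used only for illustration, the weights follow the `nn.Linear` layout (`[out_features][in_features]`), and the multiplier from #936 is left out because its exact semantics are not described here. A real kernel would be tiled and run on the GPU; the point is only that the fused variant applies SiLU-and-multiply before the single store, so the intermediate gate and up tensors are never materialized.

```python
import math


def silu(v: float) -> float:
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matmul(x, w):
    """x is [m][k], w is [n][k] (nn.Linear weight layout); returns [m][n]."""
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in w] for row in x]


def unfused_swiglu(x, w_gate, w_up):
    """Baseline: two matmuls materialize gate and up, then a separate elementwise pass."""
    gate = matmul(x, w_gate)
    up = matmul(x, w_up)
    return [[silu(g) * u for g, u in zip(g_row, u_row)] for g_row, u_row in zip(gate, up)]


def fused_swiglu(x, w_gate, w_up):
    """Illustrative epilogue fusion: both accumulators for one output element
    stay local, and the activation runs right before the value is stored."""
    out = []
    for row in x:
        out_row = []
        for wg_row, wu_row in zip(w_gate, w_up):
            acc_gate = 0.0
            acc_up = 0.0
            for xi, wg, wu in zip(row, wg_row, wu_row):
                acc_gate += xi * wg  # gate GEMM accumulator
                acc_up += xi * wu    # up GEMM accumulator
            out_row.append(silu(acc_gate) * acc_up)  # epilogue: activation * up
        out.append(out_row)
    return out


# Tiny example: 2 tokens, hidden size 3, intermediate size 4.
x = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
w_gate = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6], [0.7, -0.8, 0.9], [1.0, 1.1, -1.2]]
w_up = [[0.3, -0.2, 0.1], [0.6, 0.5, -0.4], [-0.9, 0.8, 0.7], [1.2, -1.1, 1.0]]

ref = unfused_swiglu(x, w_gate, w_up)
fused = fused_swiglu(x, w_gate, w_up)
max_diff = max(abs(a - b) for r, f in zip(ref, fused) for a, b in zip(r, f))
```

Both paths should agree up to floating-point rounding, so `max_diff` is expected to be negligible. Whether the fused form is actually faster on real hardware is exactly what this issue proposes to measure.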
TL;DR
Instead of:

    gate_states = self.gate_proj(x)
    up_states = self.up_proj(x)
    intermediate_states = LigerSiLUMulFunction.apply(gate_states, up_states)
    return self.down_proj(intermediate_states)

There are some other approaches worth exploring:
- fuse the activation (and multiplier) into gate_proj(x)

      up_states = self.up_proj(x)
      intermediate_states = LigerFusedLinearActMultiplierFunction.apply(x, self.gate_proj.weight, gate_multiplier, up_states)
      return self.down_proj(intermediate_states)

- stack the gate and up projections into a single matmul, then split the result inside the activation function

      # gate and up are stacked in gate_up_states, so no separate up_states is passed
      gate_up_states = self.gate_up_proj(x)
      intermediate_states = LigerSplitStatesActMultiplierFunction.apply(
          gate_up_states,
          config.hidden_act,
          gate_multiplier,
      )
      return self.down_proj(intermediate_states)

- dual GEMM with activation (and multiplier)

      intermediate_states = LigerDualGemmActMulFunction.apply(
          x,
          self.gate_proj.weight,
          self.up_proj.weight,
          config.hidden_act,
          gate_multiplier,
      )
      return self.down_proj(intermediate_states)

Alternatives
No response
Additional context
No response
0xtoward