AutoLigerKernelForCausalLM.from_config support #943

@sfc-gh-sbekman

🚀 The feature, motivation and pitch

When creating a new model from scratch, we would ideally use AutoLigerKernelForCausalLM.from_config rather than AutoLigerKernelForCausalLM.from_pretrained. However, from_config doesn't appear to handle the liger-kernel custom kwargs. For example, I'd expect this to work:

    swiglu = False
    if self.using_random_model:
        # Skip weight loading for a faster startup when running in random-model configuration mode
        return AutoLigerKernelForCausalLM.from_config(
            model_config,
            dtype=self.config.dtype.value,
            swiglu=swiglu,
        )
    else:
        return AutoLigerKernelForCausalLM.from_pretrained(
            name_or_path,
            config=model_config,
            dtype=self.config.dtype.value,
            swiglu=swiglu,
        )

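but it appears that AutoLigerKernelForCausalLM just subclasses the original auto class without overriding from_config, so the kwarg is forwarded unchanged to the model constructor and we end up with: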

[rank2]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 456, in from_config
[rank2]:     return model_class._from_config(config, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
[rank2]:     return func(*args, **kwargs)
[rank2]:            ^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2311, in _from_config
[rank2]:     model = cls(config, **kwargs)
[rank2]:             ^^^^^^^^^^^^^^^^^^^^^
[rank2]: TypeError: Qwen3MoeForCausalLM.__init__() got an unexpected keyword argument 'swiglu'

Actually, I'm not even sure whether liger-kernel functionality gets registered at all when the model object is created via this route. Does it? If it doesn't, from_config should assert rather than silently return an unpatched model.
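To make the request concrete, here is a minimal, self-contained sketch of the behavior I'd expect. It uses stand-in classes only and is not the real transformers or liger-kernel code: ToyModel, BaseAuto, PatchingAuto, apply_patch_to_toy and PATCH_FN_BY_MODEL_TYPE are all hypothetical names, and the config is a plain dict. The idea is that from_config pops the kwargs the patching function understands, applies the patch, fails loudly for unsupported model types, and only then delegates to the parent.

```python
import inspect

# All names below are hypothetical stand-ins, NOT real transformers/liger-kernel APIs.

PATCH_LOG = []  # records how the patch function was called, for illustration


class ToyModel:
    """Stand-in for a model class whose __init__ accepts only a config."""

    def __init__(self, config):
        self.config = config


def apply_patch_to_toy(swiglu=True, rope=True):
    """Stand-in for a per-model-type kernel patching function."""
    PATCH_LOG.append({"swiglu": swiglu, "rope": rope})


PATCH_FN_BY_MODEL_TYPE = {"toy": apply_patch_to_toy}


class BaseAuto:
    """Mimics the parent auto class: every kwarg is forwarded to the model."""

    @classmethod
    def from_config(cls, config, **kwargs):
        return ToyModel(config, **kwargs)


class PatchingAuto(BaseAuto):
    """Sketch of a from_config override that consumes the patching kwargs."""

    @classmethod
    def from_config(cls, config, **kwargs):
        patch_fn = PATCH_FN_BY_MODEL_TYPE.get(config["model_type"])
        if patch_fn is None:
            # Fail loudly instead of silently returning an unpatched model.
            raise ValueError(f"no patch registered for {config['model_type']!r}")
        accepted = inspect.signature(patch_fn).parameters
        # Remove the patch-specific kwargs so they never reach the model's __init__.
        patch_kwargs = {k: kwargs.pop(k) for k in list(kwargs) if k in accepted}
        patch_fn(**patch_kwargs)
        return super().from_config(config, **kwargs)


if __name__ == "__main__":
    model = PatchingAuto.from_config({"model_type": "toy"}, swiglu=False)
    print(type(model).__name__, PATCH_LOG)
```

With the stand-ins above, BaseAuto.from_config({"model_type": "toy"}, swiglu=False) fails with the same kind of TypeError as in the traceback, while PatchingAuto.from_config consumes swiglu before the model is constructed.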

Thank you.
