Change layers_pattern logic #2158

Draft

BenjaminBossan commented Oct 17, 2024

Addresses part of #2155.

Description

Until now, the layers_pattern argument only worked if the module name had a prefix before the pattern. As an example, if the module name is:

decoder.layer.0.attn.to_q

and we pass layers_pattern="layer", this would match. However, if the module name was:

layer.0.attn.to_q

i.e. without prefix before "layer", it would not work.

Usually, when we create a model with AutoModelForFoo.from_pretrained, the "layer" part is never the first component of the module name. However, if we load a model directly, e.g. through LlamaModel.from_pretrained, there is no prefix. As a consequence, we get no match there.

With this PR, the prefix is made optional, so that the second pattern also matches.
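To illustrate the difference, here is a minimal sketch of the two matching behaviors using plain regular expressions. The function names and the exact regexes are assumptions made for illustration; they are not copied from the PEFT source and the real implementation may differ.

```python
import re


def layer_index_required_prefix(name, pattern):
    # Hypothetical helper: requires something like "decoder." before the pattern.
    match = re.match(rf".*\.{re.escape(pattern)}\.(\d+)\.", name)
    return int(match.group(1)) if match else None


def layer_index_optional_prefix(name, pattern):
    # Hypothetical helper: the "<prefix>." part is wrapped in an optional group,
    # so the pattern may also be the first component of the module name.
    match = re.match(rf"(?:.*\.)?{re.escape(pattern)}\.(\d+)\.", name)
    return int(match.group(1)) if match else None


with_prefix = "decoder.layer.0.attn.to_q"
without_prefix = "layer.0.attn.to_q"

# Old behavior: only the prefixed name yields a layer index.
old_results = (
    layer_index_required_prefix(with_prefix, "layer"),
    layer_index_required_prefix(without_prefix, "layer"),
)

# New behavior: both names yield a layer index.
new_results = (
    layer_index_optional_prefix(with_prefix, "layer"),
    layer_index_optional_prefix(without_prefix, "layer"),
)
```

In this sketch the optional group still ends in a literal dot, so a name such as mylayer.0.attn.to_q does not match layers_pattern="layer".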

Status

I'm not sure yet whether this should be merged, as it is technically backwards incompatible. Users can still target the desired modules by carefully crafting a regex for target_modules so that it only matches the desired layer indices. However, this is tedious, and layers_pattern was introduced to avoid having to do this.
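As a rough sketch of that workaround, the regex below selects only to_q in layers 0 and 2, with or without a prefix. It assumes that a string passed as target_modules is matched against the full module name; the regex and the list of module names are made up for illustration.

```python
import re

# Hypothetical regex for target_modules: optional prefix, layer index 0 or 2,
# and only the to_q projection.
target_modules_regex = r"(?:.*\.)?layer\.(?:0|2)\.attn\.to_q"

# Made-up module names following the naming used in the description.
module_names = [
    "layer.0.attn.to_q",
    "layer.1.attn.to_q",
    "decoder.layer.2.attn.to_q",
    "decoder.layer.2.attn.to_k",
]

# Keep only names that the regex matches in full.
selected = [name for name in module_names if re.fullmatch(target_modules_regex, name)]
```

Every additional layer index has to be spelled out in the alternation, which is the tedium that layers_pattern is meant to avoid.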

