Change layers_pattern logic #2158

BenjaminBossan · 2024-10-17T15:07:57Z

Addresses part of #2155.

Description

So far, the layers_pattern argument only worked if the pattern was preceded by a prefix in the module name. As an example, if the module name is:

decoder.layer.0.attn.to_q

and we pass layers_pattern="layer", this would match. However, if the module name was:

layer.0.attn.to_q

i.e. without a prefix before "layer", it would not match.

Usually, when we create a model with AutoModelForFoo.from_pretrained, the "layer" part is never the first component of the module name. However, if we load a model directly, e.g. through LlamaModel.from_pretrained, there is no prefix, so we get no match there.

With this PR, the prefix is made optional, so that the second pattern also matches.
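To make the difference concrete, here is a minimal sketch of the two matching behaviors. The regular expressions and the helper names `layer_index_required_prefix` and `layer_index_optional_prefix` are assumptions made for illustration; they are not necessarily the expressions used in the PEFT code base.

```python
import re


def layer_index_required_prefix(key, pattern):
    # Hypothetical stand-in for the old behavior: a dot-terminated prefix
    # must precede the pattern, so "layer.0..." at the very start fails.
    match = re.match(rf".*\.{re.escape(pattern)}\.(\d+)\.", key)
    return int(match.group(1)) if match else None


def layer_index_optional_prefix(key, pattern):
    # Hypothetical stand-in for the new behavior: the prefix is optional,
    # but the pattern must still be a complete name component.
    match = re.match(rf"(?:.*\.)?{re.escape(pattern)}\.(\d+)\.", key)
    return int(match.group(1)) if match else None


with_prefix = "decoder.layer.0.attn.to_q"
without_prefix = "layer.0.attn.to_q"

print(layer_index_required_prefix(with_prefix, "layer"))     # 0
print(layer_index_required_prefix(without_prefix, "layer"))  # None
print(layer_index_optional_prefix(with_prefix, "layer"))     # 0
print(layer_index_optional_prefix(without_prefix, "layer"))  # 0
```

In this sketch the optional group still ends in a dot, so a name such as "mylayer.0.attn.to_q" is not treated as a match for "layer".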

Status

I'm not sure yet if this should be merged, as it is technically backwards incompatible. Users can still target the desired modules by carefully crafting a regex for target_modules so that it only matches the desired layer indices. However, this is tedious and layers_pattern was introduced to avoid having to do this.
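For reference, the regex workaround mentioned above could look roughly like the following sketch. It assumes that a string passed as target_modules is matched against the full module name; the helper `build_target_regex` and the module names are hypothetical and only illustrate why writing this by hand is tedious.

```python
import re


def build_target_regex(layer_indices, layer_name="layer", leaf="to_q"):
    # Hypothetical helper: builds one regex that only matches the module
    # `leaf` inside the listed layer indices, with or without a prefix.
    indices = "|".join(str(i) for i in sorted(layer_indices))
    return (
        rf"(?:.*\.)?{re.escape(layer_name)}\.(?:{indices})\."
        rf"(?:.*\.)?{re.escape(leaf)}"
    )


target_regex = build_target_regex([0, 1])

# Made-up module names, used only to exercise the regex.
module_names = [
    "layer.0.attn.to_q",
    "layer.1.attn.to_q",
    "layer.2.attn.to_q",
    "layer.10.attn.to_q",
    "decoder.layer.1.attn.to_q",
    "layer.0.attn.to_k",
]

# Full-name matching is an assumption of this sketch.
selected = [name for name in module_names if re.fullmatch(target_regex, name)]
print(selected)
```

The resulting string would be passed as target_modules instead of combining a plain module name with layers_to_transform and layers_pattern.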

HuggingFaceDocBuilderDev · 2024-10-17T15:11:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan mentioned this pull request Oct 17, 2024

LoraConfig conflict when using layers_to_transform in LlamaModel #2155
