Addresses part of #2155.
Description
So far, the `layers_pattern` argument would only work if there was a prefix before the pattern. As an example, if the module name is `decoder.layer.0.attn.to_q` and we pass `layers_pattern="layer"`, this would match. However, if the module name was `layer.0.attn.to_q`, i.e. without a prefix before `"layer"`, it would not work.

Usually, when we create a model with `AutoModelForFoo.from_pretrained`, the `"layer"` part would never be first. However, if we load a model directly, e.g. through `LlamaModel.from_pretrained`, there is actually no prefix. As a consequence, we get no match there.

With this PR, the prefix is made optional, so that the second module name also matches.
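
To make the difference concrete, here is a minimal sketch of the two matching behaviours. The helper names and the exact regular expressions are assumptions made for illustration; they are not copied from the library and the real implementation may differ in detail.

```python
import re


def layer_index_required_prefix(module_name, pattern):
    """Hypothetical helper: the pattern must be preceded by '<prefix>.'."""
    match = re.match(rf".*\.{re.escape(pattern)}\.(\d+)\.", module_name)
    return int(match.group(1)) if match else None


def layer_index_optional_prefix(module_name, pattern):
    """Hypothetical helper: the '<prefix>.' part is optional."""
    match = re.match(rf"(?:.*\.)?{re.escape(pattern)}\.(\d+)\.", module_name)
    return int(match.group(1)) if match else None


with_prefix = "decoder.layer.0.attn.to_q"
without_prefix = "layer.0.attn.to_q"

# Old behaviour: only the prefixed name yields a layer index.
print(layer_index_required_prefix(with_prefix, "layer"))
print(layer_index_required_prefix(without_prefix, "layer"))

# New behaviour: both names yield a layer index.
print(layer_index_optional_prefix(with_prefix, "layer"))
print(layer_index_optional_prefix(without_prefix, "layer"))
```

Because the optional group still ends in a literal dot, a name such as `player.0.attn.to_q` is not picked up by `layers_pattern="layer"` in this sketch.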
Status
I'm not sure yet if this should be merged, as it is technically backwards incompatible. Users can still target the desired modules by carefully crafting a regex for `target_modules` so that it only matches the desired layer indices. However, this is tedious and `layers_pattern` was introduced to avoid having to do this.
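
For reference, the regex workaround could look roughly like the sketch below. It assumes that a string `target_modules` is treated as a regular expression that has to match the full module name; the module names are made up for the example, and the selection is emulated with plain `re` rather than by calling the library.

```python
import re

# Hypothetical module names, as they might appear in a model loaded without a prefix.
module_names = [
    "layer.0.attn.to_q",
    "layer.1.attn.to_q",
    "layer.2.attn.to_q",
    "layer.10.attn.to_q",
    "layer.0.attn.to_k",
    "decoder.layer.1.attn.to_q",
]

# Hand-written regex: only `to_q` in layers 0 and 1, with or without a prefix.
target_modules = r"(?:.*\.)?layer\.(?:0|1)\.attn\.to_q"

# Emulate the selection with a full match against each module name.
selected = [name for name in module_names if re.fullmatch(target_modules, name)]
print(selected)
```

The list of layer indices has to be spelled out inside the regex and kept in sync by hand, which is the tedious part that `layers_pattern` is meant to avoid.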