FIX: Correctly determine no_split_modules #2570
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See discussion in huggingface/transformers#38141 for context.
In the PEFT
fsdp_auto_wrap
policy, we determine the_no_split_modules
. However, this currently neglects to visit the children of the model, which can be required for some architectures. This PR fixes that.Note that the
_get_no_split_modules
function is largely copied from transformers. One change is that it doesn't take thedevice_map
argument. That argument is used in transformers inside an error message but not for the logic proper. I think it's safe to remove.Moreover, I made an unrelated change to
fsdp_auto_wrap_policy
, namely making local imports global (there was no reason for them to be local).