In the LongLoRA paper, for example, they fully train both the embedding and the norm layers while still applying LoRA to the self-attention layers. Our recipes set only LoRA parameters to trainable here, but it shouldn't be too hard to support passing additional layers to that function from the config, e.g. similar to our usage of `custom_sharded_layers`.
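A minimal sketch of what this could look like, assuming a `set_trainable_params`-style helper extended with an optional argument for extra layer names (the names `tok_embeddings` and `norm` below are illustrative, not tied to any particular builder):

```python
# Sketch only: freeze everything except LoRA/adapter params plus any extra
# layers the user names in the config (e.g. embeddings and norms a la LongLoRA).
from typing import Iterable, Optional

import torch.nn as nn


def set_trainable_params(
    model: nn.Module,
    adapter_param_names: Iterable[str],
    additional_trainable_layers: Optional[Iterable[str]] = None,
) -> None:
    adapter_param_names = set(adapter_param_names)
    extra = tuple(additional_trainable_layers or ())
    for name, param in model.named_parameters():
        is_adapter = name in adapter_param_names
        # substring match so e.g. "norm" catches every norm layer's weight
        is_extra = any(layer in name for layer in extra)
        param.requires_grad_(is_adapter or is_extra)
```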
When someone picks it up, keep in mind that all the configs and model builders would have to be updated. So, before doing it, please put up a draft PR, and only after you get an OK showing that it works for one model, update all the builders/recipes. This will save you a lot of work :)
Edit: we would only have to touch the builders if we implemented `LoraEmbedding`. My bad.
Quick edit to @felipemello1's comment: we can do this without touching the model builders (similar to the `custom_sharded_layers` example). Also similar to that comment, we don't have to expose it in every config file -- we can instead use our usual pattern of e.g. `additional_trainable_layers=cfg.get("additional_trainable_layers", None)`.
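Roughly, the recipe-side wiring could look like the sketch below. This is a hedged example, not the actual recipe code: it reuses the extended `set_trainable_params` sketch from above, the config key name is assumed, and a toy model stands in for the real LoRA model.

```python
# Hypothetical recipe snippet: the key is optional, so existing configs keep
# working unchanged when it is absent.
import torch.nn as nn
from omegaconf import OmegaConf


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_embeddings = nn.Embedding(100, 16)
        self.output = nn.Linear(16, 16)


cfg = OmegaConf.create({"additional_trainable_layers": ["tok_embeddings", "norm"]})
model = ToyModel()

additional_trainable_layers = cfg.get("additional_trainable_layers", None)
set_trainable_params(
    model,
    adapter_param_names=[],  # in the recipe this would come from get_adapter_params(model)
    additional_trainable_layers=additional_trainable_layers,
)

# tok_embeddings.weight is now trainable; the output layer stays frozen.
print({n: p.requires_grad for n, p in model.named_parameters()})
```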
If any brave soul wants to go the extra mile, I would prefer exposing it in all configs rather than letting hidden defaults proliferate. But I understand that it is much, much easier not to.