I am training Mistral without sharding lm_head. When the model is wrapped in FSDP, I want to switch the adapter like this:
```python
self.model.icae.set_adapter("encadapt")
self.model.icae.enable_adapter_layers()
# Freeze the "encadapt" adapter weights for this forward pass
for name, param in self.model.icae.named_parameters():
    if "encadapt" in name:
        param.requires_grad = False
compress_outputs = self.model.icae(
    inputs_embeds=autoencoder_input_embedding,
    output_hidden_states=True,
    graph=graph,
    mem_mask=mem_mask,
    partial_grad=partial_grad,
    map_node=True,
)
self.model.icae.disable_adapter_layers()
```
Doing so gives me the error:

```
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
```
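For context, the same RuntimeError can be reproduced without FSDP on any non-leaf view of a tensor; FSDP's flat-parameter sharding hands back views into a single `FlatParameter` rather than the original leaf `nn.Parameter`s, which appears to be why the assignment fails here. A minimal standalone repro (no FSDP involved):

```python
import torch

# A non-leaf view: slicing a Parameter produces a tensor inside the autograd graph.
flat = torch.nn.Parameter(torch.zeros(4))
view = flat[:2]

# Raises: RuntimeError: you can only change requires_grad flags of leaf variables. ...
view.requires_grad = False
```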
So the question is: how do I use enable_adapter_layers/disable_adapter_layers in FSDP training when the adapter has to be switched mid-training for some reason?
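For anyone hitting the same wall, here is a minimal sketch of one possible workaround, not a confirmed fix: wrapping with `use_orig_params=True` (PyTorch >= 2.0) makes `named_parameters()` return the original `nn.Parameter`s instead of views into the `FlatParameter`, so their `requires_grad` flags can still be toggled; whether flipping them between steps is fully supported may depend on the PyTorch version. `peft_model` below is a hypothetical stand-in for the PEFT-wrapped Mistral model from the snippet above, and an initialized torch.distributed process group is assumed, as in the training run.

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumption: `peft_model` is the PEFT-wrapped Mistral model (self.model.icae above),
# and torch.distributed has already been initialized by the training launcher.
fsdp_model = FSDP(peft_model, use_orig_params=True)

# `.module` unwraps FSDP so the PEFT adapter API stays reachable.
fsdp_model.module.set_adapter("encadapt")
fsdp_model.module.enable_adapter_layers()

# With use_orig_params=True these are the original leaf parameters,
# so this assignment should no longer hit the non-leaf RuntimeError.
for name, param in fsdp_model.named_parameters():
    if "encadapt" in name:
        param.requires_grad = False
```

If toggling mid-training turns out not to be needed, setting the `requires_grad` flags before wrapping the model in FSDP sidesteps the problem entirely, since the flags are then baked in when the flat parameters are built.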