IndexError: tuple index out of range #47
Comments
Did you solve it? I have the same error.
Not yet.
I have the same problem as well.
Same problem here. The structure of my grouped parameters is List[Dict['params'], Dict['params', 'rank', 'update_proj_gap', 'scale', 'proj_type']], where 'params' is a list of tensors. I'm trying it with pretrained models from Hugging Face.
I found the error. The projection to a lower rank needs tensors of dimension 2 (matrices). If parameters of dimension 1, such as LayerNorm weights, are added to galore_params, accessing the second dimension raises the error. Make sure that ALL the parameters sent to galore_params have a second dimension.
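A minimal sketch of that check, assuming a standard PyTorch `model` is already loaded (the group names are illustrative, not from the GaLore codebase):

```python
# Hypothetical filtering step: keep only 2-D tensors for GaLore.
galore_params, regular_params = [], []
for name, param in model.named_parameters():
    if param.ndim == 2:   # matrices can be projected to a lower rank
        galore_params.append(param)
    else:                 # 1-D tensors (LayerNorm weights, biases) stay in the regular group
        regular_params.append(param)
```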
Do you use DeepSpeed and train on multi-node? I just select the attn and mlp modules for GaLore.
I'm just testing GaLore locally with plain PyTorch; this is the function I've been using for grouping the parameters (see the sketch below). Consider the 'model' variable a pre-trained model loaded from Hugging Face. I've checked the different layer names with the VS Code debugger and set the 'if' statements accordingly, so you should change them for your specific model. For instance, 'classifier' applies only to classification heads on top of language models. The test was done with RoBERTa.
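The original snippet is not preserved in this page; a minimal sketch of such a grouping function, assuming a RoBERTa-style encoder from transformers and illustrative hyperparameter values (rank, update_proj_gap, scale), could look like:

```python
def group_parameters(model):
    """Split parameters into GaLore (2-D) and regular groups for a
    Hugging Face encoder such as RoBERTa."""
    galore_params, regular_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Keep the classification head and any 1-D tensors (LayerNorm
        # weights, biases) out of the GaLore group: the projector needs a matrix.
        if "classifier" in name or param.ndim != 2:
            regular_params.append(param)
        else:
            galore_params.append(param)
    return [
        {"params": regular_params},
        {"params": galore_params, "rank": 128, "update_proj_gap": 200,
         "scale": 0.25, "proj_type": "std"},
    ]
```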
From my (possibly wacky) understanding, GaLore only works with 2-D weight matrices, e.g. linear layers.
Actually, it worked for me with the embedding layers too, since their weights are also 2-D matrices.
Specifically, the error comes from line 15 of the projector: for the standard projection type it compares the first and second dimensions of each tensor passed in galore_params, and a 1-D tensor has no second dimension, which raises the IndexError.
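A minimal repro of that failure mode (illustrative shapes only):

```python
import torch

matrix = torch.randn(768, 768)   # 2-D weight: both dimensions exist
vector = torch.randn(768)        # 1-D LayerNorm weight

matrix.shape[0] >= matrix.shape[1]   # works fine
vector.shape[1]                      # IndexError: tuple index out of range
```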
I am facing the same issue. I am training with FSDP and use_orig_params=True, but FSDP still flattens the parameters. Has anyone faced this before and been able to fix it?
Hi Jiawei,
I was trying GaLore on TinyLlama-1B using the codebase https://github.com/jzhang38/TinyLlama on 4x A800-80GB GPUs and encountered the following error:
I use GaLore as you suggested in torchrun_main.py (a sketch of the general pattern appears below):
Any idea why and how to fix it?
Thanks in advance!
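For reference, the parameter-grouping pattern described in the GaLore README, which torchrun_main.py follows, looks roughly like the sketch below. The `model` variable and the hyperparameter values are illustrative, not the exact settings from the failing run:

```python
import torch.nn as nn
from galore_torch import GaLoreAdamW

# Select only the 2-D weights of attention / MLP Linear modules for GaLore.
galore_params = []
target_modules = ["attn", "mlp"]
for module_name, module in model.named_modules():
    if not isinstance(module, nn.Linear):
        continue
    if not any(key in module_name for key in target_modules):
        continue
    galore_params.append(module.weight)

# Everything else (embeddings, LayerNorms, biases, heads) goes to the regular group.
id_galore = {id(p) for p in galore_params}
regular_params = [p for p in model.parameters() if id(p) not in id_galore]

param_groups = [
    {"params": regular_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-2)
```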