BUG (w FIX): Llama3 conversion to HF does not work #1774


Open
peregilk opened this issue May 23, 2025 · 1 comment

peregilk commented May 23, 2025

@SamuelMarks @khatwanimohit

Even though Llama3 is mentioned as one of the supported models in llama_mistral_mixtral_orbax_to_hf.py, the script does not convert these models correctly to HF.

This is primarily due to different handling of Rotary Positional Embedding (RoPE) weight permutations in Llama3.

Llama 2/Mistral/Mixtral require a specific permutation of the query (Q) and key (K) projection weights when converting from MaxText to Hugging Face format, and the original script appears to perform this via MaxText.max_utils.unpermute_from_match_maxtext_rope.
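
For reference, here is a minimal sketch of the kind of interleaved-to-half-split reordering involved. This is not the exact MaxText implementation, just an illustration of this class of permutation; the function name and the numpy-based layout are assumptions:

```python
import numpy as np

def unpermute_rope(w: np.ndarray) -> np.ndarray:
    """Reorder the last axis from interleaved RoPE pairs
    (x0, x1, x2, x3, ...) to the half-split layout
    (x0, x2, ..., x1, x3, ...) that HF's rotate-half RoPE expects."""
    return np.concatenate([w[..., 0::2], w[..., 1::2]], axis=-1)
```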

The Llama 3 family (3, 3.1, 3.2) does not require this permutation; its Q/K weights from MaxText are already in the order Hugging Face expects for RoPE. The old conversion script therefore runs without errors on these models, but the quality of the converted models is severely degraded.
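
In practice this means the unpermutation has to be gated on the model family. A hedged sketch, reusing unpermute_rope from the sketch above (the model-name strings and helper name are hypothetical, not the actual flags in llama_mistral_mixtral_orbax_to_hf.py):

```python
# Hypothetical model keys; the real script's naming may differ.
LLAMA3_FAMILY = {"llama3", "llama3.1", "llama3.2"}

def maybe_unpermute(w, model_name: str):
    # Llama 2 / Mistral / Mixtral checkpoints need the RoPE
    # unpermutation of Q/K weights; Llama 3 weights come out of
    # MaxText already in the HF order, so pass them through.
    if model_name in LLAMA3_FAMILY:
        return w
    return unpermute_rope(w)
```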

I have a working script here. It may be a bit messy, and I have not tested it on Mistral/Mixtral/Llama 2, but I have verified that the output looks good for a converted Llama 3.1 checkpoint: comparing the converted model with the original checkpoint shows that they are identical.

The old script was hardcoded to float16; I changed this to bfloat16.
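
As an illustration of the dtype change (a sketch, assuming a PyTorch state dict; cast_state_dict is a hypothetical helper, not a name from the script):

```python
import torch

def cast_state_dict(state_dict: dict, dtype=torch.bfloat16) -> dict:
    # Cast every tensor to the target dtype before saving.
    # The old script effectively hardcoded torch.float16 here.
    return {k: v.to(dtype) for k, v in state_dict.items()}
```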

Since my script adds some extra logic, it might be better not to build on top of the old script, so I have not opened a PR for this.

RissyRan (Collaborator) commented

@khatwanimohit, could you help take a look at this issue related to the Llama3 family?
