
Convert Gemma 2 to HuggingFace #1324


Open
peregilk opened this issue Feb 28, 2025 · 14 comments

peregilk commented Feb 28, 2025

Are there any scripts for converting Gemma-2 models to HuggingFace? I see there are Llama and Mistral scripts.

hxssgaa commented Mar 1, 2025

You can check my commit for the conversion script here.

peregilk commented Mar 1, 2025

That is fantastic. Thanks.

peregilk commented Mar 6, 2025

@hxssgaa I made a quick test of the script, trying to convert a 2B Gemma2 model. However, I am seeing this error:
ValueError: Requested shape: (2048,) is not compatible with the stored shape: (2304,). Truncating/padding is disabled by setting of strict=True. When using standard Orbax APIs, this behavior can be modified by specifying strict=False in ArrayRestoreArgs for any array in which padding/truncation is desired.
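(For context: a mismatch like 2048 vs. 2304 usually means the config being read expects a different model width than the checkpoint actually has; Gemma 2 2B's embedding dimension is 2304. A minimal debugging sketch, assuming the Orbax restore yields a nested dict of arrays; `shape_mismatches` is a hypothetical helper for locating such mismatches, not part of the conversion script:)

```python
import numpy as np

def _flatten(tree, prefix=""):
    """Flatten a nested dict of arrays into {"a/b/c": shape}."""
    out = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            out.update(_flatten(value, path))
        else:
            out[path] = tuple(np.asarray(value).shape)
    return out

def shape_mismatches(restored, expected):
    """Return {path: (restored_shape, expected_shape)} for every differing entry."""
    got = _flatten(restored)
    return {p: (got.get(p), s) for p, s in expected.items() if got.get(p) != s}
```

Running this against the restored params and the shapes implied by the yml config shows exactly which arrays disagree, which in this case points at the config rather than the checkpoint.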

peregilk commented Mar 7, 2025

@hxssgaa I understand this is because it uses the settings from the base.yml file. However, it was not obvious to me how to get the script to rely either on the structure of the loaded model or on the model yml files.

I also see the script refers to convert_maxtext_to_hf.py. Is that a helper file?

hxssgaa commented Mar 7, 2025

> @hxssgaa I made a quick test of the script, trying to convert a 2B Gemma2 model. However, I am seeing this error: ValueError: Requested shape: (2048,) is not compatible with the stored shape: (2304,). Truncating/padding is disabled by setting of strict=True. When using standard Orbax APIs, this behavior can be modified by specifying strict=False in ArrayRestoreArgs for any array in which padding/truncation is desired.

Hi @peregilk, I just did another test of the conversion script for gemma2-2b and didn't hit the issue you are getting. The converted checkpoint exactly matches the official Hugging Face gemma2-2b-it. Please use the correct yml setting for conversion; your script should look like:

JAX_PLATFORMS=cpu python MaxText/gemma2_orbax_to_hf.py MaxText/configs/base.yml \
        base_output_directory=/tmp/output \
        load_parameters_path=/path/to/maxtext/checkpoint \
        model_name='gemma2-2b' \
        hf_model_path=/path/to/save/hf_model.bin \
        model_size=2b

> @hxssgaa I understand this is because it uses the settings from the base.yml file. However, it was not obvious to me how to get the script to rely either on the structure of the loaded model or on the model yml files.

> I also see the script refers to convert_maxtext_to_hf.py. Is that a helper file?

It's a typo; I already fixed it in the latest commit. It should be gemma2_orbax_to_hf.py instead.

peregilk commented Mar 7, 2025

@hxssgaa Thanks for answering me, and sorry for asking stupid questions here. Do you first save/convert the checkpoint locally to disk?

Or can /path/to/maxtext/checkpoint be the bucket where the trained checkpoints are stored, i.e. 'gs://mybucket/gemma2-2B-instruct-myfinetunedmodel1/checkpoints/0/items'?

I still don't think the example command is exactly correct, but if this is stored locally and does not require a specific yml file, this is probably just a typo.

hxssgaa commented Mar 7, 2025

@peregilk, no need to save the ckpt locally; you can just point the MaxText checkpoint path to the Google bucket checkpoint location. Sorry for the confusion here. I have changed the ckpt conversion format to be similar to llama_or_mistral_orbax_to_huggingface.py, so the correct conversion command is:

JAX_PLATFORMS=cpu python MaxText/gemma2_orbax_to_hf.py MaxText/configs/base.yml \
        base_output_directory=/tmp/output \
        load_parameters_path=/path/to/maxtext/checkpoint \
        model_name='gemma2-27b' \
        hf_model_path=/path/to/save/hf_model.bin \
        model_size=27b

peregilk commented Mar 7, 2025

Awesome, @hxssgaa. I actually tried something similar, but I think a small typo in my earlier script prevented it from picking up the correct yaml.

However, now it works. I have also tested one model "all the way": I get exactly the same MMLU scores on the original google/gemma2-2b-it as on a model converted from Kaggle/Flax, stored as a MaxText checkpoint, and converted to HF with the gemma2_orbax_to_hf.py script.
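(Matching benchmark scores is a good end-to-end check; a stricter one is to diff the two weight sets directly. A minimal sketch, assuming both converted checkpoints have been loaded into flat name→array dicts; `max_abs_diff` is a hypothetical helper, not part of the conversion script:)

```python
import numpy as np

def max_abs_diff(weights_a, weights_b):
    """Largest elementwise |a - b| across two name->array weight dicts.
    Differing key sets already indicate a broken conversion."""
    if weights_a.keys() != weights_b.keys():
        raise KeyError(f"weight names differ: {sorted(weights_a.keys() ^ weights_b.keys())}")
    return max(
        float(np.max(np.abs(np.asarray(weights_a[k], dtype=np.float64)
                            - np.asarray(weights_b[k], dtype=np.float64))))
        for k in weights_a
    )
```

A result of exactly 0.0 means the tensors are bit-identical; a tiny nonzero value usually just reflects dtype round-tripping (e.g. bf16 vs fp32) rather than a conversion bug.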

R4ZZ3 commented Apr 12, 2025

Has anyone tried converting Gemma 3 (4B) to HuggingFace yet?

I have now taken a Gemma 3 model from Kaggle --> MaxText format (Orbax) --> continued pretraining. I would now like to convert to HF, but it seems there is no script available; I am trying to do it myself, with no luck yet.

@salrowili

@hxssgaa any chance you could develop similar code to convert a Gemma 3 checkpoint to HF?

R4ZZ3 commented Apr 25, 2025

I created such a conversion script based on https://github.com/AI-Hypercomputer/maxtext/blob/f6ebc1662cb944bd7748fb350bba164b13479b68/MaxText/gemma2_orbax_to_hf.py and a bunch of trial and error with Gemini 2.5 Pro in Cursor.

I was then able to run some benchmarks with the converted model, and tested that the model would start GRPO finetuning with Unsloth. I can share the script later today, once I am finished with work.

@salrowili

Great, @R4ZZ3.
I will also test the code once you share it and get back to you with my findings.

R4ZZ3 commented Apr 25, 2025

Hi @salrowili

The file can now be found here:
https://github.com/R4ZZ3/gemma_3_orbax_to_hf/blob/main/convert_gemma_3_orbax_to_hf.py

shralex (Collaborator) commented May 1, 2025

@gagika can you please take a look?


6 participants