How to load PyTorch checkpoints into JAX/Flax? #927

marcvanzee · 2021-01-18T11:17:50Z

marcvanzee
Jan 18, 2021

marcvanzee · 2021-01-22T12:46:34Z

marcvanzee
Jan 22, 2021
Author

PyTorch checkpoints contain a state_dict with all the weights/parameters of the model. Converting it to Flax involves:

1. Defining the model using Flax modules.
2. Renaming the dictionary items so they line up, and transposing weights where the layouts differ: PyTorch stores conv weights as (out, in, H, W), while Flax conv kernels are (H, W, in, out) and Flax convolutions expect NHWC inputs rather than NCHW.

flax.traverse_util.flatten_dict is often useful here, because you can operate on a flat dict instead of a nested one. Once the keys align, use unflatten_dict to get the nested form back.

@nikitakit wrote the following code for importing PyTorch BERT checkpoints into a Flax model: https://github.com/nikitakit/flax_bert/blob/master/import_weights.py

6 replies

avital Jan 26, 2021

Here's another example, from Hugging Face's BERT implementation: https://github.com/huggingface/transformers/blob/a880f2549fd5652030afc244f3bb27ec764c5e43/src/transformers/models/bert/modeling_flax_bert.py#L452

vaishnkv Mar 6, 2024

Hi @avital, I don't think this feature is exposed as an API in the current transformers library. Is there an "official" way of doing this?

vaishnkv Mar 7, 2024

This can be done with the "from_pt" argument of from_pretrained (when loading a pre-trained model). I'm attaching screenshots showing this.
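In code, that looks roughly like the sketch below. It assumes a transformers release that still ships the Flax model classes, with both flax and torch installed; the wrapper function name and the bert-base-uncased checkpoint are only illustrative choices.

```python
def load_flax_bert_from_pytorch(checkpoint="bert-base-uncased"):
    """Load a PyTorch checkpoint into a Flax BERT model (illustrative wrapper)."""
    # Imported lazily because it needs transformers (with Flax support),
    # flax, and torch to be installed.
    from transformers import FlaxBertModel

    # from_pt=True tells from_pretrained to read the PyTorch weights
    # and convert them to Flax parameters.
    return FlaxBertModel.from_pretrained(checkpoint, from_pt=True)
```

Calling load_flax_bert_from_pytorch() downloads the checkpoint on first use, so it needs network access or a local cache.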

davisyoshida Mar 11, 2024

@GCP20 There's no generic solution, because people can name their parameters differently in the two implementations. I wrote a helper script that does 90% of the work using a string-similarity bipartite match, then I clean up the remainder manually.
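One way such a matcher could look is sketched below. This is not the helper script mentioned above, only an illustration of the idea: it scores every pair of names with difflib's SequenceMatcher and solves the assignment with SciPy's linear_sum_assignment. The function name match_names and the example parameter names are made up.

```python
from difflib import SequenceMatcher

import numpy as np
from scipy.optimize import linear_sum_assignment


def match_names(torch_names, flax_names):
    """Pair each PyTorch name with a Flax name by maximizing total string similarity."""
    # Similarity matrix: rows are PyTorch names, columns are Flax names.
    sim = np.array(
        [[SequenceMatcher(None, t, f).ratio() for f in flax_names] for t in torch_names]
    )
    # Optimal one-to-one assignment (maximum-weight bipartite matching).
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return [(torch_names[r], flax_names[c], float(sim[r, c])) for r, c in zip(rows, cols)]


# Hypothetical parameter names, for illustration only.
torch_names = [
    "encoder.layer.0.attention.weight",
    "encoder.layer.0.attention.bias",
    "classifier.weight",
]
flax_names = [
    "classifier/kernel",
    "encoder/layer_0/attention/bias",
    "encoder/layer_0/attention/kernel",
]
pairs = match_names(torch_names, flax_names)
for torch_name, flax_name, score in pairs:
    print(f"{torch_name} -> {flax_name} ({score:.2f})")
```

Low-scoring pairs are the ones worth checking by hand, and comparing array shapes after matching catches most remaining mistakes.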

GinRawin Apr 13, 2024

Thanks for your help, everyone. This method also works.
