Deepspeed + embeddings fix #254
Conversation
@@ -87,7 +87,6 @@ def set_input_embeddings(self, new_embeddings):
    if decoder_layers_attr_name is None:
        decoder_layers_attr_name = _infer_decoder_layers_attr_name(lang_encoder)
    lang_encoder.set_decoder_layers_attr_name(decoder_layers_attr_name)
    lang_encoder.resize_token_embeddings(len(text_tokenizer))

    model = Flamingo(
With the new embeddings method, do we need to convert the old checkpoints into this format? Or maybe put the new embeds behind a flag and say something about how this is the desired option for new models?
Yeah, good question. My vote would be to preserve backward compatibility for now & place the new embeds behind a flag (default set to True)
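A rough sketch of what that could look like in factory.py (the flag name `use_new_embeddings` and the helper below are assumptions for illustration, not part of this PR):

# Hypothetical sketch: gate the new embedding handling behind a flag so that old
# checkpoints, which expect the resized input embeddings, still load unchanged.
def configure_embeddings(lang_encoder, text_tokenizer, use_new_embeddings: bool = True):
    if use_new_embeddings:
        # new path from this PR: dedicated trainable embeddings for the added
        # <image> / <|endofchunk|> tokens, handled inside the Flamingo wrapper
        pass
    else:
        # legacy path: resize the LM's token embedding matrix in place
        lang_encoder.resize_token_embeddings(len(text_tokenizer))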
# Unfreeze perceiver, gated_cross_attn_layers, and LM input embeddings
model.perceiver.requires_grad_(True)
For readability: keep this line if the following line is being kept?
open_flamingo/src/flamingo_lm.py
Outdated
# create a get_output_embeddings() / set_output_embeddings() method if it doesn't exist
# this is needed for compatibility
if not hasattr(self, "get_output_embeddings"):
    self.get_output_embeddings = lambda: self.lm_head
    self.set_output_embeddings = lambda x: setattr(self, "lm_head", x)
Could we move this to factory.py to match how we handle get_input_embeddings for MPT-1B (factory.py:line 73)? That would avoid hard-coding for MPT.
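Roughly what that might look like in factory.py (a sketch only; it assumes the encoder exposes `lm_head`, mirroring the lambdas currently in flamingo_lm.py):

# Sketch: attach the accessors in factory.py, next to the existing MPT-1B
# get_input_embeddings workaround, instead of inside flamingo_lm.py.
if not hasattr(lang_encoder, "get_output_embeddings"):
    lang_encoder.get_output_embeddings = lambda: lang_encoder.lm_head
    lang_encoder.set_output_embeddings = lambda x: setattr(lang_encoder, "lm_head", x)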
@@ -277,3 +278,187 @@ def forward(
        x = self.ff(x) * self.ff_gate.tanh() + x

        return x
This is great! Before we merge, I'll need to test with FSDP; I think this hopefully resolves some of the issues that required freezing the LM embeddings.
open_flamingo/train/train.py
Outdated
@@ -218,6 +233,8 @@ def main():

    args = parser.parse_args()

    args.local_rank = int(os.environ.get("LOCAL_RANK", -1))  # for deepspeed
Hmm, just to understand what's happening here -- deepspeed populates this environment variable, which we load, but then our code overwrites this variable based on Slurm logic on line 272. Are we sure these match?
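One way to check would be a sanity assertion before the Slurm logic runs (sketch only; `SLURM_LOCALID` is the usual Slurm variable, and the exact placement in train.py is assumed):

import os

# Sketch of a consistency check: the local rank deepspeed exports should agree
# with the one derived from Slurm before either value is used downstream.
deepspeed_local_rank = int(os.environ.get("LOCAL_RANK", -1))
slurm_local_rank = int(os.environ.get("SLURM_LOCALID", -1))
if deepspeed_local_rank != -1 and slurm_local_rank != -1:
    assert deepspeed_local_rank == slurm_local_rank, (
        f"LOCAL_RANK={deepspeed_local_rank} but SLURM_LOCALID={slurm_local_rank}"
    )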
open_flamingo/train/train.py
Outdated
# Initialize gradient checkpointing
if args.gradient_checkpointing:
    non_reentrant_wrapper = functools.partial(
        checkpoint_wrapper,
        offload_to_cpu=True,
        checkpoint_impl=CheckpointImpl.NO_REENTRANT,
    )
    apply_activation_checkpointing(
        ddp_model,
        checkpoint_wrapper_fn=non_reentrant_wrapper,
        check_fn=lambda m: getattr(m, "_use_gradient_checkpointing", False)
        and not isinstance(m, FSDP)
        and not isinstance(m, CheckpointWrapper),
    )
Haven't run this to double check, but I think for non-deepspeed, the checkpointing logic needs to come before initializing the optimizer, or the optimizer may be referring to parameters w/o checkpoint wrapper classes, while the model refers to parameters w/ the wrapper. Could we check this?
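Roughly the ordering being suggested (a sketch; `ddp_model` and `args` are the names used in train.py, and the optimizer line is only illustrative):

import functools
import torch
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointImpl,
    apply_activation_checkpointing,
    checkpoint_wrapper,
)

# Sketch: wrap modules for activation checkpointing first ...
non_reentrant_wrapper = functools.partial(
    checkpoint_wrapper,
    offload_to_cpu=True,
    checkpoint_impl=CheckpointImpl.NO_REENTRANT,
)
apply_activation_checkpointing(
    ddp_model,
    checkpoint_wrapper_fn=non_reentrant_wrapper,
    check_fn=lambda m: getattr(m, "_use_gradient_checkpointing", False),
)

# ... and only then build the optimizer, so its param groups reference the
# parameters of the wrapped model.
optimizer = torch.optim.AdamW(
    (p for p in ddp_model.parameters() if p.requires_grad),
    lr=args.learning_rate,
)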
open_flamingo/train/train.py
Outdated
"stage3_param_persistence_threshold": 1e4, | ||
"stage3_max_live_parameters": 3e7, | ||
"stage3_prefetch_bucket_size": 3e7, | ||
"stage3_gather_16bit_weights_on_model_save": True, |
Are all model weights saved in 16bit, regardless of the precision flag?
This saves additional 16-bit weights for stage 3, but it's unnecessary and will be slow! There is a script to 'reconstruct' the fp32 weights from the checkpoint, as described here.
I see -- so since we save ckpts fairly often, should we set this flag to False by default, and then provide a separate script to all_gather the weights into one file offline? (I guess wandb ckpt saving will need to be turned off)
Deepspeed auto-creates the script in the checkpoint dir, which is nice :). Good point on wandb. For stage 3, yes, although we can also just save the sharded checkpoint.
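For reference, the offline reconstruction can also be done from Python via the same utility the auto-generated zero_to_fp32.py script uses (the paths below are assumptions based on the run_name/epoch layout in train_utils.py):

import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Sketch: gather the sharded ZeRO checkpoint into a single fp32 state dict
# offline, instead of paying for stage3_gather_16bit_weights_on_model_save at
# every save.
checkpoint_dir = "run_name/epoch_0"  # assumed checkpoint directory
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
torch.save(state_dict, f"{checkpoint_dir}/consolidated_fp32.pt")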
open_flamingo/train/train_utils.py
Outdated
if args.rank == 0:
    if args.report_to_wandb and args.save_checkpoints_to_wandb:
        wandb.save(f"{args.run_name}/epoch_{epoch}/mp_rank_00_model_states.pt")
TODO: handle saving stage3 shard state.
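A possible shape for that TODO (sketch only; it assumes `ddp_model` is the deepspeed engine and reuses the args/wandb names from train_utils.py):

# Sketch: with stage 3, shards live on every rank, so every rank must call
# save_checkpoint(); only rank 0 then uploads the resulting files to wandb.
ddp_model.save_checkpoint(args.run_name, tag=f"epoch_{epoch}")
if args.rank == 0 and args.report_to_wandb and args.save_checkpoints_to_wandb:
    wandb.save(f"{args.run_name}/epoch_{epoch}/*")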
Added support for DeepSpeed and made training the embeddings cleaner using the Idefics method.