Bug when training with multiple GPUs #44

I ran into a problem when training with multiple GPUs:

[rank1]:   File "/data1/lhy/inpaint/AR_inpaint/Lumina-mGPT/xllmx/solvers/finetune/finetune.py", line 271, in build_model
[rank1]:     unwrapped_model, tokenizer = self._model_func(init_from)
[rank1]:   File "/data1/lhy/inpaint/AR_inpaint/Lumina-mGPT/lumina_mgpt/finetune_solver.py", line 95, in _model_func
[rank1]:     model = ChameleonXLLMXForConditionalGeneration(config)
[rank1]:   File "/data1/lhy/inpaint/AR_inpaint/Lumina-mGPT/lumina_mgpt/model/modeling_xllmx_chameleon.py", line 24, in __init__
[rank1]:     super().__init__(config)
[rank1]:   File "/data1/lhy/inpaint/AR_inpaint/Lumina-mGPT/lumina_mgpt/model/chameleon/modeling_chameleon.py", line 1553, in __init__
[rank1]:     self.model = ChameleonModel(config)
[rank1]:   File "/data1/lhy/inpaint/AR_inpaint/Lumina-mGPT/lumina_mgpt/model/chameleon/modeling_chameleon.py", line 1291, in __init__
[rank1]:     self.vocabulary_mapping = ChameleonImageVocabularyMapping(config.vocabulary_map)
[rank1]:   File "/data1/lhy/inpaint/AR_inpaint/Lumina-mGPT/lumina_mgpt/model/chameleon/modeling_chameleon.py", line 1109, in __init__
[rank1]:     self.image_token_id = vocab_map.get("<image>")
[rank1]: AttributeError: 'NoneType' object has no attribute 'get'

The error seems to come from this code:

        if self.global_rank == 0:
            model = ChameleonXLLMXForConditionalGeneration.from_pretrained(
                init_from,
                ignore_mismatched_sizes=False,
                max_position_embeddings=self.args.max_seq_len,
                mask_image_logits=self.args.mask_image_logits,
                dropout=self.args.dropout,
                z_loss_weight=self.args.z_loss_weight,
                torch_dtype=torch.bfloat16,
                # torch_dtype=torch.float32,
                device_map="cpu",
            )
        else:
            with init_empty_weights():
                config = ChameleonXLLMXConfig.from_pretrained(
                    init_from,
                    max_position_embeddings=self.args.max_seq_len,
                    mask_image_logits=self.args.mask_image_logits,
                    dropout=self.args.dropout,
                    z_loss_weight=self.args.z_loss_weight,
                    torch_dtype=torch.bfloat16,
                    # torch_dtype=torch.float32,
                )
                # print(config)
                # assert None
                model = ChameleonXLLMXForConditionalGeneration(config)

In the "else" branch, the config is created without vocabulary_map, so the model initialization fails on the non-zero ranks. I haven't found any other code that syncs the config across ranks. Why does this happen, and how should I fix it?
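One way to narrow this down might be to check whether the checkpoint's config file actually contains vocabulary_map before building the model. The sketch below is only a diagnostic, not a fix. It assumes init_from is a local directory with a Hugging Face style config.json in it; the helper name find_missing_config_keys is made up for illustration and is not part of Lumina-mGPT or transformers.

```python
import json
from pathlib import Path


def find_missing_config_keys(init_from, required_keys=("vocabulary_map",)):
    """Return the required keys that are absent or null in <init_from>/config.json.

    Hypothetical helper: assumes init_from is a local checkpoint directory
    that stores its configuration in a file named config.json.
    """
    config_path = Path(init_from) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # A key that is missing or explicitly null would end up as None in the
    # config object, which is what triggers the 'NoneType' error above.
    return [key for key in required_keys if config.get(key) is None]
```

Calling find_missing_config_keys(init_from) on the checkpoint path used for training returns an empty list when vocabulary_map is present. If it returns ["vocabulary_map"], the checkpoint config itself is probably incomplete, and the difference between ranks would not be the root cause.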
