Question about training settings for text-to-image finetuning #42

@KZF-kzf

Hello, I'm not very familiar with the transformers library, and it took me several days to get even a basic understanding of the code. I noticed that the model's causal_mask is None while fine-tuning on a text-to-image generation task. Isn't that abnormal? Is it because the --unmask_image_logits argument is set in the public training script (exps/7B.sh)? If I want to fine-tune for text-to-image generation, should I remove this argument?

Additionally, I built my dataset following train.md, for example:

    [
      {
        "conversations": [
          { "from": "human", "value": "Generate an image of 768x768 according to the following prompt:\n a dog." },
          { "from": "gpt", "value": "<|image|>" }
        ],
        "image": ["./00.jpg"]
      },
      {
        "conversations": [
          { "from": "human", "value": "Generate an image of 768x768 according to the following prompt:\n a cat." },
          { "from": "gpt", "value": "<|image|>" }
        ],
        "image": ["./01.jpg"]
      },
      {
        "conversations": [
          { "from": "human", "value": "Generate an image of 768x768 according to the following prompt:\n a horse." },
          { "from": "gpt", "value": "<|image|>" }
        ],
        "image": ["./02.jpg"]
      }
    ]

Is this format correct for text-to-image fine-tuning?
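For reference, a file in this layout could be generated with a small script like the sketch below. The helper names `make_record` and `check_record` are hypothetical, and the check that each record has one `<|image|>` placeholder per listed image is my own assumption about what the loader expects, not something confirmed by train.md.

```python
import json

# Prompt template copied from the records above; resolution is a parameter here.
PROMPT_TEMPLATE = (
    "Generate an image of {w}x{h} according to the following prompt:\n {prompt}"
)


def make_record(prompt, image_path, width=768, height=768):
    """Build one text-to-image record in the layout shown above (hypothetical helper)."""
    return {
        "conversations": [
            {
                "from": "human",
                "value": PROMPT_TEMPLATE.format(w=width, h=height, prompt=prompt),
            },
            {"from": "gpt", "value": "<|image|>"},
        ],
        "image": [image_path],
    }


def check_record(record):
    """Assumed sanity check: one <|image|> placeholder per listed image file."""
    placeholders = sum(
        turn["value"].count("<|image|>") for turn in record["conversations"]
    )
    return placeholders == len(record["image"])


pairs = [("a dog.", "./00.jpg"), ("a cat.", "./01.jpg"), ("a horse.", "./02.jpg")]
records = [make_record(prompt, path) for prompt, path in pairs]

# Fail early if any record is inconsistent under the assumed rule.
assert all(check_record(r) for r in records)

dataset_json = json.dumps(records, indent=2, ensure_ascii=False)
print(dataset_json)
```

Generating the records from one template keeps the prompt wording and resolution identical across samples, which avoids small inconsistencies when the file is written by hand.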

Another issue I've observed is that Lumina-mGPT relies on the input prompt during inference to decide whether to generate image tokens or text tokens. Without the specific template mentioned in the paper, such as “Generate an image of 1024x1024 according to the following prompt:...”, the model sometimes fails to generate an image, e.g., with a simpler prompt like “generate an image of dog.” This behavior seems unusual. Why should the choice between image and text generation be controlled manually by a flag instead of being handled automatically by the model?
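On the caller side, one possible workaround is to normalize every prompt into the template before inference, roughly as in the sketch below. `to_template` is a hypothetical helper, not part of the repository, and the exact whitespace after the colon is assumed from the dataset example above.

```python
import re

# Matches prompts that already start with the template wording quoted above.
TEMPLATE_RE = re.compile(
    r"^Generate an image of \d+x\d+ according to the following prompt:"
)


def to_template(prompt, width=1024, height=1024):
    """Wrap a free-form prompt in the template unless it already uses it."""
    if TEMPLATE_RE.match(prompt):
        return prompt  # already in the expected form, leave untouched
    return (
        f"Generate an image of {width}x{height} "
        f"according to the following prompt:\n {prompt.strip()}"
    )


print(to_template("a dog."))
```

This only works around the behavior from outside; it does not explain why the model needs the template in the first place.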

I am really looking forward to your reply. Thank you!!
