[question] how does one use the SFTTrainer with VLMs for prompt completion task?

I have a visual question answering task that I want to fine-tune a VLM for using SFT. I want to train the VLM only on the completions, not on the prompt itself.
1. How do I use the SFTTrainer for that? For text-only tasks, I can use the prompt-completion dataset type and offload everything to the SFTTrainer. Is that possible for multimodal datasets and VLMs? I went through the source code and believe it should work, but I'm wondering whether that's actually the case.
2. Is it possible to avoid writing a custom collator? I believe that is a huge can of worms, and I want to avoid it as much as possible.
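
To make "train only on the completions" concrete, here is a minimal sketch of what I mean, in plain Python. It is not the SFTTrainer's actual implementation; the helper name `build_completion_only_labels` and the token ids are made up for illustration. The only fact it relies on is that `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labelled `-100` do not contribute to the loss. For a VLM I am assuming the image placeholder tokens would sit on the prompt side and be masked the same way.

```python
# Conceptual sketch only: mask prompt tokens so the loss covers the completion.
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_completion_only_labels(prompt_ids, completion_ids):
    """Concatenate prompt and completion; ignore prompt positions in the labels.

    Hypothetical helper for illustration, not part of any library.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    # Prompt positions get IGNORE_INDEX, completion positions keep their token id.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}


# Made-up token ids: four prompt tokens followed by three completion tokens.
example = build_completion_only_labels([101, 7, 8, 9], [42, 43, 2])
print(example["labels"])
```

What I would like is for the SFTTrainer to produce this kind of masking for multimodal examples without me having to write the collation logic myself.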


[question] how does one use the SFTTrainer with VLMs for prompt completion task? #4199
