[question] How does one use the SFTTrainer with VLMs for a prompt-completion task? #4199

@osaidr

I have a visual question answering task for which I want to train a VLM using SFT. I want the loss computed only on the completions, not on the prompt.

  1. How do I use the SFTTrainer for that? For text-only tasks, I can use the prompt-completion dataset type and leave everything to the SFTTrainer. Is that possible for multimodal datasets and VLMs? I went through the source code and I believe it should work, but I would like to confirm that this is actually the case.
  2. Is it possible to avoid writing a custom collator? I believe that is a huge can of worms, and I want to avoid it as much as possible.
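To make the goal concrete, here is a minimal sketch of what "training only on the completions" means at the label level. It is not TRL's implementation and it does not touch the SFTTrainer or any collator. It only relies on the convention that label positions set to -100 are skipped by PyTorch's cross-entropy loss. The helper name `mask_prompt_labels` and the token ids are hypothetical, chosen for illustration.

```python
# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def mask_prompt_labels(input_ids, prompt_length):
    """Return labels in which only the completion tokens contribute to the loss.

    Hypothetical helper for illustration; not part of TRL.
    """
    if not 0 <= prompt_length <= len(input_ids):
        raise ValueError("prompt_length must be between 0 and len(input_ids)")
    # Prompt positions (image placeholders and question tokens) are masked out;
    # completion positions keep their token ids as targets.
    return [IGNORE_INDEX] * prompt_length + list(input_ids[prompt_length:])


# Made-up token ids: 4 prompt tokens followed by 3 completion tokens.
input_ids = [101, 32000, 2054, 102, 2009, 2003, 2630]
labels = mask_prompt_labels(input_ids, prompt_length=4)
print(labels)  # [-100, -100, -100, -100, 2009, 2003, 2630]
```

The question is whether the SFTTrainer can produce this kind of masking for image-plus-text examples on its own, so that no hand-written collator is needed.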

Labels: ❓ question (Seeking clarification or more information), 🏋 SFT (Related to SFT)
