Is it actually possible to train a VLM with multiple image inputs using GRPOTrainer?

According to the documentation, it is possible by having an 'images' key in the datapoints, containing a list of images. However, when I attempt this, on accelerate I get the error 

`[rank2]: Parameter indices which did not receive grad for rank 2: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 ...`

If I try to run it without accelerate, training does start, but there are no images passed to the input whatsoever. Has anybody managed to successfully run this use case?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Is it actually possible to train a VLM with multiple image inputs using GRPOTrainer? #4169

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Is it actually possible to train a VLM with multiple image inputs using GRPOTrainer? #4169

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions