Feature request
Enable device-aware handling of segmentation label tensors (specifically class_labels and mask_labels) returned by EOMTImageProcessor.
Currently, the processor returns these labels as Python lists of tensors. A list has no .to(device) method, so the labels stay on their original device while the rest of the batch is moved, which leads to device mismatches during training.
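To illustrate the gap, here is a minimal sketch of the manual workaround this currently calls for. It is not the processor's actual behavior or a proposed implementation. The helper name move_labels_to_device is hypothetical, and the dictionary below is a hand-built stand-in for the processor output: the tensor shapes are made up, and the ragged first dimension assumes each image can have a different number of segments.

```python
import torch


def move_labels_to_device(batch, device):
    """Hypothetical helper: copy `batch`, moving tensors and lists of tensors to `device`."""
    moved = {}
    for key, value in batch.items():
        if isinstance(value, torch.Tensor):
            # Regular batched tensors (e.g. pixel_values) move directly.
            moved[key] = value.to(device)
        elif isinstance(value, (list, tuple)) and all(
            isinstance(item, torch.Tensor) for item in value
        ):
            # Per-image label lists must be moved element by element.
            moved[key] = [item.to(device) for item in value]
        else:
            # Anything else is passed through unchanged.
            moved[key] = value
    return moved


# Stand-in for the processor output (assumed shapes, not real data).
batch = {
    "pixel_values": torch.zeros(2, 3, 8, 8),
    "mask_labels": [torch.zeros(3, 8, 8), torch.zeros(5, 8, 8)],
    "class_labels": [torch.tensor([0, 1, 2]), torch.tensor([0, 1, 2, 3, 4])],
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = move_labels_to_device(batch, device)
```

Having the processor output handle these list-valued keys itself would remove the need for this kind of per-project helper in every training loop.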
Motivation
I’ve been fine-tuning EOMT on the segments/sidewalk-semantic dataset using the Transformers image processor.
Here is the code reproducing the issue.
Your contribution
Happy to help brainstorm and work on this if need be.
CC: @merveenoyan @NielsRogge @molbap