Hi, I tried to evaluate XComposer2-4KHD on RefCOCO for the REC task, following #261, but the results are quite poor. Do the coordinates in the response need to be post-processed, as with other MLLMs (e.g., for Qwen2.5-VL, the coordinates have to be rescaled from the model's input resolution to the actual resolution of the image)?
Moreover, I'm wondering whether XComposer2-4KHD supports detection tasks. If so, could you please provide guidance on how such an evaluation should be performed?
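
For reference, here is a rough sketch of the kind of post-processing and scoring I have in mind. It assumes the response has already been parsed into an `(x1, y1, x2, y2)` box in the pixel space of the model's input, which may not be how XComposer2-4KHD actually outputs coordinates. The helper names (`rescale_box`, `xywh_to_xyxy`, `iou`, `rec_accuracy`) are my own and not part of any library, and the 0.5 IoU threshold is just the usual REC convention.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def rescale_box(box: Box, from_size: Tuple[int, int], to_size: Tuple[int, int]) -> Box:
    """Map a box from one (width, height) resolution to another."""
    from_w, from_h = from_size
    to_w, to_h = to_size
    sx, sy = to_w / from_w, to_h / from_h
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def xywh_to_xyxy(box: Box) -> Box:
    """Convert a COCO-style (x, y, width, height) box to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rec_accuracy(preds: Sequence[Box], gts: Sequence[Box], thr: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the ground truth is at least thr."""
    if not gts:
        return 0.0
    hits = sum(iou(p, g) >= thr for p, g in zip(preds, gts))
    return hits / len(gts)
```

With this, each predicted box would be passed through `rescale_box` (input resolution to original image size) before being compared against the RefCOCO ground-truth box converted with `xywh_to_xyxy`. If XComposer2-4KHD emits normalized or otherwise encoded coordinates, the rescaling step would need to change accordingly.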


