Hello,
Is there any supplementary material available about the Factual-centered recaptioning model described in the paper?
Did you use strong VLMs or APIs to generate the target captions for training? I suspect this could be very important for future improvements to VLMs and generation models.