Hello,
I found the section of the paper that asks whether the answers were generated from the vision side or the LLM side quite interesting, and I was wondering whether the code to reproduce those results (where you found around a 5% degradation in performance) has been released.
Thanks!