@Swayam4414 informed us today that they would like to use StanfordAIMI/CheXagent-8b but are unable to feed it images.
We explained that this is because a model marked as text-generation accepts only text, whereas a model in the image-text-to-text category can process both images and text.
As a result, we'll be expanding the capabilities of text-generation models that support multimodal inputs, starting with StanfordAIMI/CheXagent-8b and then moving on to others that may meet the criteria.
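
To illustrate the distinction between the two categories, here is a minimal sketch of how a task category could gate the inputs a model accepts. It is not our actual implementation: the `ACCEPTED_INPUTS` table and the `rejected_inputs` helper are hypothetical names made up for this example, and only the two category names come from the discussion above.

```python
# Hypothetical mapping from task category to the input kinds it accepts.
# Only the two category names come from the discussion; the rest is illustrative.
ACCEPTED_INPUTS = {
    "text-generation": {"text"},
    "image-text-to-text": {"text", "image"},
}


def rejected_inputs(task_category, input_kinds):
    """Return the input kinds that the given task category would not accept."""
    allowed = ACCEPTED_INPUTS.get(task_category)
    if allowed is None:
        raise ValueError(f"unknown task category: {task_category!r}")
    return sorted(set(input_kinds) - allowed)


# A model marked as text-generation would reject the image.
print(rejected_inputs("text-generation", ["text", "image"]))

# A model in the image-text-to-text category would accept both.
print(rejected_inputs("image-text-to-text", ["text", "image"]))
```

Under this sketch, a request with an image sent to a model in the text-generation category is the case that fails, which matches what was reported for StanfordAIMI/CheXagent-8b.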