
github-actions[bot] edited this page Oct 31, 2023 · 13 revisions

Salesforce-BLIP-image-captioning-base

Overview

BLIP is a Vision-Language Pre-training (VLP) framework that supports both vision-language understanding and generation tasks. BLIP makes effective use of noisy web data by bootstrapping the captions: a captioner generates synthetic captions and a filter removes the noisy ones. The framework achieves state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval, image captioning, and visual question answering (VQA). BLIP also demonstrates strong generalization when transferred directly to video-language tasks in a zero-shot manner. The code, models, and datasets are publicly available. Assess the safety and fairness of the model carefully before deploying it in any real-world application.

The above summary was generated using ChatGPT. Review the original-model-card to understand the training data, evaluation metrics, license, intended uses, limitations, and bias before using the model.

Inference samples

| Inference type | Python sample (Notebook) | CLI with YAML |
|---|---|---|
| Real time | image-to-text-online-endpoint.ipynb | image-to-text-online-endpoint.sh |
| Batch | image-to-text-batch-endpoint.ipynb | image-to-text-batch-endpoint.sh |

Sample inputs and outputs (for real-time inference)

Sample input

```json
{
   "input_data": {
      "columns": ["image"],
      "index": [0, 1],
      "data": [
         ["image1"],
         ["image2"]
      ]
   }
}
```

Note:

  • "image1" and "image2" should be publicly accessible URLs or base64-encoded image strings.
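The request body above can be built with only the Python standard library. The sketch below (the helper name `make_payload` is hypothetical, not part of the model's API) base64-encodes raw image bytes and passes URL strings through unchanged, matching the schema of the sample input:

```python
# Minimal sketch of building the real-time inference payload.
# Assumption: each item is either raw image bytes (base64-encoded here)
# or a publicly accessible URL string (passed through as-is).
import base64
import json


def make_payload(images):
    """Build the JSON request body from a list of image URLs or raw bytes."""
    rows = []
    for img in images:
        if isinstance(img, bytes):
            rows.append([base64.b64encode(img).decode("utf-8")])
        else:
            rows.append([img])
    return json.dumps({
        "input_data": {
            "columns": ["image"],
            "index": list(range(len(rows))),
            "data": rows,
        }
    })


# One row from local bytes, one from a (hypothetical) public URL.
payload = make_payload([b"<raw image bytes>", "https://example.com/photo.jpg"])
```

The resulting `payload` string can then be sent to the endpoint's scoring URI; see the real-time inference notebook linked above for the end-to-end invocation.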

Sample output

```json
[
   {
      "text": "a box of food sitting on top of a table"
   },
   {
      "text": "a stream in the middle of a forest"
   }
]
```
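The endpoint returns a JSON array with one object per input row, in request order, each carrying a "text" caption. A minimal parsing sketch, using the sample response above as a literal:

```python
# Parse the real-time endpoint response into a list of captions.
# The response body here is copied verbatim from the sample output above.
import json

response_body = """
[
   {"text": "a box of food sitting on top of a table"},
   {"text": "a stream in the middle of a forest"}
]
"""

# Each element corresponds, by position, to one row of the request's "data".
captions = [item["text"] for item in json.loads(response_body)]
```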

Model inference: Text for the sample image


Version: 2

Tags

license : mit
task : image-to-text

View in Studio: https://ml.azure.com/registries/azureml/models/Salesforce-BLIP-image-captioning-base/version/2

License: mit

Properties

SHA: 89b09ea1789f7addf2f6d6f0dfc4ce10ab58ef84

inference-min-sku-spec: 2|0|7|14 (CPU cores | GPUs | memory in GB | storage in GB)

inference-recommended-sku: Standard_DS2_v2, Standard_D2a_v4, Standard_D2as_v4, Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F4s_v2, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E2s_v3, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

model_id: Salesforce/blip-image-captioning-base
