Salesforce BLIP image captioning base
BLIP is a Vision-Language Pre-training (VLP) framework that can be used for both vision-language understanding and generation tasks. BLIP effectively utilizes noisy web data by bootstrapping the captions: a captioner generates synthetic captions and a filter removes the noisy ones. The framework achieves state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval, image captioning, and VQA, and demonstrates strong generalization when transferred directly to video-language tasks in a zero-shot manner. The code, models, and datasets are available for use. Researchers should carefully assess the safety and fairness of the model before deploying it in any real-world application.
The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations, and bias before using the model.
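Before deploying, the model can be sanity-checked locally using the Hugging Face checkpoint referenced by `model_id` below. This is a minimal sketch, separate from the Azure ML deployment path; the image URL is only a placeholder.

```python
# Minimal local sketch using the Hugging Face weights (Salesforce/blip-image-captioning-base).
# The image URL is a placeholder; any accessible image works.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

url = "https://example.com/some-image.jpg"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```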
| Inference type | Python sample (Notebook) | CLI with YAML |
|---|---|---|
| Real time | image-to-text-online-endpoint.ipynb | image-to-text-online-endpoint.sh |
| Batch | image-to-text-batch-endpoint.ipynb | image-to-text-batch-endpoint.sh |
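The linked notebooks and scripts cover the full deployment flow; the snippet below is only a rough sketch of a real-time deployment from the registry using the `azure-ai-ml` Python SDK. The endpoint name, workspace identifiers, and the choice of `Standard_DS3_v2` are assumptions for illustration.

```python
# Sketch: deploy version 2 of the registry model to a managed online endpoint
# with the azure-ai-ml (SDK v2) client. Subscription, resource group, workspace,
# and endpoint names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

endpoint_name = "blip-image-to-text"  # placeholder endpoint name
ml_client.online_endpoints.begin_create_or_update(
    ManagedOnlineEndpoint(name=endpoint_name, auth_mode="key")
).result()

deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name=endpoint_name,
    # Reference the model directly from the azureml registry.
    model="azureml://registries/azureml/models/Salesforce-BLIP-image-captioning-base/versions/2",
    instance_type="Standard_DS3_v2",  # one of the recommended SKUs listed below
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```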
Sample input:

{
    "input_data": {
        "columns": ["image"],
        "index": [0, 1],
        "data": [
            ["image1"],
            ["image2"]
        ]
    }
}
Note:
- "image1" and "image2" should be publicly accessible URLs or strings in base64 format.
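For example, a request body can be built from one public URL and one base64-encoded local file, then posted to a deployed online endpoint. This is a sketch: the local file name, scoring URI, and key are placeholders you would replace with your own endpoint's values.

```python
# Sketch: build the request payload (public URL + base64-encoded local image)
# and call a deployed online endpoint. scoring_uri and key are placeholders.
import base64
import json
import requests

with open("sample_image.jpg", "rb") as f:  # placeholder local image file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "input_data": {
        "columns": ["image"],
        "index": [0, 1],
        "data": [
            ["https://example.com/image1.jpg"],  # publicly accessible URL
            [image_b64],                         # base64-encoded image
        ],
    }
}

scoring_uri = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"  # placeholder
key = "<endpoint-key>"  # placeholder

resp = requests.post(
    scoring_uri,
    headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
    data=json.dumps(payload),
)
captions = [row["text"] for row in resp.json()]  # matches the sample output format below
print(captions)
```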
Sample output:

[
    {"text": "a box of food sitting on top of a table"},
    {"text": "a stream in the middle of a forest"}
]
Version: 2
Preview
License: mit
Task: image-to-text
View in Studio: https://ml.azure.com/registries/azureml/models/Salesforce-BLIP-image-captioning-base/version/2
SHA: 89b09ea1789f7addf2f6d6f0dfc4ce10ab58ef84
inference-min-sku-spec: 2|0|7|14 (vCPUs | GPUs | memory in GB | storage in GB)
inference-recommended-sku: Standard_DS2_v2, Standard_D2a_v4, Standard_D2as_v4, Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F4s_v2, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E2s_v3, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
model_id: Salesforce/blip-image-captioning-base