models facebook sam vit base

facebook-sam-vit-base

Overview

The Segment Anything Model (SAM) produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It was trained on a dataset of 11 million images and 1.1 billion masks, and it has strong zero-shot performance on a variety of segmentation tasks.
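To generate masks for all objects in an image, the original SAM release prompts the model with a regular grid of foreground points (32 points per side by default) and then filters and de-duplicates the resulting masks. The minimal sketch below builds only such a point grid, placing each point at the centre of an equal-sized cell. `build_point_grid` is a hypothetical helper, not part of any Azure ML or Hugging Face API, and the mask filtering step is omitted.

```python
def build_point_grid(points_per_side: int, width: int, height: int) -> list[tuple[float, float]]:
    """Return points_per_side**2 (x, y) pixel coordinates on a regular grid.

    Each point is the centre of one cell, so no point lies on the image border.
    """
    if points_per_side < 1:
        raise ValueError("points_per_side must be >= 1")
    step = 1.0 / points_per_side
    # Normalised cell centres in (0, 1): step/2, 3*step/2, ...
    centres = [step * (i + 0.5) for i in range(points_per_side)]
    # Row-major order: y varies slowest, x fastest; scale to pixel coordinates.
    return [(cx * width, cy * height) for cy in centres for cx in centres]


if __name__ == "__main__":
    grid = build_point_grid(4, width=1024, height=768)
    print(len(grid), grid[0], grid[-1])
```

Each grid point would then be sent as a separate point prompt; a denser grid finds smaller objects at the cost of more decoder passes.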

The SAM model is made up of the following modules:

The VisionEncoder: a ViT-based image encoder. It computes the image embeddings using attention on patches of the image, with relative positional embeddings.
The PromptEncoder: generates embeddings for points and bounding boxes.
The MaskDecoder: a two-way transformer that performs cross-attention in both directions, from the point embeddings to the image embedding and from the image embedding back to the point embeddings. Its outputs are fed to the Neck.
The Neck: predicts the output masks based on the contextualized masks produced by the MaskDecoder.
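The bidirectional cross-attention in the MaskDecoder can be illustrated with a toy, pure-Python sketch: prompt tokens first attend to the image embedding, then the image embedding attends to the updated prompt tokens, each step with a residual connection. This is a simplification under stated assumptions: single head, no learned query/key/value projections, no positional encodings, and none of the self-attention, MLP, or normalisation layers the published decoder also uses. `attend`, `add`, and `two_way_block` are hypothetical names.

```python
import math

Vector = list[float]


def attend(queries: list[Vector], keys: list[Vector], values: list[Vector]) -> list[Vector]:
    """Single-head scaled dot-product attention with no learned projections."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        peak = max(scores)  # subtract the max so the softmax cannot overflow
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))])
    return outputs


def add(xs: list[Vector], ys: list[Vector]) -> list[Vector]:
    """Element-wise residual connection."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def two_way_block(prompt_tokens: list[Vector], image_tokens: list[Vector]) -> tuple[list[Vector], list[Vector]]:
    # 1) Prompt tokens (queries) attend to the image embedding (keys/values).
    prompt_tokens = add(prompt_tokens, attend(prompt_tokens, image_tokens, image_tokens))
    # 2) Image embedding (queries) attends to the updated prompt tokens.
    image_tokens = add(image_tokens, attend(image_tokens, prompt_tokens, prompt_tokens))
    return prompt_tokens, image_tokens


if __name__ == "__main__":
    prompts = [[1.0, 0.0]]                        # one toy prompt token
    image = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three toy image-patch embeddings
    new_prompts, new_image = two_way_block(prompts, image)
    print(new_prompts, new_image)
```

Running step 2 after step 1 is what makes the block "two-way": the image embedding is updated with prompt information that has itself already looked at the image.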

Training Details

Training Data

The model was trained on the SA-1B dataset; see the SA-1B dataset overview for details.

License

apache-2.0

Inference Samples

Inference type	Python sample (Notebook)	CLI with YAML
Real time	mask-generation-online-endpoint.ipynb	mask-generation-online-endpoint.sh
Batch	mask-generation-batch-endpoint.ipynb	mask-generation-batch-endpoint.sh
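The real-time samples target a deployed online endpoint over HTTPS. A minimal standard-library sketch of such a call follows, assuming key-based authentication via an `Authorization: Bearer <key>` header. `SCORING_URI` and `API_KEY` are hypothetical placeholders for your own endpoint's values, and `make_scoring_request` and `score` are illustrative helpers, not code from the linked samples.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own endpoint's scoring URI and key.
SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<endpoint-key>"


def make_scoring_request(payload: dict, scoring_uri: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def score(payload: dict, scoring_uri: str = SCORING_URI, api_key: str = API_KEY):
    """Send the request and parse the JSON response (requires a deployed endpoint)."""
    request = make_scoring_request(payload, scoring_uri, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())
```

Pass `score` a dictionary shaped like the sample input in this card. Splitting request construction from sending keeps the payload and headers inspectable without network access.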

Sample input and output

Sample input

{
  "input_data": {
    "columns": [
      "image",
      "input_points",
      "input_boxes",
      "input_labels",
      "multimask_output"
    ],
    "index": [0],
    "data": [["image1", "", "[[650, 900, 1000, 1250]]", "", false]]
  },
  "params": {}
}

Note: the "image1" string should be either a base64-encoded image or a publicly accessible URL.
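The sample input resembles a pandas "split" layout: a list of column names, a row index, and one data row. In that row, the box prompt is a JSON-encoded string and unused prompts are empty strings. The sketch below assembles the same structure; the serialisation rules are inferred from the single sample, not from a documented schema, and `encode_image` and `build_payload` are hypothetical helpers.

```python
import base64
import json
from pathlib import Path
from typing import Optional

COLUMNS = ["image", "input_points", "input_boxes", "input_labels", "multimask_output"]


def encode_image(path: str) -> str:
    """Read a local image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_payload(
    image: str,
    input_boxes: Optional[list] = None,
    input_points: Optional[list] = None,
    input_labels: Optional[list] = None,
    multimask_output: bool = False,
) -> dict:
    """Assemble a single-row request in the column layout of the sample input."""

    def as_field(value: Optional[list]) -> str:
        # Prompts travel as JSON strings; a missing prompt is sent as "".
        return "" if value is None else json.dumps(value)

    row = [image, as_field(input_points), as_field(input_boxes), as_field(input_labels), multimask_output]
    return {"input_data": {"columns": COLUMNS, "index": [0], "data": [row]}, "params": {}}


if __name__ == "__main__":
    # "image1" stands in for a base64 string (e.g. encode_image("photo.jpg")) or a public URL.
    print(json.dumps(build_payload("image1", input_boxes=[[650, 900, 1000, 1250]]), indent=2))
```

With only a box prompt, this reproduces the sample input above field for field.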

Sample output

[
    {
        "predictions": [
            {
                "mask_per_prediction": [
                    {
                        "encoded_binary_mask": "encoded_binary_mask1",
                        "iou_score": 0.85
                    }
                ]
            }
        ]
    }
]

Note: the "encoded_binary_mask1" string is base64-encoded.
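A response with the nesting shown in the sample output (a list of objects, each with a `predictions` list whose entries hold a `mask_per_prediction` list) can be reduced to the highest-scoring mask per prediction. This card does not say what the decoded bytes contain (for example an image file or a raw bit array), so the sketch below stops at base64-decoding. `best_masks` is a hypothetical helper, and choosing by `iou_score` is an assumption that matters mainly when several masks are returned per prediction, presumably when `multimask_output` is true.

```python
import base64


def best_masks(response: list) -> list[tuple[bytes, float]]:
    """For each prediction, return (decoded mask bytes, iou_score) of its top-scoring mask."""
    results = []
    for item in response:
        for prediction in item["predictions"]:
            candidates = prediction["mask_per_prediction"]
            top = max(candidates, key=lambda mask: mask["iou_score"])
            results.append((base64.b64decode(top["encoded_binary_mask"]), top["iou_score"]))
    return results
```

Interpreting the decoded bytes is left to the caller, since their format is not specified here.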

Visualization of inference result for a sample image

Version: 6

Tags

author: Meta
huggingface_model_id: facebook/sam-vit-base
license: apache-2.0
task: image-segmentation
training_dataset: SA-1B
hiddenlayerscanned
SharedComputeCapacityEnabled
inference_compute_allow_list: ['Standard_DS5_v2', 'Standard_D8a_v4', 'Standard_D8as_v4', 'Standard_D16a_v4', 'Standard_D16as_v4', 'Standard_D32a_v4', 'Standard_D32as_v4', 'Standard_D48a_v4', 'Standard_D48as_v4', 'Standard_D64a_v4', 'Standard_D64as_v4', 'Standard_D96a_v4', 'Standard_D96as_v4', 'Standard_FX4mds', 'Standard_FX12mds', 'Standard_F16s_v2', 'Standard_F32s_v2', 'Standard_F48s_v2', 'Standard_F64s_v2', 'Standard_F72s_v2', 'Standard_FX24mds', 'Standard_FX36mds', 'Standard_FX48mds', 'Standard_E4s_v3', 'Standard_E8s_v3', 'Standard_E16s_v3', 'Standard_E32s_v3', 'Standard_E48s_v3', 'Standard_E64s_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']

View in Studio: https://ml.azure.com/registries/azureml/models/facebook-sam-vit-base/version/6

License: apache-2.0

Properties

SharedComputeCapacityEnabled: True

SHA: b5fc59950038394bae73f549a55a9b46bc6f3d96

inference-min-sku-spec: 4|0|32|64

inference-recommended-sku: Standard_DS5_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_FX4mds, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
