
Conversation

@Guppy16 commented Jan 13, 2026

What does this PR do?

Fix the SAM2 Video inference processor so that it supports float16 (it currently only works for fp32 and bfloat16).

How to reproduce

Demo source from here

  • This demo already works for dtype = torch.bfloat16 and dtype = torch.float32;
  • this PR fixes it for dtype = torch.float16.
  • (Note that fp8 / int8 / etc. are not supported.)
import torch
from transformers import Sam2VideoModel, Sam2VideoProcessor
from transformers.video_utils import load_video

device = torch.device("cuda")
dtype = torch.float16
model_name = "facebook/sam2.1-hiera-tiny"

model = Sam2VideoModel.from_pretrained(model_name).to(device, dtype=dtype)

processor = Sam2VideoProcessor.from_pretrained(model_name)


video_url = "https://huggingface.co/datasets/hf-internal-testing/sam2-fixtures/resolve/main/bedroom.mp4"

video_frames, _ = load_video(video_url)

# Initialize session for streaming
inference_session = processor.init_video_session(
    inference_device=device,
    dtype=dtype,
)

# Process frames one by one
for frame_idx, frame in enumerate(video_frames[:10]):  # Process first 10 frames
    inputs = processor(images=frame, device=device, return_tensors="pt")
    if frame_idx == 0:
        # Add point input on first frame
        processor.add_inputs_to_inference_session(
            inference_session=inference_session,
            frame_idx=0,
            obj_ids=1,
            input_points=[[[[210, 350], [250, 220]]]],
            input_labels=[[[1, 1]]],
            original_size=inputs.original_sizes[0],  # needs to be provided when using streaming video inference
        )
    # Process current frame
    sam2_video_output = model(
        inference_session=inference_session, frame=inputs.pixel_values[0]
    )
    video_res_masks = processor.post_process_masks(
        [sam2_video_output.pred_masks],
        original_sizes=inputs.original_sizes,
        binarize=False,
    )[0]
    print(f"Frame {frame_idx}: mask shape {video_res_masks.shape}")

Who can review?

@yonigozlan @molbap

@github-actions commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: sam2_video
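
(For reference, slow tests in transformers are gated behind the RUN_SLOW environment variable, so something along the lines of RUN_SLOW=1 python -m pytest tests/models/sam2_video/ should reproduce this job locally; the exact test path is an assumption.)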

@Guppy16 (author) commented Jan 16, 2026

@yonigozlan bump. A few CI/CD pipelines are failing; it seems to be related to code quality and consistency checks between SAM 2 and SAM 3, but I'm not entirely sure.
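
(For reference, transformers ships make quality and make repo-consistency targets that run the code-quality and cross-model consistency checks locally; whether those correspond to the failing pipelines here is an assumption.)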
