@milesial milesial commented Oct 15, 2025

(WIP)

Overview:

Media decoding in the frontend for VLMs.

Details:

Decodes multimodal data (image_url, video_url) from the OAI chat request in the frontend processor into tensors (pixel values).
Passes the decoded data to the next step in the graph (backend) via NIXL readable descriptors.

Decoding data involves:

  • Potentially fetching the data from the web
  • Potentially decoding base64
  • Running the actual media decoding (JPEG, H264, ...)

These last two steps can be CPU-heavy and are done in the rayon runtime.
This decoding is optional: if Dynamo was not built with this feature, or if no decoding configuration is passed, the unprocessed URLs are passed through as-is.
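The first two steps above hinge on telling inline payloads apart from remote URLs. A minimal sketch of that classification step follows; the names (MediaSource, classify) are illustrative only and not the PR's actual API, and non-base64 data URLs are handled naively for brevity.

```rust
/// Hypothetical classification of an OAI-style media URL into the two
/// fetch paths: remote HTTP fetch vs. inline base64 decode.
enum MediaSource {
    /// Remote URL that must first be fetched over HTTP.
    Remote(String),
    /// Inline base64 payload carried in a `data:` URL.
    Inline { mime: String, b64: String },
}

fn classify(url: &str) -> MediaSource {
    if let Some(rest) = url.strip_prefix("data:") {
        // e.g. "data:image/jpeg;base64,/9j/4AAQ..."
        // Simplification: a data URL without ";base64," falls through
        // with a default MIME type instead of being percent-decoded.
        let (mime, b64) = rest
            .split_once(";base64,")
            .unwrap_or(("application/octet-stream", rest));
        MediaSource::Inline {
            mime: mime.to_string(),
            b64: b64.to_string(),
        }
    } else {
        MediaSource::Remote(url.to_string())
    }
}
```

In the real flow, the Remote arm would go through the MediaLoader's HTTP client, and both arms would then hand raw bytes to the CPU-heavy decoder on the rayon runtime.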

The preprocessor holds a MediaLoader, which bundles an HTTP client and a media decoder for each modality. Decoder configuration is passed via the MDC; in the future, per-request or even per-item options could override this default configuration.
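The override layering described above could look like the following sketch. Field and type names here are assumptions for illustration, not the actual MDC schema.

```rust
/// Effective decoder settings, defaulted from the MDC.
#[derive(Clone, Debug)]
struct DecoderConfig {
    max_image_pixels: u64,
    max_video_frames: u32,
}

/// Hypothetical per-request overrides: `None` means "keep the MDC default".
#[derive(Default)]
struct DecoderOverrides {
    max_image_pixels: Option<u64>,
    max_video_frames: Option<u32>,
}

impl DecoderConfig {
    /// Layer request-level options over the defaults from the MDC.
    fn with_overrides(&self, o: &DecoderOverrides) -> DecoderConfig {
        DecoderConfig {
            max_image_pixels: o.max_image_pixels.unwrap_or(self.max_image_pixels),
            max_video_frames: o.max_video_frames.unwrap_or(self.max_video_frames),
        }
    }
}
```

Using `Option` fields for overrides keeps "unset" distinct from any real value, so a request can override one knob without restating the rest.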

TODOs:

  • Gate the media decoding code behind a feature flag
  • NIXL descriptors
  • Unit tests
  • Microbench tests
  • Per-request decoder options
  • HW decoding

Where should the reviewer start?

Flow starting from gather_multi_modal_data in preprocessor.rs

copy-pr-bot bot commented Oct 15, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

