Gemma 3n multimodal (image, audio) support for Android and web #6024

Open
@bil-ash

Description

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Android, Web

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

ASR, image recognition

Programming Language and version (e.g. C++, Python, Java)

Java, JavaScript

Describe the actual behavior

MediaPipe currently accepts only text and image input on Android, and only text input on web.

Describe the expected behaviour

MediaPipe should accept text, image, and audio input on both Android and web.

Standalone code/steps you may have used to try to get what you need

Followed the official docs

Other info / Complete Logs

Metadata

Labels

platform::android (Android Solutions)
platform:javascript (MediaPipe Javascript issues)
stat:awaiting googler (Waiting for Google Engineer's Response)
type:feature (Enhancement in the New Functionality or Request for a New Solution)
