Conversation

@aireenmei aireenmei commented Jan 15, 2026

Description

Original author @eitanporat in #2726

Revised according to the original PR comments:

  • rename flags for consistency and clean up their usage
  • make AudioEncoder an NNX module
  • remove SinusoidsPositionEmbedding, replacing it with an expanded version of the existing PositionalEmbedding module
  • remove Qwen3OmniAudioModel from qwen3.py, since the AudioEncoder class in decoder.py is used instead
  • add precision-related flags
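The sinusoidal positional embeddings mentioned in the bullet above follow the standard transformer recipe. As a hedged illustration only, here is a minimal NumPy sketch of that recipe; it is not the actual MaxText PositionalEmbedding module, and the function name and signature are assumptions:

```python
import numpy as np

def sinusoidal_position_embedding(num_positions: int, features: int) -> np.ndarray:
    """Classic transformer sinusoids: even feature indices get sin, odd get cos.

    Assumes `features` is even. Returns an array of shape (num_positions, features).
    """
    positions = np.arange(num_positions)[:, None]  # (P, 1)
    # Geometric progression of inverse frequencies, one per sin/cos pair.
    div_term = np.exp(np.arange(0, features, 2) * (-np.log(10000.0) / features))  # (F/2,)
    emb = np.zeros((num_positions, features))
    emb[:, 0::2] = np.sin(positions * div_term)
    emb[:, 1::2] = np.cos(positions * div_term)
    return emb
```

At position 0 this gives sin(0) = 0 in the even slots and cos(0) = 1 in the odd slots, which is a quick sanity check for any reimplementation.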

Tests

All tests in tests/check_qwen3_omni_audio_vs_reference.py pass.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov bot commented Jan 15, 2026

@aireenmei force-pushed the aireen/qwen-audio branch 2 times, most recently from 580f4a2 to 875e674 on January 15, 2026 at 02:54
return embeddings


class AudioEncoder(nnx.Module):
Collaborator:
Thanks for implementing it as NNX!


if audio_embeddings is not None and cfg.use_audio:
if cfg.model_name in ["qwen3-omni-30b-a3b"]:
y = multimodal_utils.merge_mm_embeddings(
Collaborator:
Does it support "video/audio interleaving" at the moment? Or it's in following PRs?

image_embeddings=image_embeddings,
image_masks=encoder_image_masks,
audio_embeddings=audio_embeddings,
audio_masks=audio_masks,
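The merge step in this snippet presumably scatters the modality embeddings into placeholder positions of the token-embedding sequence, selected by the masks. A minimal NumPy sketch of that masking pattern follows; the signature is simplified and assumed, not the actual multimodal_utils.merge_mm_embeddings API:

```python
import numpy as np

def merge_mm_embeddings(text_emb: np.ndarray, mm_emb: np.ndarray, mm_mask: np.ndarray) -> np.ndarray:
    """Place mm_emb rows at the positions where mm_mask is True.

    text_emb: (seq_len, d)  token embeddings, with placeholder tokens at mm positions
    mm_emb:   (n_mm, d)     audio/image embeddings, in sequence order
    mm_mask:  (seq_len,)    boolean; True at placeholder positions, sums to n_mm
    """
    assert mm_mask.sum() == mm_emb.shape[0], "mask count must match embedding rows"
    merged = text_emb.copy()       # leave the input untouched
    merged[mm_mask] = mm_emb       # boolean-mask scatter, preserving order
    return merged
```

A consequence of this pattern is that interleaving modalities reduces to how the placeholder positions are laid out in the mask, which may be relevant to the interleaving question above.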
Collaborator:
I feel like we can consolidate all multimodal inputs into a class and pass this single variable to the model. But that can be a follow up later.

Collaborator:
+1 to this!
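The consolidation floated in this thread could be a single container object passed to the model in place of the four separate arguments. A hedged sketch, with class and field names that are illustrative and not an existing MaxText API:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class MultimodalInputs:
    """Bundles per-modality embeddings and masks into one model argument."""
    image_embeddings: Optional[np.ndarray] = None
    image_masks: Optional[np.ndarray] = None
    audio_embeddings: Optional[np.ndarray] = None
    audio_masks: Optional[np.ndarray] = None

    def has_audio(self) -> bool:
        return self.audio_embeddings is not None

    def has_image(self) -> bool:
        return self.image_embeddings is not None
```

Adding a new modality then means adding fields here rather than threading new keyword arguments through every call site.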

Comment on lines +1958 to +1960
config: Config
proj1: DenseGeneral
proj2: DenseGeneral
Collaborator:
Are these three lines necessary?

self,
hidden_states: Array,
deterministic: bool = False,
):
Collaborator:
Can we also add an input/output shape comment here?
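An input/output shape comment of the requested kind could look like the following NumPy stand-in. The two-projection shapes are assumptions based on the proj1/proj2 fields shown earlier, not the actual DenseGeneral code:

```python
import numpy as np

def mlp_projection(hidden_states: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two-layer projection, annotated with shapes at each step.

    hidden_states: (batch, seq_len, d_model)
    w1:            (d_model, d_ff)
    w2:            (d_ff, d_out)
    returns:       (batch, seq_len, d_out)
    """
    x = hidden_states @ w1   # (batch, seq_len, d_ff)
    x = np.maximum(x, 0.0)   # nonlinearity (ReLU here for simplicity)
    return x @ w2            # (batch, seq_len, d_out)
```

Per-step shape comments like these make it easy to spot where a sequence or feature dimension changes without reading the weight definitions.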

Collaborator:
Would you prefer to create this standalone check, or merge it with https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/check_qwen3_embedding_vs_reference.py?

@hengtaoguo (Collaborator) commented:

Thank you for the great work!

@NicoGrande (Collaborator) left a comment:

LGTM



4 participants