[Proposal] Contribute to nanoALM – Add Audio Modality Support #99

@haidog-yaqub

Description

Hi nanoVLM team,

Thank you for your excellent work on nanoVLM – it's a very impressive project.

I'm currently working on audio-language models, and one thing I've noticed is the lack of simple, well-structured pipelines for building them. I believe nanoVLM, with its clean and modular design, could serve as a strong foundation for supporting the audio modality in a lightweight and extensible way.

I would love to contribute by extending the framework to include audio modality support, leveraging the existing strengths of the nanoVLM architecture.

Please let me know if you'd be open to this contribution — or potentially collaborating on a new repository focused on lightweight audio-language modeling.

Activity

lusxvr commented on Jun 4, 2025

Member

Exciting! This would definitely be an interesting addition. We are very interested in so-called omni-models, which can take multiple modalities as both input and output. Do you have a specific idea in mind for how to integrate audio into nanoVLM? I would suggest we brainstorm and plan a bit before jumping into the code.

tsdocode commented on Jun 5, 2025

Interested

carankt commented on Jun 12, 2025

I have an initial implementation of nanoVLM adapted for the audio domain. I was able to use the CLAP model as the feature extractor and get it working. I would love to be part of the discussion and brainstorming.

aceliuchanghong commented on Jun 20, 2025

Interested +1

leo1oel commented on Jun 24, 2025

Interesting

WangHaoyuuu commented on Jun 25, 2025

Interesting

[Proposal] Contribute to nanoALM – Add Audio Modality Support · Issue #99 · huggingface/nanoVLM