Hi nanoVLM team,
Thank you for your excellent work on nanoVLM – it's a very impressive project.
I'm currently working on audio-language models, and one thing I've noticed is the lack of simple, well-structured pipelines for building them. I believe nanoVLM, with its clean and modular design, could serve as a strong foundation for supporting the audio modality in a lightweight and extensible way.
I would love to contribute by extending the framework to include audio modality support, leveraging the existing strengths of the nanoVLM architecture.
Please let me know if you'd be open to this contribution — or potentially collaborating on a new repository focused on lightweight audio-language modeling.
Activity
lusxvr commented on Jun 4, 2025
Exciting! This would definitely be an interesting addition; we are very interested in so-called Omni-Models, which can take multiple different modalities as input and output. Do you have a specific idea in mind for how to integrate audio into nanoVLM? I would suggest we brainstorm and plan a bit before jumping into the code.
tsdocode commented on Jun 5, 2025
Interested
carankt commented on Jun 12, 2025
I have an initial implementation of nanoVLM adapted for the audio domain. I was able to use the CLAP model as the feature extractor and make it work. Would love to be a part of the discussion and the brainstorming.
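For concreteness, here is a minimal sketch of what such a CLAP-based audio path could look like, assuming the Hugging Face transformers CLAP checkpoint and a single linear projector into the language model's embedding space. The class name, checkpoint, and dimensions below are illustrative assumptions, not the actual implementation mentioned above:

```python
# Minimal sketch (not the implementation referenced above): pooled CLAP audio embeddings
# projected into the language model's hidden size, mirroring nanoVLM's vision projector.
import torch
import torch.nn as nn
from transformers import ClapModel, ClapProcessor


class AudioProjector(nn.Module):
    """Maps pooled CLAP audio embeddings to the LM hidden size (dims are assumptions)."""

    def __init__(self, clap_dim: int = 512, lm_hidden_dim: int = 576):
        super().__init__()
        self.proj = nn.Linear(clap_dim, lm_hidden_dim)

    def forward(self, audio_embeds: torch.Tensor) -> torch.Tensor:
        return self.proj(audio_embeds)


processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")
clap = ClapModel.from_pretrained("laion/clap-htsat-unfused")
projector = AudioProjector()

# One second of dummy audio at 48 kHz, CLAP's expected sampling rate.
waveform = torch.randn(48_000).numpy()
inputs = processor(audios=waveform, sampling_rate=48_000, return_tensors="pt")

with torch.no_grad():
    audio_embeds = clap.get_audio_features(**inputs)  # (batch, 512)

audio_tokens = projector(audio_embeds).unsqueeze(1)   # (batch, 1, lm_hidden_dim)
# audio_tokens could then be concatenated with the text token embeddings before the
# language model, analogous to how nanoVLM splices in image tokens.
```

One open design question is whether a single pooled CLAP embedding is enough, or whether a sequence of frame-level audio tokens (closer to nanoVLM's image patch tokens) would be needed for tasks beyond coarse audio understanding.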
aceliuchanghong commented on Jun 20, 2025
Interested +1
leo1oel commented on Jun 24, 2025
Interesting
WangHaoyuuu commented on Jun 25, 2025
Interesting