Conversation
```python
async def async_init(self) -> None:
    """Asynchronously initialize the embedding models."""
    # Run blocking model setup in a background task using a thread
    self.mass.create_task(asyncio.to_thread(self._setup_models))
```
store the task in a variable if you want to cancel it on unload
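A minimal sketch of that suggestion, assuming an `unload` hook exists on the provider (the class and method names here are illustrative, not the PR's actual code):

```python
import asyncio


class MusicInsightsProvider:
    """Sketch: keep a handle to the setup task so it can be cancelled."""

    def __init__(self, mass):
        self.mass = mass
        self._setup_task: asyncio.Task | None = None

    async def async_init(self) -> None:
        """Asynchronously initialize the embedding models."""
        # Store the task so unload() can cancel a still-running setup.
        self._setup_task = self.mass.create_task(
            asyncio.to_thread(self._setup_models)
        )

    async def unload(self) -> None:
        # Cancel model setup if it is still in flight on provider unload.
        if self._setup_task and not self._setup_task.done():
            self._setup_task.cancel()

    def _setup_models(self) -> None:
        ...  # blocking model initialization
```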
```python
# waveform = None
# sample_rate = None
```
If you need help with this part, ping me on Discord. It's relatively easy to get the audio stream in PCM.
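For reference, raw PCM at this layer is just interleaved fixed-width samples. A tiny generator illustrating the byte layout (signed 16-bit little-endian, with an arbitrary sample rate and channel count; this is not Music Assistant's streaming code):

```python
import math
import struct


def sine_pcm_s16le(freq: float = 440.0, seconds: float = 1.0,
                   rate: int = 44100, channels: int = 2) -> bytes:
    """Generate interleaved signed 16-bit little-endian PCM frames."""
    frames = []
    for i in range(int(seconds * rate)):
        sample = int(32767 * math.sin(2 * math.pi * freq * i / rate))
        # One frame = one sample per channel, interleaved.
        frames.append(struct.pack("<" + "h" * channels, *([sample] * channels)))
    return b"".join(frames)


pcm = sine_pcm_s16le(seconds=0.1)
# 0.1 s * 44100 frames/s * 2 channels * 2 bytes/sample = 17640 bytes
print(len(pcm))
```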
* feat: add recommendation logic
* Add Music Insights optional dependencies (#3)
@ztripez any more progress on this one?
Hey @OzGav - I have some bandwidth to pick this back up, but I need to address a fundamental architectural issue before moving forward.

The Problem:
Proposed Solution:
Questions:
If the sidecar approach is acceptable, I can get moving on this. If you strongly prefer it as an integrated provider, we need to discuss how to handle the dependency bloat - maybe optional extras in requirements with clear documentation about the 3GB+ install size?

My personal take: The sidecar is objectively the right architecture here. We would be bolting ML/GPU workloads onto a music server; that screams "separate service."
I think it makes sense to have the analysis part as a separate sidecar. We can then implement a thin

The only thing I am unsure about is that we will need to stream raw PCM to that sidecar, which might be a bit bandwidth-heavy. Looping in @marcelveldt as he will definitely have some ideas for this.

Question back: I did some work on audio analysis for smart fades already (simple beat/downbeat analysis). Could your libraries possibly enhance this information as well? (Think phrase detection, key detection, etc.)
Thanks @MarvinSchenkel! Glad the sidecar approach makes sense.

On streaming bandwidth: The current implementation streams raw PCM frames over HTTP/msgpack, but the sidecar is designed to handle this efficiently:
That said, I'm open to moving mel spectrogram computation to MA's side if bandwidth becomes an issue - mel features are ~64x256 floats (~65KB) for a 10-second window vs ~1.8MB of raw PCM. That's a significant reduction.

On audio analysis expansion: The sidecar is designed to be modular. I'm already doing zero-shot mood classification using CLAP's joint embedding space (energetic, melancholic, aggressive, etc.) during ingestion. For your use cases:
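The size figures above check out, assuming the raw PCM is 44.1 kHz stereo with 16-bit samples (an assumption on my part; the thread doesn't pin down the format):

```python
# Mel-feature payload: 64 mel bands x 256 time frames, float32 (4 bytes each).
mel_bytes = 64 * 256 * 4           # 65_536 bytes, ~65 KB
# Raw PCM for the same 10-second window: 44.1 kHz * 2 channels * 2 bytes.
pcm_bytes = 10 * 44_100 * 2 * 2    # 1_764_000 bytes, ~1.8 MB
print(f"mel features are {pcm_bytes / mel_bytes:.0f}x smaller")
```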
The architecture already supports this - the watcher module can decode full audio files using symphonia (mp3, flac, ogg, m4a, etc.), resample, and run multiple analysis passes. Adding new feature extractors would be straightforward.

Bonus: Local file scanning

I'm also building a folder watcher module that runs alongside the sidecar (sidecar² lol) - it monitors local music directories, decodes files directly, extracts ID3/Vorbis metadata, and generates embeddings. This could be useful for:
Happy to collaborate on the audio analysis expansion if there's interest.
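The zero-shot mood step described above boils down to a cosine-similarity argmax in the joint embedding space. A sketch with random stand-in vectors (in the real pipeline the vectors would come from CLAP's text and audio encoders; nothing here is the PR's code):

```python
import numpy as np


def classify_mood(audio_emb: np.ndarray, label_embs: dict) -> str:
    """Pick the mood label whose text embedding is closest (cosine
    similarity) to the audio embedding -- the core of CLAP-style
    zero-shot classification."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(label_embs, key=lambda m: cos(audio_emb, label_embs[m]))


rng = np.random.default_rng(0)
labels = {m: rng.standard_normal(512)
          for m in ("energetic", "melancholic", "aggressive")}
# Stand-in "audio" embedding, nudged toward the energetic label.
audio = labels["energetic"] + 0.1 * rng.standard_normal(512)
print(classify_mood(audio, labels))  # energetic
```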
Another thing is to move to a SQLite fork like Turso with vector capabilities; then the sidecar can be a stateless processor that just returns the embeddings, and all queries can run in MA.
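To illustrate the stateless-sidecar split: MA stores the returned embeddings and runs similarity queries itself. A portable sketch with stock `sqlite3` and a brute-force scan (Turso/libSQL's native vector type would replace the Python-side loop; the schema here is hypothetical):

```python
import sqlite3
import struct


def pack(vec: list) -> bytes:
    """Serialize a float vector as little-endian float32 for BLOB storage."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob: bytes) -> list:
    return [v for (v,) in struct.iter_unpack("<f", blob)]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE track_embeddings (track_id TEXT PRIMARY KEY, emb BLOB)")
db.execute("INSERT INTO track_embeddings VALUES (?, ?)", ("track-1", pack([1.0, 0.0])))
db.execute("INSERT INTO track_embeddings VALUES (?, ?)", ("track-2", pack([0.0, 1.0])))


def nearest(query: list) -> str:
    """Brute-force dot-product scan standing in for a native vector index."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    rows = db.execute("SELECT track_id, emb FROM track_embeddings").fetchall()
    return max(rows, key=lambda r: dot(query, unpack(r[1])))[0]


print(nearest([0.9, 0.1]))  # track-1
```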
Description
This PR introduces a new provider, Music Insights, designed to enhance Music Assistant with features based on audio embeddings and user interaction analysis. It leverages ChromaDB for vector storage and CLAP models (via the `transformers` library) for generating embeddings.

Current Features (Work-in-Progress):

`InsightScrobbler`. Data is stored in a separate ChromaDB collection.

TODOs:
How to Test:
`music_insights` provider in the MA settings.

This provider is still under active development, but this initial version lays the foundation for music discovery and recommendation features within Music Assistant.
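As a rough illustration of the recommendation direction mentioned above (a hypothetical helper, not the PR's actual logic): average the embeddings of recently played tracks and rank the rest of the library by cosine similarity to that centroid.

```python
import math


def recommend(history: dict, library: dict, n: int = 1) -> list:
    """Rank unplayed library tracks by cosine similarity to the
    centroid of recently played track embeddings."""
    dim = len(next(iter(history.values())))
    centroid = [sum(v[i] for v in history.values()) / len(history)
                for i in range(dim)]

    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    candidates = [t for t in library if t not in history]
    return sorted(candidates, key=lambda t: cos(centroid, library[t]),
                  reverse=True)[:n]


played = {"a": [1.0, 0.0], "b": [0.9, 0.1]}
lib = {**played, "c": [0.8, 0.2], "d": [0.0, 1.0]}
print(recommend(played, lib))  # ['c']
```

In the actual provider the vectors would come from the ChromaDB collections populated during ingestion, and the nearest-neighbor search would run inside ChromaDB rather than in Python.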