
Music Insights Provider#2153

Draft
ztripez wants to merge 9 commits into music-assistant:dev from ztripez:clap-plugin

Conversation

Contributor

@ztripez ztripez commented Apr 27, 2025

Description

This PR introduces a new provider, Music Insights, designed to enhance Music Assistant with features based on audio embeddings and user interaction analysis. It leverages ChromaDB for vector storage and CLAP models (via the transformers library) for generating embeddings.

Current Features (Work-in-Progress):

  • Provider Setup: Basic configuration flow with presets for different hardware capabilities (CPU/GPU).
  • ChromaDB Integration: Sets up a persistent ChromaDB client within the MA data directory.
  • Text Embeddings: Generates text embeddings for tracks based on metadata (genre, artist, title, album, mood).
  • Semantic Search: Allows searching for tracks using natural language queries.
  • Similar Tracks: Finds tracks similar to a given track based on text embedding similarity.
  • User Interaction Tracking: Records basic track playback events (start, progress, scrobble) using a dedicated InsightScrobbler. Data is stored in a separate ChromaDB collection.
  • Library Sync: Automatically updates embeddings when tracks are added, updated, or deleted from the library.
  • Configuration Handling: Rebuilds embeddings if relevant configuration (model name, window size) changes.
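To illustrate the Semantic Search and Similar Tracks bullets, here is a minimal, dependency-free sketch of the nearest-neighbour ranking that the provider delegates to ChromaDB in practice. The function names and the plain-list embedding format are illustrative, not the provider's actual API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similar_tracks(query_emb: list[float],
                   library: dict[str, list[float]],
                   limit: int = 5) -> list[str]:
    """Rank track ids by embedding similarity to the query embedding."""
    return sorted(library,
                  key=lambda tid: cosine_similarity(query_emb, library[tid]),
                  reverse=True)[:limit]
```

In the real provider, ChromaDB performs this ranking over persisted CLAP embeddings; the sketch only shows the underlying cosine-similarity idea.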

TODOs:

  • [ ] Audio Embeddings: Currently only text embeddings are generated and used.
  • [ ] Recommendations: The core logic to analyze user interactions and generate personalized recommendations based on embeddings still needs to be implemented.

How to Test:

  1. Enable the music_insights provider in the MA settings.
  2. Choose a preset (or configure manually). Note that the first startup might take time to download the embedding model.
  3. Allow the initial embedding process to run (check logs for progress - currently only logs start/finish/errors).
  4. Use the search function with descriptive terms (e.g., "upbeat electronic music", "sad acoustic song").
  5. View a track and check the "Similar Tracks" section.
  6. Play some tracks and observe logs for interaction recording messages (debug level).

This provider is still under active development, but this initial version lays the foundation for music discovery and recommendation features within Music Assistant.

async def async_init(self) -> None:
"""Asynchronously initialize the embedding models."""
# Run blocking model setup in a background task using a thread
self.mass.create_task(asyncio.to_thread(self._setup_models))
Member

store the task in a variable if you want to cancel it on unload
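A sketch of that suggestion, using a plain asyncio class as a stand-in for the provider. The `_setup_task` attribute and `unload` hook are assumptions for illustration, not MA's actual interface:

```python
import asyncio

class InsightsProvider:
    """Minimal stand-in showing task bookkeeping for a clean unload."""

    def __init__(self) -> None:
        self._setup_task: asyncio.Task | None = None

    async def async_init(self) -> None:
        # Keep a handle on the background task so unload can cancel it.
        self._setup_task = asyncio.create_task(
            asyncio.to_thread(self._setup_models)
        )

    async def unload(self) -> None:
        # Cancel the model setup if it is still running when we unload.
        if self._setup_task is not None and not self._setup_task.done():
            self._setup_task.cancel()

    def _setup_models(self) -> None:
        # Blocking model download/initialisation would happen here.
        pass
```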

Comment on lines +155 to +156
# waveform = None
# sample_rate = None
Member

If you need help with this part, ping me on Discord. It's relatively easy to get the audio stream in PCM.

Contributor

OzGav commented Sep 9, 2025

@ztripez any more progress on this one?

Contributor Author

ztripez commented Dec 23, 2025

> @ztripez any more progress on this one?

Hey @OzGav - I have some bandwidth to pick this back up, but I need to address a fundamental architectural issue before moving forward.

The Problem:
The current approach of embedding CLAP/transformers directly into MA creates a dependency nightmare:

  • PyTorch + CUDA support = 2-3GB+ download
  • CUDA version compatibility hell (PyTorch vs system CUDA vs transformers)
  • Forces GPU dependencies on ALL MA users, even those who never use this feature
  • Different hardware needs different builds (CPU/CUDA/ROCm)
  • Makes MA installation significantly heavier

Proposed Solution:
I'm thinking this should be a separate sidecar service that MA communicates with via HTTP/gRPC. Benefits:

  • Optional - only users who want AI features install it
  • Clean separation of GPU/ML dependencies
  • Can be containerized independently with proper CUDA base images
  • Easier to iterate on models without touching MA core
  • Users can run it on different hardware (GPU server separate from MA)
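For a sense of what the HTTP boundary could look like, a hypothetical request/response shape. The endpoint path, field names, and default model id are illustrative only, not a settled API:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class EmbedRequest:
    """Hypothetical payload MA would POST to the sidecar, e.g. /v1/embed/text."""
    track_id: str
    text: str  # flattened metadata: genre / artist / title / album / mood
    model: str = "laion/clap-htsat-unfused"  # illustrative default

@dataclass
class EmbedResponse:
    """Hypothetical sidecar reply: the embedding for one track."""
    track_id: str
    embedding: list[float]

def encode_request(req: EmbedRequest) -> bytes:
    """Serialize the request body for an HTTP POST."""
    return json.dumps(asdict(req)).encode()
```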

Questions:

  1. Is there precedent for optional sidecar services in the MA ecosystem?
  2. Would you accept this as a separate service that MA integrates with, rather than a built-in provider?
  3. Any preferences on communication protocol (REST/gRPC)?

If the sidecar approach is acceptable, I can get moving on this. If you strongly prefer it as an integrated provider, we need to discuss how to handle the dependency bloat - maybe optional extras in requirements with clear documentation about the 3GB+ install size?


My personal take: the sidecar is objectively the right architecture here. Bolting ML/GPU workloads onto a music server screams "separate service."

@MarvinSchenkel
Contributor

I think it makes sense to have the analysis part as a separate sidecar. We can then implement a thin MetadataPlugin that obtains the results and stores them in MA alongside the MediaItems. We could eventually host that Dockerfile alongside our other MA addons and containers.

The only thing I am unsure about is that we will need to stream raw PCM to that sidecar, which might be a bit bandwidth-heavy.

Looping in @marcelveldt as he will definitely have some ideas for this.

Question back: I did some work on audio analysis for smart fades already (simple beat/downbeat analysis). Could your libraries possibly enhance this information as well? (think phrase detection, key detection etc.)

Contributor Author

ztripez commented Dec 28, 2025

Thanks @MarvinSchenkel! Glad the sidecar approach makes sense.
I've started work on the sidecar over here (don't read too much into the README.md; it has drifted a lot from what the API actually looks like): https://github.com/ztripez/music-assistant-insights

On streaming bandwidth:

The current implementation streams raw PCM frames over HTTP/msgpack, but the sidecar is designed to handle this efficiently:

  • Audio frames are buffered on MA's side (~1 second chunks, ~384KB at 48kHz stereo f32) to reduce HTTP overhead
  • The sidecar converts to mono, resamples to 48kHz if needed, and computes mel spectrograms incrementally during streaming (cheap STFT + filterbank operations)
  • Expensive model inference is deferred to session end, so streaming itself is lightweight
  • Sessions are keyed by track, with automatic cleanup of stale sessions

That said, I'm open to moving mel spectrogram computation to MA's side if bandwidth becomes an issue - mel features are ~64x256 floats (~65KB) for a 10-second window vs ~1.8MB of raw PCM. That's a significant reduction.
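The bandwidth numbers above can be sanity-checked with a couple of one-liners, assuming 48 kHz f32 samples and a 64×256 mel window as stated:

```python
def pcm_bytes(seconds: float, sample_rate: int = 48_000,
              channels: int = 2, bytes_per_sample: int = 4) -> int:
    """Raw PCM buffer size for f32 samples of the given duration."""
    return int(seconds * sample_rate * channels * bytes_per_sample)

def mel_bytes(n_mels: int = 64, frames: int = 256,
              bytes_per_value: int = 4) -> int:
    """Size of one mel-spectrogram window of float32 values."""
    return n_mels * frames * bytes_per_value

print(pcm_bytes(1.0))               # 1 s stereo chunk -> 384000 bytes (~384 KB)
print(pcm_bytes(10.0, channels=1))  # 10 s mono window -> 1920000 bytes (~1.8 MB)
print(mel_bytes())                  # 64x256 floats    -> 65536 bytes (~65 KB)
```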

On audio analysis expansion:

The sidecar is designed to be modular. I'm already doing zero-shot mood classification using CLAP's joint embedding space (energetic, melancholic, aggressive, etc.) during ingestion.
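Conceptually, the zero-shot mood classification reduces to comparing the track embedding against text-prompt embeddings in the joint space and normalising the similarities. A dependency-free sketch with made-up 2-D vectors (real CLAP embeddings are high-dimensional and come from the model):

```python
import math

def mood_scores(track_emb: list[float],
                label_embs: dict[str, list[float]]) -> dict[str, float]:
    """Softmax over cosine similarities between a track embedding and
    each mood-prompt embedding (zero-shot classification)."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    sims = {label: cos(track_emb, emb) for label, emb in label_embs.items()}
    z = sum(math.exp(s) for s in sims.values())
    return {label: math.exp(s) / z for label, s in sims.items()}
```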

For your use cases:

  • Beat/downbeat detection: Could integrate with existing beat detection models or use onset detection algorithms (librosa-style)
  • Key detection: Models like https://github.com/spotify/audio-features or lighter neural approaches exist
  • Phrase detection: More experimental, but possible with structural segmentation models

The architecture already supports this - the watcher module can decode full audio files using symphonia (mp3, flac, ogg, m4a, etc.), resample, and run multiple analysis passes. Adding new feature extractors would be straightforward.

Bonus: Local file scanning

I'm also building a folder watcher module that runs alongside the sidecar (sidecar² lol) - it monitors local music directories, decodes files directly, extracts ID3/Vorbis metadata, and generates embeddings. This could be useful for:

  • Users who want embeddings without MA integration
  • Pre-populating the vector DB before MA syncs
  • Analyzing tracks that MA doesn't have metadata for
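The watcher idea can be sketched as a snapshot-and-diff loop over the music directory. A real implementation would likely use inotify or the `watchdog` package instead of polling; the extension list and function names here are illustrative:

```python
from pathlib import Path

# Illustrative set; the real watcher decodes via symphonia.
AUDIO_EXTS = {".mp3", ".flac", ".ogg", ".m4a"}

def scan_library(root: Path) -> dict[Path, float]:
    """Snapshot of audio files under root, mapped to their mtimes."""
    return {p: p.stat().st_mtime for p in root.rglob("*")
            if p.suffix.lower() in AUDIO_EXTS}

def diff_snapshots(old: dict[Path, float], new: dict[Path, float]):
    """Files added, changed, or removed between two snapshots."""
    added = [p for p in new if p not in old]
    changed = [p for p in new if p in old and new[p] != old[p]]
    removed = [p for p in old if p not in new]
    return added, changed, removed
```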

Happy to collaborate on the audio analysis expansion if there's interest.

Contributor Author

ztripez commented Dec 28, 2025

Another thought: move to a SQLite fork with vector capabilities, like Turso. Then the sidecar can be a stateless processor that just returns embeddings, and all queries can run inside MA.
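A rough sketch of that split using stock `sqlite3`: cosine distance is registered as a Python UDF here, standing in for the built-in vector functions a vector-capable fork would provide (the schema and names are illustrative):

```python
import json
import math
import sqlite3

def cosine_dist(a_json: str, b_json: str) -> float:
    """Cosine distance between two JSON-encoded embedding vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

conn = sqlite3.connect(":memory:")
# In a vector-native fork this UDF would be a built-in function.
conn.create_function("cosine_dist", 2, cosine_dist)
conn.execute("CREATE TABLE embeddings (track_id TEXT PRIMARY KEY, vec TEXT)")
conn.executemany("INSERT INTO embeddings VALUES (?, ?)", [
    ("track_a", json.dumps([1.0, 0.0])),
    ("track_b", json.dumps([0.0, 1.0])),
])
# MA-side query: nearest track to a query embedding, entirely in SQL.
query = json.dumps([0.9, 0.1])
rows = conn.execute(
    "SELECT track_id FROM embeddings ORDER BY cosine_dist(vec, ?) LIMIT 1",
    (query,),
).fetchall()
```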

4 participants