Description
TL;DR: I made a prototype prompt generator for ACE-Step that takes in a song and outputs tags and lyrics.
Recently I trained a music understanding model (audio-text-to-text) called MuFun and have been looking for applications for it. In text-to-image there are tools that take a picture and output a prompt that can roughly regenerate it, so I wondered whether the same was possible for a text-to-music model. I collected some synthetic data (i.e., songs generated by ACE-Step) and fine-tuned MuFun on it. Looking forward to feedback.
Model weights: https://huggingface.co/Yi3852/MuFun-ACEStep
Demo: http://47.121.209.64/mufun_demo_acestep
(This runs on my PC, so it may be very slow; I may make a quantized version later.)
(Update: the model is now loaded with bitsandbytes 4-bit quantization and generates output for a song in about 30 seconds.)
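For reference, here's a minimal sketch of what 4-bit loading with bitsandbytes typically looks like in transformers. This is my assumption of the setup, not the demo's exact code (that is linked below); the right Auto class and audio preprocessing for MuFun should follow the model card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize weights to 4-bit NF4 and run compute in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Yi3852/MuFun-ACEStep",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # MuFun ships custom audio-handling code
)
```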
(Update: the Gradio demo code is at https://github.com/laitselec/MuFun/blob/main/demo/mufun_acestep/gr_app.py, and the synthetic dataset, 21k samples, is at https://huggingface.co/datasets/Yi3852/ACEStep-Songs.)
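If you want to poke at the dataset, it should load with the standard datasets API; the split and field names below are assumptions, so check the dataset card:

```python
from datasets import load_dataset

# The "train" split is an assumption; see the dataset card for actual splits.
ds = load_dataset("Yi3852/ACEStep-Songs", split="train")
print(len(ds))       # ~21k samples per the post
print(ds[0].keys())  # inspect available fields (audio, tags, lyrics, ...)
```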