MusicCaps is a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts. For each 10-second music clip, MusicCaps provides:
- A free-text caption consisting of four sentences on average, describing the music, and
- A list of music aspects, describing genre, mood, tempo, singer voices, instrumentation, dissonances, rhythm, etc.
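To make the two annotation types concrete, here is a minimal sketch of reading one record. It assumes the column names `caption` and `aspect_list` and assumes that the aspect list may be stored as a stringified Python list; both are assumptions about the published dataset, and the record below is made up for illustration.

```python
import ast


def parse_aspects(aspect_field):
    """Return the aspect list as a list of strings.

    Accepts either a real list or a stringified list such as "['a', 'b']".
    """
    if isinstance(aspect_field, list):
        return [str(a) for a in aspect_field]
    return [str(a) for a in ast.literal_eval(aspect_field)]


# Hypothetical record, invented for illustration (not taken from the dataset).
# With the `datasets` package, real records could presumably be obtained with
# something like load_dataset("google/MusicCaps", split="train"); the dataset
# identifier is an assumption.
example = {
    "ytid": "XXXXXXXXXXX",
    "start_s": 30,
    "end_s": 40,
    "caption": "A slow acoustic guitar piece with a soft male vocal.",
    "aspect_list": "['acoustic guitar', 'male vocal', 'slow tempo']",
}

aspects = parse_aspects(example["aspect_list"])
print(example["caption"])
print(aspects)
```

Using `ast.literal_eval` rather than `eval` keeps the parsing restricted to Python literals, which is safer for text read from an external file.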
conda create --name MusicCap python=3.9
conda activate MusicCap
pip install datasets yt-dlp pydub
For Windows: download FFmpeg and add the path of its bin directory to the system's PATH environment variable.
For macOS/Linux:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update
sudo apt install ffmpeg
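After installing, it can help to confirm that FFmpeg is actually reachable from the Python environment. The following is a small sketch using only the standard library; the helper name `find_tool` is hypothetical.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


ffmpeg_path = find_tool("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg was not found on PATH; check the installation step above.")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
```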
- Log in to YouTube.
- Use a suitable browser extension to export cookies, such as Get cookies.txt LOCALLY or Cookie-Editor for Chrome, or cookies.txt for Firefox.
- Copy and save the cookies (Netscape format) to the local file cookies.txt.
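A quick sanity check of the exported file can catch a wrong export format (for example JSON) before downloading starts. This is a heuristic sketch, not part of the repository: it only assumes that Netscape-format cookie lines carry seven tab-separated fields, and the function name is hypothetical.

```python
from pathlib import Path


def looks_like_netscape_cookies(path):
    """Heuristically check that a file resembles a Netscape-format cookie file.

    Each cookie line is expected to hold seven tab-separated fields:
    domain, subdomain flag, path, secure flag, expiry, name, value.
    """
    p = Path(path)
    if not p.is_file():
        return False
    found_cookie = False
    for line in p.read_text(encoding="utf-8", errors="replace").splitlines():
        if line.startswith("#HttpOnly_"):
            # Some exporters mark HttpOnly cookies with this prefix.
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if len(line.split("\t")) != 7:
            return False
        found_cookie = True
    return found_cookie


print(looks_like_netscape_cookies("cookies.txt"))
```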
python Download.py
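The contents of Download.py are not shown here. As a rough illustration of what a per-clip download could look like with the tools installed above, the sketch below builds a yt-dlp command line for one 10-second clip. The function name, the output layout, and the choice of WAV output are assumptions, not the script's actual behavior.

```python
def build_ytdlp_command(ytid, start_s, end_s, cookies="cookies.txt", out_dir="clips"):
    """Build a yt-dlp argument list that fetches one audio clip (illustrative only)."""
    url = f"https://www.youtube.com/watch?v={ytid}"
    return [
        "yt-dlp",
        "--cookies", cookies,                            # exported Netscape-format cookies
        "--download-sections", f"*{start_s}-{end_s}",  # only the clip's time range
        "-x", "--audio-format", "wav",                   # extract audio via FFmpeg
        "-o", f"{out_dir}/{ytid}.%(ext)s",               # hypothetical output layout
        url,
    ]


# Placeholder video ID and timestamps, invented for illustration.
cmd = build_ytdlp_command("XXXXXXXXXXX", 30, 40)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when it is passed to `subprocess.run`.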