Skip to content

lottev1991/grimesai-svs-labs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

GrimesAI HTK-style SVS Labels

This repository contains HTK-style .lab files, to be used on dry stems from the GrimesAI dataset. This to facilitate them for training SVS (singing voice synthesis) AI models.

You can download the audio dataset on elf.tech (you need to make an account first).

License note

The license included in this repository applies ONLY to the .lab files, NOT the original audio dataset, which was released under Grimes' own terms of use (more information can be found here).

General information

Canadian singer-songwriter Grimes has authorized her voice and music to be used for the training of AI models. The dataset that she released includes both wet and dry vocal stems, as well as instrument tracks; a selection of the dry vocal stems have been labeled with phonetic transcriptions and phoneme duration information by yours truly. Stems that were deemed unsuitable were left out. The filenames of the labels correspond to the audio files found in the original dataset. Stems from the songs "IDORU" and "Delicate Weapon" have been used.

Phonetic system

The phonemes are based on ARPABET, including the following extended phonemes:

  • ax (schwa);
  • dx (tap);
  • em (syllabic m);
  • en (syllabic n).

It also incorporates the following non-ARPABET phonemes:

  • GS (glottal stop);
  • EP (exhale);
  • vf (vocal fry/creaky voice).

The labels use SP as its pause phoneme, and AP as its breath (inhale) phoneme. These are considered global/required.

Notes on phonemes

  • I am aware that the extended ARPABET uses q for its glottal stop phoneme. However, GS was used instead, in order to prevent conflicts with phonetic transcription systems used in other languages that may use q for a different sound (e.g. Mandarin Chinese, Arabic).
  • Some phonemes are not present in the labels, since they were also not present in the audio data. One example is the phoneme oy. Therefore it is HIGHLY ADVISED to train the data in parallel with another public dataset! An example of a public English corpus you could use for this purpose is my own Project AI❤dol Public English Dataset, which has a full phonetic inventory.

A note on commercial use

Any commercial use of models that incorporate these labels requires a 50/50 split of royalties between you and Grimes. This is in accordance with her original terms (see above, under "License note").

Releases

No releases published

Packages

No packages published