PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music (ACM MM 2020 BEST PAPER)

https://dl.acm.org/doi/pdf/10.1145/3394171.3414032 or https://arxiv.org/abs/2010.08091

For citation:

    @inproceedings{liang2020pirhdy,
      title={PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music},
      author={Liang, Hongru and Lei, Wenqiang and Chan, Paul Yaozhu and Yang, Zhenglu and Sun, Maosong and Chua, Tat-Seng},
      booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
      pages={574--582},
      year={2020}
    }

*We suggest generating all datasets yourself, as they are too large to distribute.*

For further questions, please email [email protected] (first author) or [email protected] (corresponding author).

step 1: normalize the original MIDI files: time normalization, key transformation, etc.
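
As a rough illustration of what step 1 involves, the sketch below transposes notes so the tonic becomes C and snaps timing to a fixed grid. It is not the repository's code: the note tuple layout `(onset_ticks, duration_ticks, pitch, velocity)`, the function names, and the default of 4 grid steps per beat are all assumptions made for this example.

```python
# Hypothetical sketch of step 1, not the repository's implementation.
# A note is assumed to be (onset_ticks, duration_ticks, pitch, velocity).

def transpose_to_c(notes, tonic_pitch_class):
    """Shift every pitch so the given tonic (0 = C ... 11 = B) maps to C."""
    # Choose the shorter direction, so pitches move by at most 6 semitones.
    if tonic_pitch_class <= 6:
        shift = -tonic_pitch_class
    else:
        shift = 12 - tonic_pitch_class
    return [(on, dur, pitch + shift, vel) for on, dur, pitch, vel in notes]


def quantize_time(notes, ticks_per_beat, steps_per_beat=4):
    """Snap onsets and durations to a grid of steps_per_beat steps per beat."""
    ticks_per_step = ticks_per_beat / steps_per_beat
    quantized = []
    for on, dur, pitch, vel in notes:
        step_on = round(on / ticks_per_step)
        # Keep every note at least one grid step long.
        step_dur = max(1, round(dur / ticks_per_step))
        quantized.append((step_on, step_dur, pitch, vel))
    return quantized


if __name__ == "__main__":
    raw = [(0, 480, 62, 80), (470, 250, 66, 90)]  # two notes of a piece in D
    in_c = transpose_to_c(raw, tonic_pitch_class=2)
    print(quantize_time(in_c, ticks_per_beat=480))
```

The tonic is passed in explicitly here; how the key is actually detected is left to the repository's scripts.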

step 2: transform the MIDI files into time-pitch matrices

step 3: analyze the chords in the MIDI files: re-running this step is not necessary, as all needed files are already in this directory

step 4: transform the matrices into sequences of quadruples (chroma, octave, velocity, state), which is the final format
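
Steps 2 and 4 can be pictured with the minimal sketch below, which builds a time-pitch matrix from quantized notes and then reads it back as (chroma, octave, velocity, state) quadruples. This is an assumption-laden illustration, not the repository's code: the function names, the separate onset matrix, the state codes `ONSET = 0` and `HOLD = 1`, and the octave numbering `pitch // 12` are all hypothetical choices.

```python
# Hypothetical sketch of steps 2 and 4, not the repository's implementation.
# A note is assumed to be (start_step, length_steps, pitch, velocity).

N_PITCHES = 128          # MIDI pitch range 0..127
ONSET, HOLD = 0, 1       # assumed state codes; the real encoding may differ


def notes_to_matrices(notes, n_steps):
    """Build a velocity matrix and an onset matrix, both n_steps x 128."""
    velocity = [[0] * N_PITCHES for _ in range(n_steps)]
    onset = [[False] * N_PITCHES for _ in range(n_steps)]
    for start, length, pitch, vel in notes:
        for t in range(start, min(start + length, n_steps)):
            velocity[t][pitch] = vel
        if start < n_steps:
            onset[start][pitch] = True
    return velocity, onset


def matrices_to_quadruples(velocity, onset):
    """Return, per time step, a list of (chroma, octave, velocity, state)."""
    sequence = []
    for t, row in enumerate(velocity):
        step = []
        for pitch, vel in enumerate(row):
            if vel > 0:
                state = ONSET if onset[t][pitch] else HOLD
                # chroma is the pitch class; octave here is simply pitch // 12.
                step.append((pitch % 12, pitch // 12, vel, state))
        sequence.append(step)  # an empty list marks a silent step
    return sequence


if __name__ == "__main__":
    notes = [(0, 2, 60, 80), (1, 1, 64, 90)]
    vel, on = notes_to_matrices(notes, n_steps=3)
    for step in matrices_to_quadruples(vel, on):
        print(step)
```

Splitting pitch into chroma and octave keeps the vocabulary of each field small (12 chroma values instead of 128 pitches).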

step 5:

    1) generate the datasets for token modeling

    2) token modeling
       **pre-trained models are in pre-trained-models**

step 6:

    1) transform the sequences into bars

    2) transform the bars into phrases

    3) generate the dataset for context modeling

    4) context modeling and downstream tasks
       **embeddings pre-trained through token modeling are in "embeddings", models fine-tuned by context modeling are in "pre-trained models".**
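
The bar and phrase grouping in step 6 can be sketched as simple fixed-size chunking. This is only an illustration under stated assumptions: the helper names, the 16 steps per bar, and the fixed number of bars per phrase are hypothetical, and the repository may segment phrases differently.

```python
# Hypothetical sketch of step 6 (1) and (2), not the repository's implementation.

def split_into_bars(sequence, steps_per_bar=16):
    """Cut a step-level sequence into consecutive bars of steps_per_bar steps."""
    return [sequence[i:i + steps_per_bar]
            for i in range(0, len(sequence), steps_per_bar)]


def group_into_phrases(bars, bars_per_phrase=4):
    """Group consecutive bars into phrases of bars_per_phrase bars."""
    return [bars[i:i + bars_per_phrase]
            for i in range(0, len(bars), bars_per_phrase)]


if __name__ == "__main__":
    steps = list(range(40))            # stand-in for 40 time steps of quadruples
    bars = split_into_bars(steps)      # the last bar is shorter than the others
    phrases = group_into_phrases(bars, bars_per_phrase=2)
    print(len(bars), [len(p) for p in phrases])
```

The trailing bar and phrase are left shorter rather than padded; whether to pad or drop them is a design choice this sketch does not settle.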