- Jaykrishnan Gopalakrishna Pillai
- Filip Danielsson
- Filip Koňařík
Using MLOps procedures to develop a model that generates a sequence of frames of someone playing the piano.
For piano image generation we will evaluate different generative models, such as denoising diffusion models and GANs (generative adversarial networks). An example of using diffusion models for conditional image generation: https://github.com/TeaPearce/Conditional_Diffusion_MNIST
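To illustrate the diffusion approach, below is a minimal sketch of a conditional denoising training step in PyTorch. The toy MLP denoiser, the frame and conditioning dimensions, and the noise schedule are illustrative assumptions, not our final architecture (a U-Net, as in the linked repository, would be the realistic choice).

```python
# Minimal sketch of one conditional diffusion training step (assumptions:
# toy MLP denoiser, 64x64 RGB frames flattened, linear beta schedule).
import torch
import torch.nn as nn

T = 1000  # number of diffusion timesteps (assumption)
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class ConditionalDenoiser(nn.Module):
    """Toy stand-in for a U-Net; predicts the noise added to a frame."""
    def __init__(self, frame_dim=64 * 64 * 3, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim + cond_dim + 1, 512),
            nn.ReLU(),
            nn.Linear(512, frame_dim),
        )

    def forward(self, x_t, cond, t):
        # Concatenate noisy frame, conditioning vector, and timestep.
        t_feat = t.float().unsqueeze(1) / T
        return self.net(torch.cat([x_t, cond, t_feat], dim=1))

def training_step(model, x0, cond):
    """Add noise to clean frames at a random timestep, predict it back."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].unsqueeze(1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return nn.functional.mse_loss(model(x_t, cond, t), noise)
```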
We will extract training data from publicly available videos using https://github.com/uel/BP. The generative model can be conditioned on different types of data extracted by the library, for example hand placement, played notes, key locations, or the previous frame.
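A hypothetical sketch of how these extracted signals could be packed into a single conditioning vector is shown below; the field names and shapes are our assumptions, since the library's actual output format is not fixed yet.

```python
# Hypothetical conditioning-vector assembly; all shapes are assumptions
# (2 hands x 21 landmarks, 88 piano keys, 128-dim previous-frame code).
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameAnnotation:
    hand_landmarks: np.ndarray        # (2, 21, 2) x/y per landmark
    active_notes: np.ndarray          # (88,) binary piano-roll slice
    key_locations: np.ndarray         # (88, 4) bounding box per key
    prev_frame_embedding: np.ndarray  # (128,) encoded previous frame

def build_condition_vector(a: FrameAnnotation) -> np.ndarray:
    """Flatten and concatenate all conditioning signals."""
    return np.concatenate([
        a.hand_landmarks.ravel(),
        a.active_notes.ravel(),
        a.key_locations.ravel(),
        a.prev_frame_embedding.ravel(),
    ]).astype(np.float32)
```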
To our knowledge, no models exist for this specific task, so we will train a generative model from scratch. The data extraction step uses deep learning models for hand landmarking, keyboard detection, and piano transcription.
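As one example of the extraction step, hand landmarking could be done with an off-the-shelf model such as MediaPipe Hands; choosing MediaPipe here is our assumption, and keyboard detection and transcription would need separate models.

```python
# Sketch: extracting normalized hand landmarks from a single frame with
# MediaPipe Hands (our assumed choice of off-the-shelf landmarking model).
import cv2
import mediapipe as mp

def extract_hand_landmarks(image_path: str):
    """Return normalized (x, y) coordinates for each detected hand."""
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=2) as hands:
        results = hands.process(image)
    if not results.multi_hand_landmarks:
        return []
    return [
        [(lm.x, lm.y) for lm in hand.landmark]
        for hand in results.multi_hand_landmarks
    ]
```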
Code coverage: 51%