Hi,
First of all, congrats on the really great work! While there are lots of audio examples, I haven't found any examples with video, so it is hard to tell how well the generated sound lines up with the visuals. Since you compared against RegNet, which claims to generate visually aligned sound from videos, I am curious whether this work can also achieve that. Thank you.