The basic idea of the Audio Grammar is derived from that of the Image Grammar; please read that first. The general idea is that one can create collections of audio processing filters, train to discover filter pairs with high mutual information, use these pairs to obtain maximum spanning graphs, and, from these, obtain grammars. These are the same ideas as those applied to natural language, but applied to sound instead.
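The pair-scoring and spanning-graph steps described above might look roughly like the following. This is a minimal, hypothetical sketch, not project code. It makes several assumptions. Each filter's output has already been discretized into one symbol per audio frame; how filters are built and quantized is left open. The filter names are invented for illustration. The pair score is plain (average) mutual information between two filter outputs. The "maximum spanning graph" is reduced to its simplest case, a maximum spanning tree built with Kruskal's algorithm. The final step, extracting a grammar from the graph, is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length symbol sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm, taking edges in order of descending weight.

    weights maps (a, b) node pairs to a score (here, mutual information).
    Returns the chosen edges as (a, b, weight) tuples.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for (a, b), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:  # only keep edges that join two separate components
            parent[ra] = rb
            tree.append((a, b, w))
    return tree


def filter_graph(outputs):
    """outputs: dict of (hypothetical) filter name -> discretized output per frame."""
    weights = {
        (a, b): mutual_information(outputs[a], outputs[b])
        for a, b in combinations(sorted(outputs), 2)
    }
    return maximum_spanning_tree(list(outputs), weights)


# Toy example: three hypothetical filters, eight frames each.
outputs = {
    "lowpass": [0, 0, 1, 1, 0, 0, 1, 1],
    "highpass": [1, 1, 0, 0, 1, 1, 0, 0],
    "bandpass": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(filter_graph(outputs))
# prints [('highpass', 'lowpass', 1.0), ('bandpass', 'highpass', 0.0)]
```

In the toy data, "highpass" is a deterministic function of "lowpass", so that pair shares a full bit and is linked first. The zero-information edge is still added only because a spanning tree must connect every node. A real pipeline would more likely threshold low-scoring pairs, or keep more than a tree.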
Version 0.0.0 -- There is no code, and barely even an idea. This README exists only to collect interesting bibliographical material and useful URLs.
Aaron Keesing, Yun Sing Koh, Michael Witbrock, "Acoustic Features and Neural Representations for Categorical Emotion Recognition from Speech", INTERSPEECH 2021, 30 August – 3 September 2021, Brno, Czechia, pp. 3415-3419. http://dx.doi.org/10.21437/Interspeech.2021-2217
Provides a useful, short review of speech processing.