Calculates and visualizes the temporal domain and frequency domain mean squared error of ffmpeg audio filters.
- Install ffmpeg
- Install required Python packages (`pip3 install -r requirements.txt`)
- Create `original_audio` directory
- Put audio in `original_audio` directory
- Run `convert.sh` if audio files are not in .wav format
- Run `apply_filters.py` to create filtered audio files
- Run `mse.py` to calculate the mean square error for each filter
- Run `visualize.py` to visualize the results as bar graphs
Converts every audio file in original_audio to a .wav file using ffmpeg's default conversion settings, then deletes the originals.
./convert.sh
JSON-formatted list of ffmpeg audio filters (default filename filters.json, set by the filters_filename key below).
Configuration file for apply_filters.py and mse.py
| key | default | description |
|---|---|---|
| filters_filename | filters.json | filename of JSON formatted filters list in |
| segment_len | 262144 | number of audio samples in each analyzed segment |
| sample_skips | 262144 | number of samples skipped between beginnings of analyzed segments |
| bit_depth | 16 | bit depth of analyzed audio |
| original_audio_dir | original_audio | relative path to search for original audio |
| filtered_audio_dir | filtered_audio | relative path of filtered audio |
| output_filename | output.json | filename of JSON formatted mean square error output |
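
To make the relationship between segment_len and sample_skips concrete, here is a minimal sketch of how segment start offsets could be derived from them. The helper name `segment_starts` is hypothetical, and dropping a trailing partial segment is an assumption; the actual scripts may handle the tail differently.

```python
def segment_starts(num_samples: int, segment_len: int, sample_skips: int) -> list[int]:
    """Return the start index of every full segment (hypothetical helper)."""
    if segment_len <= 0 or sample_skips <= 0:
        raise ValueError("segment_len and sample_skips must be positive")
    # Assumption: a trailing partial segment is dropped rather than padded.
    return list(range(0, num_samples - segment_len + 1, sample_skips))


# One minute of 44.1 kHz audio (2,646,000 samples) with the default settings.
starts = segment_starts(2_646_000, 262_144, 262_144)
print(len(starts), starts[:3])  # 10 [0, 262144, 524288]
```

With the defaults, sample_skips equals segment_len, so segments sit back to back without overlap. A smaller sample_skips would make segments overlap, and a larger one would leave gaps between them.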
Defines CONFIG_FILENAME, Config class, and associated JSON loader function (load_config).
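
A loader along these lines might look like the sketch below. Only the names CONFIG_FILENAME, Config, and load_config come from the description above; the value "config.json", the use of a dataclass, and the rejection of unknown keys are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, fields

CONFIG_FILENAME = "config.json"  # assumed value; the real constant may differ


@dataclass
class Config:
    # Defaults copied from the configuration table.
    filters_filename: str = "filters.json"
    segment_len: int = 262144
    sample_skips: int = 262144
    bit_depth: int = 16
    original_audio_dir: str = "original_audio"
    filtered_audio_dir: str = "filtered_audio"
    output_filename: str = "output.json"


def load_config(filename: str = CONFIG_FILENAME) -> Config:
    """Read a JSON object and overlay its keys on the defaults."""
    with open(filename, encoding="utf-8") as handle:
        raw = json.load(handle)
    known = {field.name for field in fields(Config)}
    unknown = set(raw) - known
    if unknown:
        # Assumption: fail loudly on misspelled keys instead of ignoring them.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return Config(**raw)
```

Keys missing from the JSON file fall back to the defaults, so a config file only needs to list the values that differ.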
Loads configuration from CONFIG_FILENAME, applies each ffmpeg audio filter listed in filters_filename to the .wav files in original_audio_dir, and writes the resulting audio files to filtered_audio_dir.
python3 apply_filters.py
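
As a rough sketch of the filtering step, one ffmpeg invocation per file and filter could be built as follows. The function names and the output naming scheme are hypothetical, and each filter is assumed to be a plain filter string such as `aecho`; the real script and the layout of the filters file may differ. `-i`, `-af`, and `-y` are standard ffmpeg options.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path, audio_filter: str) -> list[str]:
    # -y overwrites an existing output file; -af sets the audio filtergraph.
    return ["ffmpeg", "-y", "-i", str(src), "-af", audio_filter, str(dst)]


def apply_filter(src: Path, out_dir: Path, audio_filter: str) -> Path:
    """Run ffmpeg on one file and return the path of the filtered copy."""
    # Hypothetical naming scheme: <filter name>_<original file name>.
    filter_name = audio_filter.split("=", 1)[0]
    dst = out_dir / f"{filter_name}_{src.name}"
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, dst, audio_filter), check=True)
    return dst


print(build_ffmpeg_command(Path("in.wav"), Path("out.wav"), "aecho"))
```

Passing the command as a list avoids shell quoting problems when a filter string contains characters such as `=` or `:`.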
Loads configuration from CONFIG_FILENAME and calculates the average MSE, over segments of length segment_len, between original audio segments and their filtered counterparts in both the temporal domain and the frequency domain (DCT-II). The resulting MSEs are dumped to output_filename in JSON format.
python3 mse.py
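
The calculation described above can be sketched in pure Python as follows. This is an illustration under assumptions, not the script's actual code: the function names are hypothetical, the DCT-II is unnormalized (a different scaling convention changes the frequency-domain values by a constant factor), and the naive O(N²) transform is only practical for short inputs. At the default segment_len an FFT-based DCT, such as the one in `scipy.fft`, would be needed.

```python
import math


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dct2(x: list[float]) -> list[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 1/2) * k)."""
    size = len(x)
    return [
        sum(v * math.cos(math.pi / size * (n + 0.5) * k) for n, v in enumerate(x))
        for k in range(size)
    ]


def segment_mse(original: list[float], filtered: list[float]) -> tuple[float, float]:
    """Return (temporal-domain MSE, frequency-domain MSE) for one segment pair."""
    return mse(original, filtered), mse(dct2(original), dct2(filtered))


def average_mse(pairs: list[tuple[list[float], list[float]]]) -> tuple[float, float]:
    """Average both MSEs over all (original, filtered) segment pairs."""
    results = [segment_mse(o, f) for o, f in pairs]
    count = len(results)
    return (
        sum(temporal for temporal, _ in results) / count,
        sum(frequency for _, frequency in results) / count,
    )
```

Because the unnormalized DCT-II does not preserve energy, the frequency-domain MSE is generally much larger than the temporal-domain MSE for the same pair of segments.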
Loads configuration from CONFIG_FILENAME, reads the MSE outputs from output_filename, and plots the results as bar graphs.
The following results were calculated from 3 hours of audio extracted from a Twitch VOD.
| filter | MSE (temporal domain) | MSE (frequency domain) |
|---|---|---|
| acompressor | 419.5447047722049 | 769325.3616135248 |
| acrusher | 128.31195087665463 | 788.4744883700115 |
| aecho | 1973.808181613829 | 11476890.952585308 |
| aphaser | 2140.157159476164 | 7514830.79328153 |
| alimiter | 1589.4807644937096 | 33103402.4035865 |
- Add more audio filters
- Add better documentation for example results