| 1 | +# TensorFlow Lite Keyword Spotting |
| 2 | +Native C/C++. Suitable for embedded devices. |
| 3 | + |
| 4 | + ~$ git clone --recursive --depth 1 https://github.com/42io/tflite_kws.git |
| 5 | + |
| 6 | +### Inference |
| 7 | +The default models are pre-trained on the digit words 0-9: zero, one, two, three, four, five, six, seven, eight, nine. |
| 8 | + |
| 9 | + ~$ arecord -f S16_LE -c1 -r16000 -d1 test.wav |
| 10 | + ~$ aplay test.wav |
| 11 | + ~$ dataset/dataset/google_speech_commands/src/features/build.sh |
| 12 | + ~$ src/brain/build.sh |
| 13 | + ~$ alias fe=dataset/dataset/google_speech_commands/bin/fe |
| 14 | + ~$ fe test.wav | bin/guess models/mlp.tflite |
| 15 | + ~$ fe test.wav | bin/guess models/cnn.tflite |
| 16 | + ~$ fe test.wav | bin/guess models/rnn.tflite |
| 17 | + ~$ fe test.wav | bin/guess models/dcnn.tflite |
| 18 | + ~$ fe test.wav | head -48 | tail -47 | bin/guess models/dcnn47.tflite |
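The `head -48 | tail -47` filter in the last command keeps lines 2 through 48 of the feature stream, which is 47 lines in total. A quick check, using `seq` as a stand-in for the `fe` output (the stand-in is an assumption for illustration only):

```shell
# Count the lines that survive the filter.
seq 60 | head -48 | tail -47 | awk 'END{print NR}'   # prints 47

# Show the first and last surviving line numbers.
seq 60 | head -48 | tail -47 | sed -n '1p;$p'        # prints 2, then 48
```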
| 19 | + |
| 20 | +### Real Time |
| 21 | +Microphone quality is very important. You should probably think about how to remove fan noise from the mic. Using a headset seems like a good idea :) |
| 22 | + |
| 23 | + ~$ argmax() { mawk -Winteractive '{m=$1;j=1;for(i=j;i<=NF;i++)if($i>m){m=$i;j=i;}print j-1}'; } |
| 24 | + ~$ stable() { mawk -Winteractive -v u=$1 '{if(x!=$1){c=0;x=$1}else if(++c==u&&y!=x)print y=x}'; } |
| 25 | + ~$ ignore() { mawk -Winteractive -v t=$1 '{if($1<t)print $1}'; } |
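To see what the three helpers do, they can be tried on hand-written input. The sketch below restates them with plain `awk` so it runs where `mawk` is not installed (that swap is an assumption for illustration; the originals use `mawk -Winteractive` so output is not buffered in a live pipeline). The sample inputs are made up.

```shell
# Same awk programs as above, minus the mawk-specific -Winteractive flag.
argmax() { awk '{m=$1;j=1;for(i=j;i<=NF;i++)if($i>m){m=$i;j=i;}print j-1}'; }
stable() { awk -v u="$1" '{if(x!=$1){c=0;x=$1}else if(++c==u&&y!=x)print y=x}'; }
ignore() { awk -v t="$1" '{if($1<t)print $1}'; }

# argmax: zero-based index of the largest score on each line.
printf '0.1 0.7 0.2\n0.9 0.05 0.05\n' | argmax        # prints 1, then 0

# stable 2: a label is printed once it has repeated 2 times after its first
# appearance, and only if it differs from the last label printed.
printf '3\n3\n3\n3\n5\n3\n3\n3\n' | stable 2          # prints 3 once

# ignore 10: labels 10 and above (#unk# and #pub# in the matrices) are dropped.
printf '4\n10\n11\n9\n' | ignore 10                   # prints 4, then 9
```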
| 26 | + |
| 27 | +Simple non-streaming mode. The model receives the whole input sequence and then returns the classification result: |
| 28 | + |
| 29 | + ~$ arecord -f S16_LE -c1 -r16000 -t raw | fe | \ |
| 30 | + bin/ring 47 | bin/guess models/dcnn47.tflite | argmax | stable 10 | ignore 10 |
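This README does not describe `bin/ring`. One plausible reading is that it keeps the most recent 47 feature frames and hands the full window to the model each time a new frame arrives. The `window` function below is a hypothetical awk stand-in for that idea, not the actual tool, and its `--` separator is there only for readability:

```shell
# Hypothetical sliding window over the last n input lines (NOT bin/ring itself).
window() {
  awk -v n="$1" '{
    buf[NR % n] = $0                 # overwrite the oldest slot
    if (NR >= n) {                   # window is full: emit oldest to newest
      for (i = NR - n + 1; i <= NR; i++) print buf[i % n]
      print "--"                     # separator between windows
    }
  }'
}

seq 5 | window 3   # windows: 1 2 3, then 2 3 4, then 3 4 5
```

Under this reading, every new frame triggers a full inference over 47 frames, which is the cost the streaming mode avoids.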
| 31 | + |
| 32 | +[Streaming](https://arxiv.org/abs/2005.06720) mode is more CPU friendly as it reduces MAC operations in the neural |
| 33 | +network. The model receives a portion of the input sequence and classifies it incrementally: |
| 34 | + |
| 35 | + ~$ arecord -f S16_LE -c1 -r16000 -t raw | fe | \ |
| 36 | + bin/guess models/dcnn13.tflite | argmax | stable 10 | ignore 10 |
| 37 | + |
| 38 | +### Training |
| 39 | +Jupyter Notebooks [MLP](jupyter/mlp.ipynb) | [CNN](jupyter/cnn.ipynb) | [RNN](jupyter/rnn.ipynb) | [DCNN](jupyter/dcnn.ipynb) | [DCNN47](jupyter/dcnn47.ipynb) | [DCNN13](jupyter/dcnn13.ipynb) | [EDCNN47](jupyter/edcnn47.ipynb) | [ECNN47](jupyter/ecnn47.ipynb). |
| 40 | + |
| 41 | +Each notebook generates a model file. To evaluate model accuracy: |
| 42 | + |
| 43 | + ~$ apt install gcc lrzip wget |
| 44 | + ~$ wget https://github.com/42io/dataset/releases/download/v1.0/0-9up.lrz -O /tmp/0-9up.lrz |
| 45 | + ~$ lrunzip /tmp/0-9up.lrz -o /tmp/0-9up.data # md5 87fc2460c7b6cd3dcca6807e9de78833 |
| 46 | + ~$ dataset/matrix.sh /tmp/0-9up.data |
| 47 | + |
| 48 | +Confusion matrices for the pre-trained models: |
| 49 | + |
| 50 | + MLP confusion matrix... |
| 51 | + zero 0.93 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00 | 603 |
| 52 | + one 0.00 0.85 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.05 0.06 0.01 | 575 |
| 53 | + two 0.03 0.00 0.86 0.02 0.02 0.00 0.00 0.01 0.01 0.00 0.04 0.01 | 564 |
| 54 | + three 0.00 0.00 0.01 0.90 0.00 0.01 0.01 0.01 0.04 0.01 0.01 0.01 | 548 |
| 55 | + four 0.00 0.01 0.01 0.00 0.90 0.01 0.00 0.00 0.00 0.00 0.05 0.01 | 605 |
| 56 | + five 0.00 0.01 0.00 0.01 0.01 0.80 0.01 0.03 0.01 0.03 0.09 0.01 | 607 |
| 57 | + six 0.00 0.00 0.00 0.00 0.00 0.00 0.96 0.00 0.00 0.00 0.02 0.01 | 462 |
| 58 | + seven 0.01 0.00 0.03 0.01 0.00 0.00 0.01 0.90 0.00 0.00 0.03 0.01 | 574 |
| 59 | + eight 0.00 0.00 0.01 0.07 0.00 0.00 0.03 0.00 0.84 0.01 0.03 0.01 | 547 |
| 60 | + nine 0.00 0.04 0.00 0.01 0.00 0.01 0.00 0.01 0.00 0.86 0.06 0.01 | 596 |
| 61 | + #unk# 0.02 0.03 0.03 0.05 0.06 0.07 0.02 0.03 0.02 0.07 0.58 0.02 | 730 |
| 62 | + #pub# 0.00 0.00 0.01 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.96 | 730 |
| 63 | + MLP guessed wrong 1029... |
| 64 | + |
| 65 | + CNN confusion matrix... |
| 66 | + zero 0.97 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.00 | 603 |
| 67 | + one 0.00 0.93 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01 0.05 0.00 | 575 |
| 68 | + two 0.01 0.00 0.95 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.03 0.00 | 564 |
| 69 | + three 0.00 0.00 0.00 0.91 0.00 0.00 0.01 0.01 0.01 0.00 0.06 0.00 | 548 |
| 70 | + four 0.00 0.00 0.00 0.00 0.90 0.00 0.00 0.00 0.00 0.00 0.09 0.00 | 605 |
| 71 | + five 0.00 0.00 0.00 0.00 0.00 0.93 0.00 0.00 0.01 0.01 0.06 0.00 | 607 |
| 72 | + six 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.01 0.00 | 462 |
| 73 | + seven 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.97 0.00 0.00 0.02 0.00 | 574 |
| 74 | + eight 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.00 0.93 0.00 0.03 0.00 | 547 |
| 75 | + nine 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.93 0.06 0.00 | 596 |
| 76 | + #unk# 0.01 0.01 0.00 0.02 0.02 0.00 0.00 0.00 0.00 0.01 0.92 0.01 | 730 |
| 77 | + #pub# 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.98 | 730 |
| 78 | + CNN guessed wrong 427... |
| 79 | + |
| 80 | + RNN confusion matrix... |
| 81 | + zero 0.98 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 603 |
| 82 | + one 0.00 0.95 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01 0.02 0.00 | 575 |
| 83 | + two 0.00 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 564 |
| 84 | + three 0.00 0.00 0.00 0.97 0.00 0.00 0.01 0.00 0.01 0.00 0.01 0.00 | 548 |
| 85 | + four 0.00 0.00 0.00 0.00 0.97 0.00 0.00 0.00 0.00 0.00 0.02 0.00 | 605 |
| 86 | + five 0.00 0.00 0.00 0.00 0.01 0.98 0.00 0.00 0.00 0.00 0.01 0.00 | 607 |
| 87 | + six 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 | 462 |
| 88 | + seven 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.98 0.00 0.00 0.01 0.00 | 574 |
| 89 | + eight 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.97 0.00 0.01 0.00 | 547 |
| 90 | + nine 0.00 0.01 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.97 0.02 0.00 | 596 |
| 91 | + #unk# 0.00 0.01 0.00 0.01 0.02 0.02 0.00 0.00 0.01 0.02 0.91 0.00 | 730 |
| 92 | + #pub# 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 | 730 |
| 93 | + RNN guessed wrong 220... |
| 94 | + |
| 95 | + DCNN confusion matrix... |
| 96 | + zero 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 | 603 |
| 97 | + one 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 | 575 |
| 98 | + two 0.01 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 564 |
| 99 | + three 0.00 0.00 0.00 0.97 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.00 | 548 |
| 100 | + four 0.00 0.00 0.00 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 605 |
| 101 | + five 0.00 0.00 0.00 0.00 0.00 0.98 0.00 0.00 0.00 0.00 0.01 0.00 | 607 |
| 102 | + six 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 | 462 |
| 103 | + seven 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 | 574 |
| 104 | + eight 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.01 0.00 | 547 |
| 105 | + nine 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.98 0.01 0.00 | 596 |
| 106 | + #unk# 0.00 0.01 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.94 0.00 | 730 |
| 107 | + #pub# 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 | 730 |
| 108 | + DCNN guessed wrong 143... |
| 109 | + |
| 110 | + DCNN47 confusion matrix... |
| 111 | + zero 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 603 |
| 112 | + one 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 | 575 |
| 113 | + two 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 564 |
| 114 | + three 0.00 0.00 0.01 0.97 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 548 |
| 115 | + four 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 605 |
| 116 | + five 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 | 607 |
| 117 | + six 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 | 462 |
| 118 | + seven 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 | 574 |
| 119 | + eight 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 | 547 |
| 120 | + nine 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.01 0.00 | 596 |
| 121 | + #unk# 0.00 0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.97 0.00 | 730 |
| 122 | + #pub# 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 | 730 |
| 123 | + DCNN47 guessed wrong 88... |
| 124 | + |
| 125 | + DCNN13 confusion matrix... |
| 126 | + zero 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 603 |
| 127 | + one 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 | 575 |
| 128 | + two 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 564 |
| 129 | + three 0.00 0.00 0.00 0.98 0.00 0.00 0.01 0.00 0.00 0.00 0.01 0.00 | 548 |
| 130 | + four 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 605 |
| 131 | + five 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.01 0.00 | 607 |
| 132 | + six 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 | 462 |
| 133 | + seven 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 | 574 |
| 134 | + eight 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.01 0.00 | 547 |
| 135 | + nine 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.01 0.00 | 596 |
| 136 | + #unk# 0.00 0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.97 0.00 | 730 |
| 137 | + #pub# 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 | 730 |
| 138 | + DCNN13 guessed wrong 82... |
| 139 | + |
| 140 | + EDCNN47 confusion matrix... |
| 141 | + zero 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 603 |
| 142 | + one 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.02 0.00 | 575 |
| 143 | + two 0.00 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.00 | 564 |
| 144 | + three 0.00 0.00 0.00 0.97 0.00 0.00 0.01 0.00 0.00 0.00 0.03 0.00 | 548 |
| 145 | + four 0.00 0.00 0.00 0.00 0.97 0.00 0.00 0.00 0.00 0.00 0.03 0.00 | 605 |
| 146 | + five 0.00 0.00 0.00 0.00 0.00 0.98 0.00 0.00 0.00 0.00 0.01 0.00 | 607 |
| 147 | + six 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.01 0.00 | 462 |
| 148 | + seven 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.01 0.00 | 574 |
| 149 | + eight 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 | 547 |
| 150 | + nine 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.01 0.00 | 596 |
| 151 | + #unk# 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.98 0.00 | 730 |
| 152 | + #pub# 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 | 730 |
| 153 | + EDCNN47 guessed wrong 116... |
| 154 | + |
| 155 | + ECNN47 confusion matrix... |
| 156 | + zero 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 | 603 |
| 157 | + one 0.00 0.98 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 | 575 |
| 158 | + two 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 564 |
| 159 | + three 0.00 0.00 0.00 0.98 0.00 0.00 0.01 0.00 0.00 0.00 0.01 0.00 | 548 |
| 160 | + four 0.00 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.01 0.00 | 605 |
| 161 | + five 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 | 607 |
| 162 | + six 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 | 462 |
| 163 | + seven 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 | 574 |
| 164 | + eight 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 | 547 |
| 165 | + nine 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.99 0.00 0.00 | 596 |
| 166 | + #unk# 0.00 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.98 0.00 | 730 |
| 167 | + #pub# 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 | 730 |
| 168 | + ECNN47 guessed wrong 63... |
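The "guessed wrong" counts can be turned into an overall accuracy figure. The sketch below assumes that the right-hand column of each matrix is the number of test samples per class and that "guessed wrong" counts misclassified samples across all classes. Neither is stated explicitly above.

```shell
# Per-class sample counts, copied from the right-hand column of the matrices.
total=$(printf '%s\n' 603 575 564 548 605 607 462 574 547 596 730 730 |
        awk '{s+=$1} END{print s}')
echo "total samples: $total"   # prints total samples: 7141

# accuracy WRONG TOTAL -> percentage of correctly classified samples.
accuracy() { awk -v wrong="$1" -v total="$2" 'BEGIN{printf "%.2f\n", 100*(1-wrong/total)}'; }

for entry in MLP:1029 CNN:427 RNN:220 DCNN:143 DCNN47:88 DCNN13:82 EDCNN47:116 ECNN47:63; do
  printf '%-8s %s%%\n' "${entry%%:*}" "$(accuracy "${entry##*:}" "$total")"
done
```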
| 169 | + |
| 170 | +Evaluate false positives: |
| 171 | + |
| 172 | + ~$ wget https://data.deepai.org/timit.zip -O /tmp/timit.zip |
| 173 | + ~$ unzip -q /tmp/timit.zip -d /tmp/timit # md5 5b736303c55cf4970926bb9978b655fe |
| 174 | + ~$ dataset/false.sh /tmp/timit 100 |
| 175 | + |
| 176 | +A false positive error, or false positive, is a result that indicates a given condition exists when it does not. |
| 177 | + |
| 178 | + EDCNN47 2042 | 11191 |
| 179 | + ECNN47 4494 | 11191 |
| 180 | + DCNN13 4787 | 11191 |
| 181 | + DCNN47 4517 | 11191 |
| 182 | + MLP 5091 | 10991 |
| 183 | + CNN 4958 | 10991 |
| 184 | + RNN 4527 | 10991 |
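Because the right-hand totals differ between models, a rate is easier to compare than a raw count. The sketch below assumes the first number is the count of false positives and the second is the number of evaluated samples; the output of `dataset/false.sh` is not documented here, so treat that reading as a guess.

```shell
# fp_rate FALSE_POSITIVES TOTAL -> percentage, under the assumption above.
fp_rate() { awk -v fp="$1" -v n="$2" 'BEGIN{printf "%.1f\n", 100*fp/n}'; }

while read -r model fp n; do
  printf '%-8s %s%%\n' "$model" "$(fp_rate "$fp" "$n")"
done <<'EOF'
EDCNN47 2042 11191
ECNN47 4494 11191
DCNN13 4787 11191
DCNN47 4517 11191
MLP 5091 10991
CNN 4958 10991
RNN 4527 10991
EOF
```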
| 185 | + |
| 186 | +### Heap Memory Usage |
| 187 | +Some magic numbers to know before stepping into the embedded world. |
| 188 | + |
| 189 | + ~$ valgrind dataset/dataset/google_speech_commands/bin/fe test.wav # 606,416 bytes allocated |
| 190 | + ~$ fe test.wav | valgrind bin/guess models/mlp.tflite # 347,138 bytes allocated |
| 191 | + ~$ fe test.wav | valgrind bin/guess models/cnn.tflite # 1,793,114 bytes allocated |
| 192 | + ~$ fe test.wav | valgrind bin/guess models/rnn.tflite # 2,442,810 bytes allocated |
| 193 | + ~$ seq 637 | valgrind bin/guess models/dcnn.tflite # 595,958 bytes allocated |
| 194 | + ~$ seq 611 | valgrind bin/guess models/dcnn47.tflite # 968,482 bytes allocated |
| 195 | + ~$ seq 13 | valgrind bin/guess models/dcnn13.tflite # 671,398 bytes allocated |
| 196 | + ~$ seq 611 | valgrind bin/guess models/edcnn47.tflite # 1,661,132 bytes allocated |
| 197 | + ~$ seq 611 | valgrind bin/guess models/ecnn47.tflite # 8,625,814 bytes allocated |
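The `seq` arguments above are all multiples of 13: 637 = 49 × 13, 611 = 47 × 13 and 13 = 1 × 13. This suggests 13 values per frame, with 49, 47 and 1 frames fed per inference. That reading is inferred from the numbers and the model names; it is not stated in this README.

```shell
features_per_frame=13   # assumption inferred from the seq arguments
for frames in 49 47 1; do
  echo "$frames frames -> $((frames * features_per_frame)) values"
done
```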
| 198 | + |
| 199 | +### Play |
| 200 | +Let's consider voice control for an LED bulb. |
| 201 | + |
| 202 | + ~$ bigram() { mawk -Winteractive '{if(s)print prev,$0; prev=$0; s=1}'; } |
| 203 | + ~$ intent() { mawk -Winteractive ' |
| 204 | + /0 6/{system("./on.sh")} |
| 205 | + /0 7/{system("./off.sh")} |
| 206 | + /0 8/{system("./yellow.sh")} |
| 207 | + /0 9/{system("./white.sh")} |
| 208 | + '; } |
| 209 | + |
| 210 | +There are 4 commands here: turn on, turn off, and two color changes (yellow and white). When we speak the words `zero six`, the script `./on.sh` is executed, and so on. |
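The pairing logic can be tried without a microphone or a bulb. This sketch uses plain `awk` instead of `mawk -Winteractive` and prints the script name instead of running it; both changes are assumptions made so the example runs anywhere. The input digits are made up.

```shell
# bigram: join each label with the one before it.
bigram() { awk '{if(s)print prev,$0; prev=$0; s=1}'; }

# intent_dry_run: hypothetical dry-run variant of intent, prints instead of calling system().
intent_dry_run() { awk '
  /0 6/{print "./on.sh"}
  /0 7/{print "./off.sh"}
  /0 8/{print "./yellow.sh"}
  /0 9/{print "./white.sh"}
'; }

printf '0\n6\n0\n7\n' | bigram                    # prints "0 6", "6 0", "0 7"
printf '0\n6\n0\n7\n' | bigram | intent_dry_run   # prints ./on.sh, then ./off.sh
```

The pair `6 0` matches no pattern, so speaking `zero six zero seven` triggers exactly two commands.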
| 211 | + |
| 212 | + ~$ arecord -f S16_LE -c1 -r16000 -t raw | fe | \ |
| 213 | + bin/guess models/dcnn13.tflite | argmax | stable 10 | ignore 10 | bigram | intent |