Optimal approach for training assuming limitless computing power and data #287
Replies: 2 comments
-
Hey @KristianBell, you know, this is a really difficult question to answer. First, there is no single answer, or rather, the answer is 'no, there is no ideal number of training files or set of settings'. There are, however, some useful guidelines I can share...
I hope that helps. Good luck!
-
Many thanks for the info - much appreciated!
-
What would the ideal settings and training file setup look like to create the most accurate recogniser, assuming limitless computing power?
I assume the default training settings (epochs 100, batch size 32, learning rate 0.01) are a compromise between training speed and accuracy, but what would lead to better results (fewer false positives and/or higher recall)? Presumably a higher number of epochs and a larger batch size? But a smaller learning rate?
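To make the 'more epochs' part concrete, here is the kind of setup I have in mind: a deliberately large epoch budget paired with early stopping on a validation split, so extra epochs can only help. This is just a generic Keras sketch with placeholder data and a placeholder model, not any specific tool's training code:

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for real training features/labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128)).astype("float32")
y = rng.integers(0, 2, size=(1000, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)

# With early stopping, a huge epoch budget is harmless: training halts
# once validation loss stops improving and the best weights are kept.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=1000, batch_size=32,
          callbacks=[early_stop])
```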
Similarly, is there an 'ideal' number of training files to use? Is it simply a case of the more the better? And is the same true of negative training files? Would it be useful, for every species, to include the training data for all other species as negative training files? I can see how this would get computationally expensive, but would it actually improve model precision, or would it make the model too conservative and reduce recall to almost nothing? Is there a 'best' type of negative training file to use? For example, a sound most similar to, but not from, the target species? Or is any other noise, similar or otherwise, also useful as a negative training file?
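On the negatives question, I suppose the empirical way to settle it would be to train two models on the same data, one with and one without the extra negatives, and compare their precision/recall on the same held-out clips. A rough scikit-learn sketch, where the random arrays are placeholders standing in for each model's real validation scores:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Placeholder scores: in practice these would be each model's confidence
# on the SAME labelled validation clips.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)            # 1 = target species present
scores_baseline = rng.random(500)                # trained without extra negatives
scores_hard_neg = rng.random(500)                # trained with hard negatives

for name, scores in [("baseline", scores_baseline),
                     ("hard negatives", scores_hard_neg)]:
    precision, recall, _ = precision_recall_curve(y_true, scores)
    print(f"{name}: average precision = "
          f"{average_precision_score(y_true, scores):.3f}")
    # If recall collapses at every usable precision for the hard-negative
    # model, the extra negatives have made it too conservative.
```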
I expect the answer depends to some extent on how important precision versus recall is to your particular use case, but it would still be nice to know whether bumping up epochs, for example, is a 'safe' thing to do if you can spare the extra time. I could also see a huge number of negative training files working against model performance at some point.
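And since the precision/recall balance is use-case specific, one thing that does seem universally 'safe' is tuning the decision threshold after training rather than retraining: sweeping it over validation scores shows exactly how much recall each gain in precision costs. A sketch along the same lines, again with placeholder data:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Placeholder labels/scores for a held-out validation set.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
scores = rng.random(500)

# precision/recall have one more entry than thresholds, so drop the last.
precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Pick the lowest threshold that still meets a target precision; because
# recall is non-increasing along the curve, this maximises recall subject
# to the precision floor.
target_precision = 0.90
meets_target = precision[:-1] >= target_precision
if meets_target.any():
    i = int(np.argmax(meets_target))     # first index meeting the target
    print(f"threshold {thresholds[i]:.2f}: "
          f"precision {precision[i]:.2f}, recall {recall[i]:.2f}")
else:
    print("no threshold reaches the target precision on this data")
```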