Advice on optimal node placement #1936
Hi @rupertoverall, Thanks for reaching out and apologies for the delay! These are great questions!
This is a great observation. We do like to use the "middle of the gray area" when we label videos from an overhead view, but for side views the upper tip might be better defined. Either way, the most important thing is to pick a rule and be consistent in your annotations.

In general, it's tough to annotate nodes that fall on featureless, low visual complexity regions, especially since they can throw off the neural networks. This is mitigated by giving the network the ability to reason about larger areas (i.e., a larger maximum receptive field), which is achieved by increasing the max stride (downsampling blocks) of the model. A max stride of 32 or 64 will work well and ensures that the model can "see" the bigger context around the ear, reducing the odds that it confuses it with the background (see the sketch below for one way to set this).

Another approach that helps with this problem is to use a bottom-up model even when tracking single-animal data. Bottom-up forces the network to reason about the connectivity between body parts, which in turn requires it to integrate larger-scale spatial context, and that has the side effect of helping to disambiguate between locally similar patches in the image.
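If it helps, here's a minimal sketch of bumping the max stride on an existing training profile via the Python config API. The file names are placeholders, and this assumes a SLEAP 1.x style profile that uses the UNet backbone (the "Max Stride" field in the GUI's model configuration controls the same parameter):

```python
from sleap.nn.config import TrainingJobConfig

# Load an existing training profile (file name is just a placeholder).
cfg = TrainingJobConfig.load_json("baseline_medium_rgb.single_instance.json")

# Increase the maximum stride so the deepest feature map covers a larger
# receptive field; each doubling corresponds to one more downsampling block.
# This assumes the profile uses the UNet backbone.
cfg.model.backbone.unet.max_stride = 32

# Save a new profile to train from.
cfg.save_json("baseline_maxstride32.single_instance.json")
```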
Yep, this is a toughie for us as well. The best thing to do is, again, to pick a consistent rule and stick with it. I find that it helps to visualize the spine of the animal and just label the last possible vertebra that's still visible. If what you have is working well, keep going with it, but one thing we've found that helps on the model side is to pick the "tail-tip interface" -- where fur turns into tail -- since that is a very clear visual feature. This fails when the animal is sitting on the tail, in which case we fall back to the last visible point, similar to what you're describing.

In general, you can try annotating ~30-50 frames using different strategies (on the same frames) and compare the results to see what works best (a rough sketch of quantifying this is at the end of this reply). Another approach is to ask a few different annotators to label the same frames and see which strategy leads to the most consistent human annotation. This is definitely an active area of research with no super clear answers, so let us know what you discover! We'd be super curious to learn what worked best for you :)

Cheers,
Talmo
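As a rough sketch of the comparison idea above (purely illustrative: the file names are placeholders, and it assumes two SLEAP projects labeling the same single-animal video with the same skeleton, one per strategy or annotator):

```python
import numpy as np
import sleap

# Hypothetical projects: the same frames labeled under two different rules
# (or by two different annotators).
labels_a = sleap.load_file("strategy_a.slp")
labels_b = sleap.load_file("strategy_b.slp")

node_names = [node.name for node in labels_a.skeleton.nodes]
per_frame_diffs = []

for lf_a in labels_a.labeled_frames:
    # Find the matching frame in the other project (assumes a single shared video).
    matches = labels_b.find(labels_b.videos[0], frame_idx=lf_a.frame_idx)
    if not matches or not lf_a.instances or not matches[0].instances:
        continue
    # Single-animal data: compare the first instance in each frame.
    pts_a = lf_a.instances[0].numpy()        # (n_nodes, 2) x, y; NaN if not labeled
    pts_b = matches[0].instances[0].numpy()
    per_frame_diffs.append(np.linalg.norm(pts_a - pts_b, axis=1))

# Per-node mean disagreement in pixels; nodes with consistently low values
# are the ones where the labeling rule is easiest to apply reliably.
mean_diffs = np.nanmean(np.array(per_frame_diffs), axis=0)
for name, d in zip(node_names, mean_diffs):
    print(f"{name}: {d:.2f} px")
```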
We are training on a peculiar dataset where the subjects (mice) are backlit and the image resolution is limited (these parameters are fixed for other reasons). We have trained a model using a skeleton that does quite well, but I feel we can squeeze better results out of the system. There are several aspects to consider; here I would specifically like to ask for advice about node selection and annotation tips.
The attached images show a typical frame. I have marked two points:
I am really interested in any discussion that may help us to establish a training-optimised labelling workflow (we are new to this game but have a lot of SLEAP projects planned).