Advice on optimal node placement #1936
Hi @rupertoverall, Thanks for reaching out and apologies for the delay! These are great questions!
This is a great observation. We do like to use the "middle of the gray area" when we label videos from an overhead view, but for side views the upper tip might be better defined. Either way, the most important thing is to pick a rule and be consistent in your annotations.

In general, it's tough to annotate nodes that fall on featureless, low visual complexity regions, especially since they can throw off the neural networks. This is mitigated by giving the network the ability to reason about larger areas (i.e., a larger maximum receptive field), which is achieved by increasing the max stride (downsampling blocks) of the model. A max stride of 32 or 64 will work well and ensures that the model can "see" the bigger context around the ear, reducing the odds that it confuses it with the background (see the sketch below for one way to set this).

Another approach that helps with this problem is to use a bottom-up model even when tracking single-animal data. Bottom-up forces the network to reason about the connectivity between body parts, which in turn requires it to integrate larger-scale spatial context, and that has the side effect of helping to disambiguate between locally similar patches in the image.
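If it helps, here's a minimal sketch of bumping the max stride on an existing training profile via the Python config API. The file names are placeholders, and this assumes a SLEAP 1.x style profile that uses the UNet backbone (the "Max Stride" field in the GUI's model configuration controls the same parameter):

```python
from sleap.nn.config import TrainingJobConfig

# Load an existing training profile (file name is just a placeholder).
cfg = TrainingJobConfig.load_json("baseline_medium_rgb.single_instance.json")

# Increase the maximum stride so the deepest feature map covers a larger
# receptive field; each doubling corresponds to one more downsampling block.
# This assumes the profile uses the UNet backbone.
cfg.model.backbone.unet.max_stride = 32

# Save a new profile to train from.
cfg.save_json("baseline_maxstride32.single_instance.json")
```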
Yep, this is a toughie for us as well. The best thing to do is, again, to pick a consistent rule and stick with it. I find that it helps to visualize the spine of the animal and just label the last possible vertebra that's still visible. If what you have is working well, keep going with it, but one thing we've found that helps on the model side is to pick the "tail-tip interface" -- where fur turns into tail -- since that is a very clear visual feature. This fails when the animal is sitting on the tail, in which case we fall back to the last visible point, similar to what you're describing.

In general, you can try annotating ~30-50 frames using different strategies (on the same frames) and compare the results to see what works best (a rough sketch of quantifying this is at the end of this reply). Another approach is to ask a few different annotators to label the same frames and see which strategy leads to the most consistent human annotation. This is definitely an active area of research with no super clear answers, so let us know what you discover! We'd be super curious to learn what worked best for you :)

Cheers,
Talmo
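As a rough sketch of the comparison idea above (purely illustrative: the file names are placeholders, and it assumes two SLEAP projects labeling the same single-animal video with the same skeleton, one per strategy or annotator):

```python
import numpy as np
import sleap

# Hypothetical projects: the same frames labeled under two different rules
# (or by two different annotators).
labels_a = sleap.load_file("strategy_a.slp")
labels_b = sleap.load_file("strategy_b.slp")

node_names = [node.name for node in labels_a.skeleton.nodes]
per_frame_diffs = []

for lf_a in labels_a.labeled_frames:
    # Find the matching frame in the other project (assumes a single shared video).
    matches = labels_b.find(labels_b.videos[0], frame_idx=lf_a.frame_idx)
    if not matches or not lf_a.instances or not matches[0].instances:
        continue
    # Single-animal data: compare the first instance in each frame.
    pts_a = lf_a.instances[0].numpy()        # (n_nodes, 2) x, y; NaN if not labeled
    pts_b = matches[0].instances[0].numpy()
    per_frame_diffs.append(np.linalg.norm(pts_a - pts_b, axis=1))

# Per-node mean disagreement in pixels; nodes with consistently low values
# are the ones where the labeling rule is easiest to apply reliably.
mean_diffs = np.nanmean(np.array(per_frame_diffs), axis=0)
for name, d in zip(node_names, mean_diffs):
    print(f"{name}: {d:.2f} px")
```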
We are training on a peculiar dataset where the subjects (mice) are backlit and the image resolution is limited (these parameters are fixed for other reasons). We have trained a model using a skeleton that does quite well, but I feel we can squeeze better results out of the system. There are several aspects to consider; here I would specifically like to ask for advice about node selection and annotation tips.
The attached images show a typical frame. I have marked two points:
I am really interested in any discussion that may help us to establish a training-optimised labelling workflow (we are new to this game but have a lot of SLEAP projects planned).