Hi Tal,
While reading Section 3.3, "Semantic Softmax Training Scheme", of "ImageNet-21K Pretraining for the Masses", I noticed that the part below conflicts with equations (2) and (3) as provided:

With equations (2) and (3) as provided, the lower hierarchies have a smaller Ok (O1 = N0 < O2 = N0 + N1), which makes their Wk larger (W1 = 1/O1 > W2 = 1/O2). That conflicts with the sentence I marked in the picture.
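To make the arithmetic concrete, here is a minimal sketch of how I read equations (2) and (3), assuming Ok = N0 + ... + N(k-1) and Wk = 1/Ok. The function name `hierarchy_weights` and the counts in `counts` are made up for illustration; this is not the code from the repository.

```python
from itertools import accumulate


def hierarchy_weights(counts):
    """Return (O, W) for k = 1..len(counts), where counts[j] plays the role of N_j.

    Assumed reading of the equations: O_k = N_0 + ... + N_{k-1}, W_k = 1 / O_k.
    """
    O = list(accumulate(counts))   # running sums: O_1 = N_0, O_2 = N_0 + N_1, ...
    W = [1.0 / o for o in O]       # each weight is the reciprocal of its O_k
    return O, W


# Hypothetical per-hierarchy counts N_0, N_1, N_2 (not taken from the paper).
counts = [100, 50, 25]
O, W = hierarchy_weights(counts)

for k, (o, w) in enumerate(zip(O, W), start=1):
    print(f"k={k}: O_k={o}, W_k={w:.5f}")
```

With any positive counts, Ok grows with k, so Wk shrinks with k: the lowest hierarchy gets the largest weight under this reading.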
I also read the source code, and it seems to have the same conflict, unless I misunderstood the code or overlooked something.
Regards,
Zongyuan Sui