Greetings,
First of all, I really appreciate the work you have done; it has been really useful for my experiments.
Now, to my question: I have seen that, for fine-tuning, the batch normalization layers are generally frozen.
Did the authors of this repository try something like that?
Just curious about it. I do plan to implement it myself (any tips on that would be appreciated), but I was wondering whether the authors of this repo or the paper have any insights on it.
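
For reference, here is a minimal sketch of what I had in mind, assuming a PyTorch model (the `freeze_batchnorm` helper name is just mine for illustration, not something from this repo):

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    """Freeze all BatchNorm layers: stop running-stat updates and gradient updates."""
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            # Put the layer in eval mode so running mean/var are not updated
            module.eval()
            # Also freeze the affine parameters (gamma/beta), if present
            for param in module.parameters():
                param.requires_grad = False
```

My understanding is that I would call this after loading the pretrained weights, and re-apply `module.eval()` to the BN layers after every `model.train()` call (or override `train()`), since `model.train()` switches them back to training mode. Please correct me if the authors handled this differently.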