Spatial Transformer example - locnet activation functions #46
STNs are really unstable. The gradients we get there are just not well behaved. I think that avoiding as many nonlinearities as possible was simply the way the original authors of the paper found to get it to behave well. So yeah, STNs are mostly not well understood yet. Maybe STNs are harder to train than GANs. The research community should be paying more attention to that soon. For now, when designing your STNs, I'd recommend using the hyperparameters from the original paper and its follow-ups.
Fair enough. I have found in other, unrelated experiments that nets with linear activations can perform useful tasks and are fast to optimize. So in that case, why do you have the 'relu' on the final layer?
P.S. Thanks and much respect for bringing these cutting-edge concepts to Keras in your Seya library.
I think the linear convs are just there to quickly "find" the regions of interest. But in some other experiments I'm doing, you can't go from the location in the image to the parameters of the spatial transformer matrix with just one layer. You need at least one hidden layer, even for simple experiments. That is what I found experimentally, at least.
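For concreteness, here is a minimal sketch of a localisation network along those lines in modern Keras. The input size, filter counts, and hidden-layer width are illustrative, not the exact values from the Seya notebook: linear convolutions, one hidden Dense layer, and a 6-unit linear head that is initialised to the identity transform (the initialisation suggested in the original STN paper), so the sampler starts out passing the input through unchanged.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Localisation network: linear convs to "find" the region of interest,
# one hidden Dense layer, then a linear 6-unit head for the affine parameters.
locnet = keras.Sequential([
    keras.Input(shape=(60, 60, 1)),       # illustrative input size
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(20, 5),                  # no activation -> linear conv
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(20, 5),                  # no activation -> linear conv
    layers.Flatten(),
    layers.Dense(50, activation="relu"),   # the hidden layer discussed above
    layers.Dense(6),                       # linear regression of [a, b, tx, c, d, ty]
])

# Initialise the regression head to the identity affine transform:
# zero weights, bias = [1, 0, 0, 0, 1, 0].
w, b = locnet.layers[-1].get_weights()
locnet.layers[-1].set_weights([np.zeros_like(w),
                               np.array([1, 0, 0, 0, 1, 0], dtype=b.dtype)])
```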
See In [5] of https://github.com/EderSantana/seya/blob/master/examples/Spatial%20Transformer%20Networks.ipynb, where the localisation network is defined. Is there a reason why the Convolution2D layers have no activations, while the final layer (responsible for regressing the affine transformation parameters) does have a 'relu' activation? I may be wrong, but I thought it's typical for the final layer of a neural-net regression to have a linear activation.
I asked some others about this, but nobody could explain why the activations are laid out this way, and they suggested I raise it here -- so hopefully the author can comment on these design choices.