
Spatial Transformer example - locnet activation functions #46

Open · 9thDimension opened this issue Oct 25, 2016 · 3 comments

@9thDimension commented Oct 25, 2016

See In [5] of https://github.com/EderSantana/seya/blob/master/examples/Spatial%20Transformer%20Networks.ipynb where the localisation network is defined.

Is there a reason why the Convolution2D layers have no activations, while the final layer (responsible for regressing the affine transformation parameters) does have a 'relu' activation? I may be wrong, but I thought it was typical for the final layer of a neural net regression to have a linear activation.
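For reference, the layout in question is roughly the following. This is a paraphrase rather than a copy of the notebook cell, so the exact layer sizes, input shape, and placement of the relu may differ from what's actually in the repo:

```python
# Rough paraphrase of the locnet from the notebook (sizes are illustrative).
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Activation

locnet = Sequential()
locnet.add(MaxPooling2D(pool_size=(2, 2), input_shape=(1, 60, 60)))
locnet.add(Convolution2D(20, 5, 5))      # no activation -> effectively linear
locnet.add(MaxPooling2D(pool_size=(2, 2)))
locnet.add(Convolution2D(20, 5, 5))      # no activation -> effectively linear
locnet.add(Flatten())
locnet.add(Dense(50))
locnet.add(Activation('relu'))           # the 'relu' in question
locnet.add(Dense(6))                     # regresses the 6 affine parameters
```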

I asked some others about this, but nobody could explain why the activations are laid out this way, and they suggested I raise it here -- so hopefully the author can comment on these design choices.

@EderSantana (Owner)

STNs are really unstable; the gradients we get there are just not well behaved. I think that avoiding as many nonlinearities as possible was just the way the original authors of the paper found to get it to behave well.

So yeah, STNs are mostly not well understood yet; maybe they are even harder to train than GANs. The research community should be paying more attention to that soon. For now, when designing your STNs I'd recommend using the hyperparameters from the original paper and its follow-ups.

@9thDimension (Author) commented Nov 2, 2016

Fair enough. I have found in other unrelated experiments that nets with linear activations can perform useful tasks and are fast to optimize.

So in that case, why do you have the 'relu' activation before the locnet's final regression layer?
I can't find where they suggest such an idea in the original paper.

P.S. Thanks and much respect for bringing these cutting-edge concepts to Keras in your Seya library.

@EderSantana (Owner)

I think the linear convs are just there to quickly "find" the regions of interest. But in some other experiments I'm doing, you can't go from the location in the image to the parameters of the spatial transformer matrix with just one layer; you need at least one hidden layer even for simple experiments. That is what I found experimentally, at least.
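To make that concrete, the kind of regression head I mean looks something like this (sizes are made up for illustration, this is not code from the notebook):

```python
# Illustrative locnet regression head (layer sizes are made up).
# In my experiments, going straight from the flattened conv features to the
# 6 affine parameters with a single Dense(6) did not train well; adding one
# hidden relu layer in between behaved much better.
from keras.models import Sequential
from keras.layers import Flatten, Dense

head = Sequential()
head.add(Flatten(input_shape=(20, 12, 12)))  # flattened conv features
head.add(Dense(50, activation='relu'))       # the hidden layer
head.add(Dense(6))                           # linear output: 6 affine parameters
```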
