MonarchMixerLayer #8

Open
jeohalves opened this issue Oct 31, 2023 · 1 comment

@jeohalves

Hello,

The paper presents an algorithm that appears to describe the M2 layer, intended to replace both the Attention and MLP layers (specifically the nn.Linear part of the latter).

However, upon examining the monarch_mixer_sequence_mixer.py script, I noticed that it uses Hyena filters, and I couldn't find any implementation of this M2 layer algorithm in the code.

I might be missing something, but I wanted to clarify whether it's necessary to replace the Hyena filters with the M2 layer.

Thank you for your assistance with this project!

P.S.: I'm currently working with image data.

@DanFu09 (Collaborator) commented Oct 31, 2023

Great question!

If you look at Section 5.1, we use the Monarch matrices to implement long convolutions in conjunction with gating for a lot of the backbones (also see this image from the blog).

I think with image data you might actually be fine without the Hyena filters; they're more important for language. In older experiments, we do see higher performance on ImageNet with the gating and the Hyena kernels. On CIFAR, the performance is about the same with and without the gating.
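
In case a concrete shape helps, here is a minimal, illustrative sketch of the gated long-convolution sequence mixer described above, assuming a PyTorch setting. It is not the repository's implementation: a plain FFT convolution stands in where the M2 layer uses Monarch matrix multiplies, and the class and parameter names (`GatedLongConvMixer`, `in_proj`, `gate_proj`, `kernel`) are invented for the example.

```python
# Illustrative sketch only: gated long convolution in the spirit of the
# M2 sequence mixer. NOT the repo's code; a plain FFT convolution stands
# in where the paper uses Monarch matrix multiplies.
import torch
import torch.nn as nn


class GatedLongConvMixer(nn.Module):
    """Elementwise gating around a depthwise (per-channel) long convolution."""

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # One global convolution kernel per channel, as long as the sequence.
        self.kernel = nn.Parameter(torch.randn(d_model, seq_len) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, l, d = x.shape
        u = self.in_proj(x)                       # values to mix along the sequence
        gate = torch.sigmoid(self.gate_proj(x))   # elementwise gate

        # Long convolution via FFT, O(L log L); pad to 2L to avoid circular wrap.
        # In the paper, Monarch matrix multiplies take the place of the FFTs.
        u_f = torch.fft.rfft(u.transpose(1, 2), n=2 * l)    # (b, d, freq)
        k_f = torch.fft.rfft(self.kernel, n=2 * l)          # (d, freq)
        y = torch.fft.irfft(u_f * k_f, n=2 * l)[..., :l]    # (b, d, l)
        y = y.transpose(1, 2)                                # (b, l, d)

        return self.out_proj(gate * y)


# Quick shape check
mixer = GatedLongConvMixer(d_model=64, seq_len=128)
tokens = torch.randn(2, 128, 64)
print(mixer(tokens).shape)  # torch.Size([2, 128, 64])
```

The gating is just the elementwise multiply around the long convolution; dropping `gate_proj` and the `gate * y` product gives the ungated variant that the CIFAR comparison above refers to.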
