Hello,
The paper describes an algorithm for the M2 layer that is intended to replace both the Attention layer and the MLP layer (specifically the nn.Linear part of the latter).
However, looking at the monarch_mixer_sequence_mixer.py script, I see that it uses Hyena filters, and I couldn't find an implementation of this M2 layer algorithm in the code.
I may be missing something, but I'd like to clarify whether it is necessary to substitute the Hyena filters with the M2 layer.
Thank you for your assistance with this project!
P.S.: I'm currently working with image data.
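For reference, here is a minimal sketch of what a Monarch-factorized drop-in for nn.Linear might look like (purely illustrative: the class name, block layout, and the perfect-square-dimension assumption are mine, not taken from the repo):

```python
import math
import torch
import torch.nn as nn

class MonarchLinear(nn.Module):
    """Illustrative Monarch-factorized replacement for nn.Linear (not the repo's code).

    A Monarch matrix is roughly P^T B P A, where A and B are block-diagonal and P is a
    fixed "transpose" permutation. This sketch assumes dim is a perfect square, so both
    factors use sqrt(dim) blocks of size sqrt(dim).
    """
    def __init__(self, dim: int):
        super().__init__()
        self.n = int(math.isqrt(dim))
        assert self.n * self.n == dim, "dim must be a perfect square in this sketch"
        scale = 1.0 / math.sqrt(self.n)
        # Block-diagonal factors A and B, stored as (n_blocks, block, block).
        self.blocks_a = nn.Parameter(torch.randn(self.n, self.n, self.n) * scale)
        self.blocks_b = nn.Parameter(torch.randn(self.n, self.n, self.n) * scale)

    def forward(self, x):                                       # x: (..., dim)
        shape, n = x.shape, self.n
        x = x.reshape(-1, n, n)                                  # split features into an n x n grid
        x = torch.einsum('bij,ijk->bik', x, self.blocks_a)       # block-diagonal multiply (A)
        x = x.transpose(1, 2)                                    # permutation P
        x = torch.einsum('bij,ijk->bik', x, self.blocks_b)       # block-diagonal multiply (B)
        x = x.transpose(1, 2)                                    # permutation P^T
        return x.reshape(shape)
```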
If you look at section 5.1, we use the Monarchs to implement long convolutions in conjunction with gating for a lot of the backbones (also see this image from the blog).
I think with image data, you might actually be fine without the Hyena stuff - it's more important for language. In older experiments, we did see higher performance on ImageNet with the gating and the Hyena kernels; on CIFAR, the performance is about the same with and without the gating.
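To make that concrete, here is a minimal sketch of a gated long-convolution sequence mixer, assuming PyTorch. The class name and projection layout are made up for illustration, and the convolution is computed with FFTs here, whereas the paper realizes the same operation with Monarch matrices:

```python
import torch
import torch.nn as nn

class GatedLongConvMixer(nn.Module):
    """Illustrative gated long-convolution sequence mixer (not the repo's code)."""
    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        # One learnable convolution kernel per channel, spanning the full sequence.
        self.kernel = nn.Parameter(torch.randn(d_model, seq_len) * 0.02)
        self.in_proj = nn.Linear(d_model, 2 * d_model)   # produces value and gate branches
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                 # x: (batch, seq_len, d_model)
        b, l, d = x.shape
        v, gate = self.in_proj(x).chunk(2, dim=-1)

        # Long convolution via FFT; zero-pad to 2L to avoid circular wrap-around.
        k_f = torch.fft.rfft(self.kernel, n=2 * l)         # (d, l + 1)
        v_f = torch.fft.rfft(v.transpose(1, 2), n=2 * l)   # (b, d, l + 1)
        y = torch.fft.irfft(v_f * k_f, n=2 * l)[..., :l]   # (b, d, l)
        y = y.transpose(1, 2)                              # (b, l, d)

        # Element-wise gating, then output projection.
        return self.out_proj(y * torch.sigmoid(gate))
```

Dropping the gate (returning self.out_proj(y) directly) gives the plain long-convolution variant the reply suggests may be sufficient for image data.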