Hello,
The paper describes an algorithm for the M2 layer that is intended to replace both the Attention layer and the MLP layer (specifically the nn.Linear part of the latter).
However, looking at the monarch_mixer_sequence_mixer.py script, I see that it uses Hyena filters, and I couldn't find an implementation of this M2 layer algorithm in the code.
I may be missing something, but I'd like to clarify whether it is necessary to substitute the Hyena filters with the M2 layer.
Thank you for your assistance with this project!
P.S.: I'm currently working with image data.
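For reference, here is a minimal sketch of what a Monarch-factorized drop-in for nn.Linear might look like (purely illustrative: the class name, block layout, and the perfect-square-dimension assumption are mine, not taken from the repo):

```python
import math
import torch
import torch.nn as nn

class MonarchLinear(nn.Module):
    """Illustrative Monarch-factorized replacement for nn.Linear (not the repo's code).

    A Monarch matrix is roughly P^T B P A, where A and B are block-diagonal and P is a
    fixed "transpose" permutation. This sketch assumes dim is a perfect square, so both
    factors use sqrt(dim) blocks of size sqrt(dim).
    """
    def __init__(self, dim: int):
        super().__init__()
        self.n = int(math.isqrt(dim))
        assert self.n * self.n == dim, "dim must be a perfect square in this sketch"
        scale = 1.0 / math.sqrt(self.n)
        # Block-diagonal factors A and B, stored as (n_blocks, block, block).
        self.blocks_a = nn.Parameter(torch.randn(self.n, self.n, self.n) * scale)
        self.blocks_b = nn.Parameter(torch.randn(self.n, self.n, self.n) * scale)

    def forward(self, x):                                       # x: (..., dim)
        shape, n = x.shape, self.n
        x = x.reshape(-1, n, n)                                  # split features into an n x n grid
        x = torch.einsum('bij,ijk->bik', x, self.blocks_a)       # block-diagonal multiply (A)
        x = x.transpose(1, 2)                                    # permutation P
        x = torch.einsum('bij,ijk->bik', x, self.blocks_b)       # block-diagonal multiply (B)
        x = x.transpose(1, 2)                                    # permutation P^T
        return x.reshape(shape)
```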
If you look at section 5.1, we use the Monarchs to implement long convolutions in conjunction with gating for a lot of the backbones (also see this image from the blog).
I think with image data, you might actually be fine without the Hyena stuff - it's more important for language. In older experiments, we did see higher performance on ImageNet with the gating and the Hyena kernels; on CIFAR, the performance is about the same with and without the gating.
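To make that concrete, here is a minimal sketch of a gated long-convolution sequence mixer, assuming PyTorch. The class name and projection layout are made up for illustration, and the convolution is computed with FFTs here, whereas the paper realizes the same operation with Monarch matrices:

```python
import torch
import torch.nn as nn

class GatedLongConvMixer(nn.Module):
    """Illustrative gated long-convolution sequence mixer (not the repo's code)."""
    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        # One learnable convolution kernel per channel, spanning the full sequence.
        self.kernel = nn.Parameter(torch.randn(d_model, seq_len) * 0.02)
        self.in_proj = nn.Linear(d_model, 2 * d_model)   # produces value and gate branches
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                 # x: (batch, seq_len, d_model)
        b, l, d = x.shape
        v, gate = self.in_proj(x).chunk(2, dim=-1)

        # Long convolution via FFT; zero-pad to 2L to avoid circular wrap-around.
        k_f = torch.fft.rfft(self.kernel, n=2 * l)         # (d, l + 1)
        v_f = torch.fft.rfft(v.transpose(1, 2), n=2 * l)   # (b, d, l + 1)
        y = torch.fft.irfft(v_f * k_f, n=2 * l)[..., :l]   # (b, d, l)
        y = y.transpose(1, 2)                              # (b, l, d)

        # Element-wise gating, then output projection.
        return self.out_proj(y * torch.sigmoid(gate))
```

Dropping the gate (returning self.out_proj(y) directly) gives the plain long-convolution variant the reply suggests may be sufficient for image data.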