Hi,

For ImageNet, you mention in the paper that the Hyena code is used for the experiments, replacing the MLP blocks in the Hyena ViT-b with block-diagonal matrices, similarly to M2-BERT. Based on the config file, `trainer: precision: 16` is used in Hyena, so I wonder whether you used bf16 mixed precision here for ImageNet (as with M2-BERT) to train it on A100 GPUs, or plain 16-bit (fp16) precision.
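For context on what I mean by the two options, here is a minimal sketch (not code from the repo; the exact spelling of the flag may differ slightly across PyTorch Lightning versions):

```python
# Minimal sketch of how Lightning's precision flag maps to the two options in question.
import pytorch_lightning as pl

# fp16 mixed precision: what `trainer: precision: 16` in the Hydra config selects
trainer_fp16 = pl.Trainer(accelerator="gpu", devices=1, precision=16)

# bf16 mixed precision: the M2-BERT-style option, well supported on A100 GPUs
trainer_bf16 = pl.Trainer(accelerator="gpu", devices=1, precision="bf16")
```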
Also, in the sequence mixer of M2-BERT, you replaced attention with bidirectional gated convolutions plus a residual long convolution (Figure 3, left). So I wonder whether you did the same for ImageNet and included the residual long convolution in the model. I am asking because the Monarch matrices are part of a residual sequence-mixing layer that has a residual connection (although that is not the same as a residual long convolution); a rough sketch of the structure I mean is below.
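To make the question concrete, here is a minimal sketch of the sequence-mixer structure I am referring to, based only on my reading of Figure 3 (left). The class name, kernel parameterization, and the simple circular FFT convolution are my own illustration, not the repo's actual implementation (which handles bidirectionality with separate forward/backward kernels, among other details):

```python
import torch
import torch.nn as nn


class GatedConvMixerSketch(nn.Module):
    """Illustrative only: a gated long convolution with an extra residual
    long-convolution branch, as I understand Figure 3 (left)."""

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)   # split into value and gate
        self.out_proj = nn.Linear(d_model, d_model)
        # learned long-convolution kernels spanning the full sequence length
        self.k_main = nn.Parameter(torch.randn(seq_len, d_model) * 0.02)
        self.k_residual = nn.Parameter(torch.randn(seq_len, d_model) * 0.02)

    def long_conv(self, u, k):
        # simple circular FFT convolution over the sequence dimension;
        # a stand-in for the paper's long convolution (bidirectional kernel
        # construction omitted for brevity)
        L = u.shape[1]
        u_f = torch.fft.rfft(u, dim=1)       # (B, L//2+1, D)
        k_f = torch.fft.rfft(k, dim=0)       # (L//2+1, D)
        return torch.fft.irfft(u_f * k_f, n=L, dim=1)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        v, gate = self.in_proj(x).chunk(2, dim=-1)
        y = self.long_conv(v, self.k_main) * torch.sigmoid(gate)  # gated conv branch
        y = y + self.long_conv(v, self.k_residual)                # residual long conv
        return self.out_proj(y)
```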