precision on imagenet experiment #21

Open
Karami-m opened this issue Jan 22, 2024 · 1 comment
@Karami-m

Hi,

For ImageNet, you mention in the paper that the Hyena code is used for the experiments, replacing the MLP blocks in Hyena ViT-b with block-diagonal matrices, similar to M2-BERT. Based on the config file, trainer: precision: 16 is used in Hyena, so I wonder whether you used bf16 mixed precision here for ImageNet (as with M2-BERT) to train on A100 GPUs, or plain fp16 16-bit precision.
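
For reference, here is a minimal sketch of the two settings I am asking about, using the PyTorch Lightning Trainer that both codebases build on (the exact keys in your configs may differ, and newer Lightning versions spell these "16-mixed" and "bf16-mixed"):

```python
import pytorch_lightning as pl

# fp16 mixed precision, i.e. what `trainer: precision: 16` in the Hyena config selects
trainer_fp16 = pl.Trainer(accelerator="gpu", devices=1, precision=16)

# bf16 mixed precision, as used for M2-BERT on A100 GPUs
trainer_bf16 = pl.Trainer(accelerator="gpu", devices=1, precision="bf16")
```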

@Karami-m
Author

Also, in the sequence mixer of M2-BERT, you replaced attention with bidirectional gated convolutions plus a residual long convolution (Figure 3, left). I wonder whether you did the same for ImageNet and included the residual long convolution in the model. I ask because the Monarch matrix is part of a residual sequence-mixing layer that has a residual connection (although that is not a residual long convolution).
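
To clarify which two residual paths I mean, here is a hypothetical sketch (not the actual M2 or Hyena code; the class name, kernel_len, and the gating details are only my own illustration of the distinction):

```python
import torch
import torch.nn as nn

class GatedLongConvMixer(nn.Module):
    """Illustrative gated long-convolution sequence mixer with two skip paths."""
    def __init__(self, d_model, kernel_len):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        # depthwise long convolution over the sequence dimension
        self.long_conv = nn.Conv1d(d_model, d_model, kernel_len,
                                   padding=kernel_len // 2, groups=d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        y = self.long_conv(u.transpose(1, 2)).transpose(1, 2)[:, :x.size(1)]
        y = y + u                        # "residual long convolution": skip inside the mixer
        y = torch.sigmoid(gate) * y
        return x + self.out_proj(y)      # block-level residual around the sequence mixer
```

My question is whether the ImageNet model keeps only the outer, block-level residual, or also the inner skip around the long convolution.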
