Hi,

For ImageNet, you mention in the paper that the Hyena code is used for the experiments, replacing the MLP blocks in the Hyena ViT-b with block-diagonal matrices, similarly to M2-BERT. Based on the config file, `trainer: precision: 16` is used in Hyena, so I wonder whether you used bf16 mixed precision here for ImageNet (as with M2-BERT) to train it on A100 GPUs, or plain 16-bit (fp16) precision.
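For context on what I mean by the two options, here is a minimal sketch (not code from the repo; the exact spelling of the flag may differ slightly across PyTorch Lightning versions):

```python
# Minimal sketch of how Lightning's precision flag maps to the two options in question.
import pytorch_lightning as pl

# fp16 mixed precision: what `trainer: precision: 16` in the Hydra config selects
trainer_fp16 = pl.Trainer(accelerator="gpu", devices=1, precision=16)

# bf16 mixed precision: the M2-BERT-style option, well supported on A100 GPUs
trainer_bf16 = pl.Trainer(accelerator="gpu", devices=1, precision="bf16")
```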
Also, in the sequence mixer of M2-BERT, you replaced attention with bidirectional gated convolutions plus a residual long convolution (Figure 3, left). So I wonder whether you did the same for ImageNet and included the residual long convolution in the model. I am asking because the Monarch matrices are part of a residual sequence-mixing layer that has a residual connection (although that is not the same as a residual long convolution); a rough sketch of the structure I mean is below.
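To make the question concrete, here is a minimal sketch of the sequence-mixer structure I am referring to, based only on my reading of Figure 3 (left). The class name, kernel parameterization, and the simple circular FFT convolution are my own illustration, not the repo's actual implementation (which handles bidirectionality with separate forward/backward kernels, among other details):

```python
import torch
import torch.nn as nn


class GatedConvMixerSketch(nn.Module):
    """Illustrative only: a gated long convolution with an extra residual
    long-convolution branch, as I understand Figure 3 (left)."""

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)   # split into value and gate
        self.out_proj = nn.Linear(d_model, d_model)
        # learned long-convolution kernels spanning the full sequence length
        self.k_main = nn.Parameter(torch.randn(seq_len, d_model) * 0.02)
        self.k_residual = nn.Parameter(torch.randn(seq_len, d_model) * 0.02)

    def long_conv(self, u, k):
        # simple circular FFT convolution over the sequence dimension;
        # a stand-in for the paper's long convolution (bidirectional kernel
        # construction omitted for brevity)
        L = u.shape[1]
        u_f = torch.fft.rfft(u, dim=1)       # (B, L//2+1, D)
        k_f = torch.fft.rfft(k, dim=0)       # (L//2+1, D)
        return torch.fft.irfft(u_f * k_f, n=L, dim=1)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        v, gate = self.in_proj(x).chunk(2, dim=-1)
        y = self.long_conv(v, self.k_main) * torch.sigmoid(gate)  # gated conv branch
        y = y + self.long_conv(v, self.k_residual)                # residual long conv
        return self.out_proj(y)
```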