What is the suggested config for running LRA experiments with Hyena? #36

I tried the Hyena model on the Path-X experiment but got bad results: `val/loss` is NaN and the `grad_norm` of the later layers is close to infinite.
My config:

```yaml
# @package _global_
defaults:
  - /pipeline: pathx
  - override /scheduler: cosine_warmup

scheduler:
  num_training_steps: 125000 # 50 epochs
  num_warmup_steps: 2500 # 1 epoch

model:
  _name_: model
  n_layers: 6
  d_model: 256
  norm: batch
  layer:
    _name_: hyena
    emb_dim: 3
    filter_order: 64
    local_order: 3
    modulate: True
    l_max: 16384
    w: 1
    lr: ${optimizer.lr}
    lr_pos_emb: ${optimizer.lr}
    return_state: True

loader:
  batch_size: 25

optimizer:
  lr: 0.0005
  weight_decay: 0.05

trainer:
  max_epochs: 50

train:
  seed: 2222
  interval: step # For cosine scheduler
```

My command:

```shell
python -m train trainer.devices=8 experiment=lra/hyena-lra-pathx +dataset.data_dir=./data/pathfinder128
```
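The scheduler comments (`# 50 epochs`, `# 1 epoch`) only hold for one particular number of steps per epoch, and that number depends on `loader.batch_size` and `trainer.devices`. The snippet below is a minimal, hypothetical helper for checking this. It assumes a Path-X training split of 160,000 examples (an 80/10/10 split of 200,000; this is an assumption, so substitute your loader's real train size). It also assumes data-parallel training where each device processes `batch_size` samples per step, with no gradient accumulation.

```python
import math

# ASSUMPTION: number of Path-X training examples (80% of 200,000).
# Replace with the actual size of your train split.
ASSUMED_TRAIN_EXAMPLES = 160_000


def steps_per_epoch(num_train_examples: int, per_device_batch: int, num_devices: int) -> int:
    """Approximate optimizer steps per epoch under data-parallel training.

    Each device gets an even share of the dataset (a distributed sampler
    pads the last share) and takes one step per local batch. A partial
    final batch is counted as a step; no gradient accumulation is modeled.
    """
    per_device_examples = math.ceil(num_train_examples / num_devices)
    return math.ceil(per_device_examples / per_device_batch)


def epochs_covered(num_steps: int, steps_in_epoch: int) -> float:
    """How many epochs a given number of scheduler steps spans."""
    return num_steps / steps_in_epoch


if __name__ == "__main__":
    # Values taken from the config (batch_size: 25) and command (trainer.devices=8).
    spe = steps_per_epoch(ASSUMED_TRAIN_EXAMPLES, per_device_batch=25, num_devices=8)
    print(f"steps per epoch: {spe}")
    print(f"num_training_steps=125000 spans {epochs_covered(125_000, spe):.2f} epochs")
    print(f"num_warmup_steps=2500 spans {epochs_covered(2_500, spe):.2f} epochs")
    print(f"max_epochs=50 stops after {50 * spe} steps")
```

If these assumptions hold, 25 samples per device on 8 devices gives about 800 steps per epoch. In that case 125,000 steps would span far more than 50 epochs, and `trainer.max_epochs: 50` would end training long before the cosine schedule finishes decaying.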

Is there something wrong with my setup?


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

What is the suggested config for running LRA exps with Hyena? #36

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

What is the suggested config for running LRA exps with Hyena? #36

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions