Description
I tried the Hyena model on the PathX experiment but got bad results: val/loss is NaN, and the grad_norm of the later layers is close to infinite.
My config:
`# @package _global_
defaults:
  - /pipeline: pathx
  - override /scheduler: cosine_warmup

scheduler:
  num_training_steps: 125000 # 50 epochs
  num_warmup_steps: 2500 # 1 epoch

model:
  name: model
  n_layers: 6
  d_model: 256
  norm: batch
  layer:
    name: hyena
    emb_dim: 3
    filter_order: 64
    local_order: 3
    modulate: True
    l_max: 16384
    w: 1
    lr: ${optimizer.lr}
    lr_pos_emb: ${optimizer.lr}
    return_state: True

loader:
  batch_size: 25

optimizer:
  lr: 0.0005
  weight_decay: 0.05

trainer:
  max_epochs: 50

train:
  seed: 2222
  interval: step # For cosine scheduler
`
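The `# 50 epochs` and `# 1 epoch` comments in the scheduler section imply 2,500 optimizer steps per epoch. Below is a minimal sketch of that arithmetic, useful for checking whether the scheduler step counts still match the batch size and device count. It assumes `loader.batch_size` is per device and that there is no gradient accumulation. The helper `steps_per_epoch` and the 160,000-sample dataset size are hypothetical, chosen only for illustration, and are not taken from the repository.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, num_devices: int = 1) -> int:
    """Optimizer steps per epoch.

    Assumes batch_size is per device (data-parallel training) and
    that no gradient accumulation is used.
    """
    return math.ceil(num_samples / (batch_size * num_devices))


# Steps per epoch implied by the config comments:
# num_training_steps / max_epochs.
implied = 125000 // 50
print("implied steps per epoch:", implied)

# Hypothetical dataset size, for illustration only.
num_samples = 160_000

# Settings from the config and command: batch_size=25, trainer.devices=8.
actual = steps_per_epoch(num_samples, batch_size=25, num_devices=8)
print("steps per epoch with these settings:", actual)
print("total steps over 50 epochs:", actual * 50)
```

If the two figures disagree for the real dataset size, the cosine schedule and its warmup would cover a different number of epochs than the comments suggest.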
My command:
python -m train trainer.devices=8 experiment=lra/hyena-lra-pathx +dataset.data_dir=./data/pathfinder128
Have I set something up incorrectly?