Warper and HPE training hyperparams #9

@ifeherva

Description

I noticed that the learning rate for both the warper and the HPE was fixed during training (100k/50k iterations) according to the paper. With these parameters I see the loss stagnating after a few thousand iterations.

Have you tried LR decay or different parameters? Why did you use such a high number of iterations? Did the loss decrease steadily over the full 50k iterations?
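For reference, the kind of LR decay I have in mind is a simple step schedule, which is easy to drop into most training loops. A minimal sketch (the base LR, decay factor, and step interval below are illustrative assumptions, not values from the paper):

```python
def step_decay_lr(base_lr, step, decay_rate=0.5, decay_every=10000):
    """Step-wise LR decay: multiply the base LR by `decay_rate`
    every `decay_every` iterations."""
    return base_lr * (decay_rate ** (step // decay_every))

# Example: starting from 1e-4, the LR halves every 10k iterations,
# so by 50k iterations it has dropped by a factor of 32.
for step in (0, 10_000, 50_000):
    print(step, step_decay_lr(1e-4, step))
```

With a fixed LR over 50k/100k iterations, the loss plateau I'm seeing could simply be the optimizer bouncing around a minimum; a schedule like this would make that easy to rule out.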
