Question about the training strategy. #20

Open
@MinghaoYe

Thanks for your valuable work.

The paper details the model architecture, training strategy, and hyperparameters. As a beginner, I don't fully understand the reasoning behind these settings, and when I train a model of my own I often don't know how to design the training strategy. I would like to know how you determined these settings. Are there any methodologies or experience you could share? Specifically, my questions are:

  1. How are the trainable parameters determined in each stage?
  2. How are the hyperparameters determined in each stage, including batch size, learning rate, and number of epochs?
  3. How were the different parts of the model architecture selected, and what are the underlying reasons?

Thank you again for your excellent work. Looking forward to your reply.
