Question about the training strategy. #20

Thanks for your valuable work.

This paper provides details on the model architecture, training strategy, and hyperparameters. As a beginner, I'm not clear about the reasons behind these settings. When I want to train a model, I often don't know how to design the training strategy. How did you determine these settings? Are there any methodologies or lessons from experience you could share? Specifically, my questions are:
1. How are the trainable parameters determined in each stage?
2. How are the hyperparameters determined in each stage, such as batch size, learning rate, and number of epochs?
3. How were the different parts of the model architecture selected, and what were the underlying reasons?

Thank you again for your excellent work. Looking forward to your reply.
