Thanks for your valuable work.
This paper provides details on the model architecture, training strategy, and hyperparameters. As a beginner, I'm not clear on the reasoning behind these settings, and when I want to train a model myself, I often don't know how to design a training strategy. I would like to know how you determined these settings. Is there any methodology or experience you could share in this regard? Specifically, my questions are:
- How are the trainable parameters determined in each stage?
- How are the hyperparameters determined in each stage, such as batch size, learning rate, and number of epochs?
- How are the different parts of the model architecture selected, and are there any underlying reasons?
Thank you again for your excellent work. Looking forward to your reply.