Question about preparing pre-training data
#511
Replies: 1 comment
The pre-training corpus doesn't need any special processing. You can separate documents with a blank line.
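Below is a minimal Python sketch of the layout described above. It assumes the articles are already plain-text strings (for example, text extracted from a wiki dump) and writes them to one UTF-8 txt file with a single blank line between documents. One step goes beyond the answer and is an assumption: blank lines inside each article are collapsed to a single newline, so a blank line only ever marks a document boundary. Whether single newlines inside an article affect training depends on how the training script reads the file. `write_corpus` and `read_corpus` are hypothetical helpers and are not part of any project's API.

```python
import re
from pathlib import Path

# One or more blank lines (which may contain stray whitespace).
BLANK_LINES = re.compile(r"\n\s*\n")


def write_corpus(documents, path):
    """Write documents to one txt file, separated by a single blank line.

    Assumption: blank lines *inside* a document are collapsed to one newline,
    so that a blank line in the output only marks a document boundary.
    Single newlines (ordinary paragraph breaks) are kept as-is.
    """
    cleaned = []
    for doc in documents:
        doc = BLANK_LINES.sub("\n", doc.strip())
        if doc:  # skip empty or whitespace-only documents
            cleaned.append(doc)
    Path(path).write_text("\n\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def read_corpus(path):
    """Split the file back into documents on blank lines (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    return [d.strip() for d in BLANK_LINES.split(text) if d.strip()]


if __name__ == "__main__":
    articles = [
        "Article one.\nSecond paragraph of article one.",
        "Article two.\n\nA paragraph that was separated by a blank line.",
        "   ",  # empty article, dropped
    ]
    write_corpus(articles, "corpus.txt")
    print(len(read_corpus("corpus.txt")))  # → 2
```

Because the inner blank line of the second article is collapsed before writing, reading the file back yields exactly two documents.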
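I'd like to ask how a pre-training dataset should be prepared. I want to pre-train on wiki data. For data preparation, suppose I have 10 articles: can I write these 10 articles directly into a txt file, or do I need some symbol to separate the first article from the second?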
My second question: within a single article, paragraphs may be separated by line breaks. Will these line breaks affect model training?