How should the training performance metrics in the training log be analyzed? #6174
Unanswered · asked by tensorflowt in Q&A · 1 comment
This speed means 40 samples per second. Since packing is not enabled during training, the number of tokens in a single sample depends on that sample's length. In general, each sample contains a few dozen to a few hundred tokens, which for ordinary text (i.e., not long-sequence scenarios such as papers or RAG) is far below 2048 tokens.
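To make that conversion concrete, one rough back-of-the-envelope estimate could look like the sketch below (not from the thread; the dataset path and the tokenizer location are assumptions that would need to be adjusted to the actual setup):

```python
# Minimal sketch: estimate tokens/sec from the logged samples/sec by measuring
# the average tokenized length of the training data.
# Assumptions: a local alpaca-style JSON file and the tokenizer path taken from
# the config above; both are placeholders to adapt to your environment.
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LLM-Research/Meta-Llama-3-8B-Instruct")

with open("data/alpaca_en_demo.json", encoding="utf-8") as f:
    examples = json.load(f)

# Rough per-sample length: prompt + response tokens for each alpaca-style record.
lengths = [
    len(tokenizer.encode(ex.get("instruction", "") + ex.get("input", "") + ex.get("output", "")))
    for ex in examples
]
avg_tokens_per_sample = sum(lengths) / len(lengths)

samples_per_second = 40  # figure reported in the training log
tokens_per_second = samples_per_second * avg_tokens_per_sample
print(f"avg tokens/sample ~= {avg_tokens_per_sample:.1f}")
print(f"estimated throughput ~= {tokens_per_second:.0f} tokens/sec")
```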
Reminder
System Info
I am currently training the llama3-8b model on an A800 (8×80GB) machine, and my training log is as follows:

The log shows only 40 training samples per second. How does this sample count map to tokens? If it means 40 tokens per second, that would be very slow! But if it is cutoff_len*10 = 409600, that seems far too large and doesn't feel right either. Could someone help analyze this?
Reproduction
My configuration file is as follows:
```yaml
### model
model_name_or_path: LLM-Research/Meta-Llama-3-8B-Instruct
cache_dir: /worker

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /data/saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
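For reference, the settings above can be related to the logged samples-per-second figure roughly as in the sketch below (assumed values, not part of the original report: 8 GPUs is inferred from the hardware description, and cutoff_len only gives an upper bound, since without packing most samples are much shorter):

```python
# Minimal sketch relating the config to the logged throughput (assumed numbers).
num_gpus = 8                        # assumption: 8 x A800 80GB
per_device_train_batch_size = 8     # from the config above
gradient_accumulation_steps = 8     # from the config above
cutoff_len = 1024                   # from the config above (per-sample upper bound)

# Samples consumed per optimizer step across all devices.
effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(f"effective batch size: {effective_batch_size}")  # 8 * 8 * 8 = 512 samples/step

samples_per_second = 40             # figure observed in the training log
# Upper bound on token throughput; actual throughput is lower because samples
# are usually much shorter than cutoff_len when packing is disabled.
max_tokens_per_second = samples_per_second * cutoff_len
print(f"token throughput upper bound: {max_tokens_per_second} tokens/sec")  # 40960
```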
Expected behavior
No response
Others
No response