Tencent/TencentPretrain

Question about open-source usage of the SMP2020-EWECT dataset

#133

· pxaklbe opened

on Oct 31, 2024

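Is the maximum input sequence length supported by the Chinese models here 512 tokens? Is input longer than 512 tokens truncated? Can the number of position embeddings be increased during fine-tuning?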

#130

· chengzi-big opened

on Jul 21, 2024

Pre-training LLAMA-7B on a single machine with 2 GPUs fails with TypeError: an integer is required (got type NoneType)

#112

· smallYellowCat opened

on Nov 29, 2023

[Question] How can deepspeed distribute data across GPUs with different memory sizes? I have both 32G and 16G GPUs

#98

· 18liumin opened

on Sep 26, 2023

LLaMA2-70B format conversion

#96

· Double-bear opened

on Sep 5, 2023

size mismatch for classifier.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).

#93

· bbb4aaa opened

on Aug 19, 2023

KeyError: 'd'

#92

· bbb4aaa opened

on Aug 19, 2023

Resume from checkpoint

#81

· mohammadaminabbasi opened

on Aug 15, 2023

Found several bugs: is the meaning of dynamic_masking inverted? There seems to be an extra not

#80

· boluoyu opened

on Aug 10, 2023

LoRA inference

#71

· aSmallsheep opened

on Jun 6, 2023

DeepSpeedZeRoOffload initialization failed (can't allocate memory)

#70

· treya-lin opened

on May 25, 2023

Question about expanding the vocabulary and then continuing with incremental pre-training

#68

· qiancheng99 opened

on May 19, 2023
