Replies: 1 comment
-
WG feedback is that this is okay. No code change is required; this is up to submitters.
-
During fine-tuning of the GPT-J model, an extra token (pad="[PAD]") was inadvertently introduced into the model, which increased the `lm_head` dimension by 1 (from 50400 to 50401). This extra token is now practically redundant (both in dataset preprocessing and in post-processing, if padding is used during batching). Creating this thread to discuss making it optional to use 50401 vs. 50400 in the `lm_head`.