tokenizers respect padding: true with non-null max_length
This commit changes the behavior of tokenizers to match the
behavior described in the docs and the behavior of the Python
library.
Before this commit, passing

    {
      padding: true,
      max_length: 512
    }

or

    {
      padding: 'max_length',
      max_length: 512
    }

would both always pad all outputs to 512 tokens.
After this change,

    {
      padding: true,
      max_length: 512
    }

will now pad the outputs to match the longest encoding
or max_length, whichever is shorter.
This commit also adds a test to prevent regressions.
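The padding rule described above can be sketched as a small standalone helper. This is an illustrative sketch, not library code: `effectivePadLength` is a hypothetical function name, and the inputs are assumed to be the token lengths of each encoding in a batch.

```javascript
// Hypothetical helper illustrating the padding-length rule this commit
// implements; not part of the tokenizers API.
function effectivePadLength(encodingLengths, padding, max_length) {
    const longest = Math.max(...encodingLengths);
    if (padding === 'max_length') {
        // padding: 'max_length' always pads to max_length
        return max_length;
    }
    if (padding === true) {
        // padding: true pads to the longest encoding,
        // capped by max_length when one is given
        return max_length === null ? longest : Math.min(longest, max_length);
    }
    // no padding requested
    return null;
}

// padding: true with max_length: 512 pads only to the longest sequence
console.log(effectivePadLength([10, 20, 30], true, 512));          // 30
// ...but max_length still caps sequences longer than it
console.log(effectivePadLength([10, 20, 600], true, 512));         // 512
// padding: 'max_length' always pads to 512
console.log(effectivePadLength([10, 20, 30], 'max_length', 512));  // 512
```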
     console.warn(`Truncation was not explicitly activated but \`max_length\` is provided a specific value, please use \`truncation=true\` to explicitly truncate examples to max length.`)
-    }
+    if (truncation && max_length === null) {
+        max_length = this.model_max_length;
+    } else if (max_length && truncation === null) {
+        console.warn(`Truncation was not explicitly activated but \`max_length\` is provided a specific value, please use \`truncation=true\` to explicitly truncate examples to max length.`)
+    }
+
+    // padding: 'max_length' doesn't require any additional calculation
+    // but padding: true has to calculate max_length from the sequences
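The truncation/max_length defaulting in the diff above can be exercised in isolation. This is a sketch under assumptions: `resolveMaxLength` is a hypothetical standalone function, and `model_max_length` stands in for `this.model_max_length` on the tokenizer.

```javascript
// Sketch of the defaulting logic from the diff above; `resolveMaxLength`
// is a hypothetical name for illustration only.
function resolveMaxLength(truncation, max_length, model_max_length) {
    if (truncation && max_length === null) {
        // truncation requested without an explicit cap:
        // fall back to the model's maximum length
        max_length = model_max_length;
    } else if (max_length && truncation === null) {
        // a cap was given but truncation was never enabled: warn, don't truncate
        console.warn('Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=true` to explicitly truncate examples to max length.');
    }
    return max_length;
}

console.log(resolveMaxLength(true, null, 512)); // 512 (model limit used)
console.log(resolveMaxLength(null, 128, 512)); // 128 (kept, with a warning)
```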