[Issue] Calculate Tokens size? #15
Comments
Do you have an example that (still) does not work? The token count is identical for any text that I have checked.
@rk-teche thank you for your feedback! There could be a discrepancy with the current OpenAI models, especially when compared with the token counts reported by the API. I am going to spend some time trying to move token calculation to use OpenAI's own tokenizer.
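A minimal sketch of what such a cross-check could look like, assuming the `gpt3-tokenizer` package discussed in this thread and the `js-tiktoken` package for OpenAI's own encodings (the exact APIs shown are my reading of their docs, not something confirmed in this issue):

```ts
// Sketch: comparing this library's count against OpenAI's own tokenizer.
// Package names and API shapes here are assumptions, not from this thread.
import GPT3Tokenizer from "gpt3-tokenizer";
import { getEncoding } from "js-tiktoken";

const text = "Hello\n\nworld";

// gpt3-tokenizer: GPT-3-era BPE vocabulary
const gpt3 = new GPT3Tokenizer({ type: "gpt3" });
const gpt3Count = gpt3.encode(text).bpe.length;

// cl100k_base is the encoding used by gpt-3.5-turbo / gpt-4, so its counts
// can legitimately differ from the GPT-3 tokenizer above.
const enc = getEncoding("cl100k_base");
const tiktokenCount = enc.encode(text).length;

console.log({ gpt3Count, tiktokenCount });
```

If the two numbers diverge for newer models, that points to a vocabulary difference rather than a bug in either library.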
I think what you will find is that the online tokenizer does not recognise \n pasted as literal text as a newline; it counts the backslash and the n as separate characters.
Yep, I'm aware of \ + n being counted separately, since it shows clearly in the tokenizer screenshot above. Given your last example, would the most appropriate approach be to escape the string (or special characters) before passing it to the tokenizer? Alternatively, what I've ended up doing is treating the tokenizer output as an estimate rather than a fact (which also generally makes sense given the documentation and model differences long term) and following the Deep Dive Counting Tokens guide (for gpt-3.5+) in the OpenAI docs. The combination of gpt3-tokenizer with the estimations they've provided in the doc is super helpful and brings the results a bit closer to accuracy.
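For reference, the per-message estimation from that guide looks roughly like the sketch below. The overhead constants (3 tokens per message, 1 per name, 3 to prime the reply) are the figures OpenAI published for gpt-3.5-turbo / gpt-4-era models and may differ between model versions; `countTokens` is a placeholder for whichever tokenizer you plug in (e.g. gpt3-tokenizer as an approximation).

```ts
// Sketch of the per-message estimation described in OpenAI's
// counting-tokens guide for chat models. The constants below are the
// published figures for gpt-3.5-turbo / gpt-4-era models and may change.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  name?: string;
  content: string;
}

function estimateChatTokens(
  messages: ChatMessage[],
  countTokens: (text: string) => number // any tokenizer's "count" function
): number {
  const tokensPerMessage = 3;
  const tokensPerName = 1;

  let total = 0;
  for (const msg of messages) {
    total += tokensPerMessage;
    total += countTokens(msg.role);
    total += countTokens(msg.content);
    if (msg.name) total += countTokens(msg.name) + tokensPerName;
  }
  total += 3; // every reply is primed with <|start|>assistant<|message|>
  return total;
}
```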
For passing to the tokeniser, you should escape in the regular JavaScript way, so Hello followed by two newlines is "Hello\n\n". I've been commenting on these issues where folks are saying "it's an estimate" or "it's not correct" because I switched to this library precisely because it seems to be exactly correct. I felt a lot of work had been done in this project to make it so, and I'd like everyone to benefit from that, knowing the results are accurate.
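To make the escaping point concrete, here is a small sketch (assuming gpt3-tokenizer's encode() returning { bpe, text }) showing the difference between actual newlines and a literal backslash + n:

```ts
// Sketch: actual newlines vs. a literal backslash followed by "n".
import GPT3Tokenizer from "gpt3-tokenizer";

const tokenizer = new GPT3Tokenizer({ type: "gpt3" });

// "\n" inside a normal JS string literal is a real newline character.
const withNewlines = "Hello\n\nworld";

// "\\n" is a backslash plus the letter n - two visible characters, which is
// what you get if you paste "\n" as plain text into the online tokenizer.
const withLiteralBackslashN = "Hello\\n\\nworld";

console.log(tokenizer.encode(withNewlines).bpe.length);         // newlines tokenized as newlines
console.log(tokenizer.encode(withLiteralBackslashN).bpe.length); // "\" and "n" counted separately
```

The second count will be higher, which matches the discrepancy seen in the online tokenizer screenshot above.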
The token count is not accurate if we compare it with the GPT-3 tokenizer.
Any help would be appreciated.
Thanks