
Conversation

rdentato
Contributor

Currently, the vocabulary is sorted every time the encode() function is called.
For the current version it's not an issue, since the program exits after each run, but if the generation code were called in a loop to build a "chat" version, this repeated qsort would be wasted work (and detrimental to latency).

I've moved the sort into the build_tokenizer() function, since it seems to me this is an operation that logically belongs to the point where the tokenizer is set up.

@karpathy
Owner

Slight downside here is that sorting the vocab probably creates latency, and if the user isn't going to prompt the model we are paying the price needlessly. Maybe we can allocate it lazily?

@rdentato
Contributor Author

Yes, it can be done in encode() to ensure the sort happens only when it is actually needed. I'll change it.
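
A minimal sketch of what the lazy version could look like (the identifiers Tokenizer, TokenIndex, sorted_vocab, and compare_tokens are illustrative here and may not match the actual llama2.c code exactly):

```c
// Hypothetical sketch: sort the vocabulary only on the first call to encode().
// Names below are illustrative, not necessarily the exact llama2.c identifiers.
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *str;  // the token string
    int id;     // its index in the vocabulary
} TokenIndex;

typedef struct {
    char **vocab;             // vocab_size token strings
    int vocab_size;
    TokenIndex *sorted_vocab; // lazily allocated; NULL until the first encode()
} Tokenizer;

static int compare_tokens(const void *a, const void *b) {
    return strcmp(((const TokenIndex *)a)->str, ((const TokenIndex *)b)->str);
}

void encode(Tokenizer *t, const char *text /*, ... output args ... */) {
    if (t->sorted_vocab == NULL) {
        // first call with a prompt: build and sort the lookup table once
        t->sorted_vocab = malloc(t->vocab_size * sizeof(TokenIndex));
        for (int i = 0; i < t->vocab_size; i++) {
            t->sorted_vocab[i].str = t->vocab[i];
            t->sorted_vocab[i].id = i;
        }
        qsort(t->sorted_vocab, t->vocab_size, sizeof(TokenIndex), compare_tokens);
    }
    // ... the rest of encode() can then binary-search t->sorted_vocab ...
    (void)text;
}
```

This way a run with no prompt never pays for the sort, while a chat-style loop pays it only once.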

@rdentato
Contributor Author

Done. Frankly, I don't think many people will use llama2.c without a prompt, but I may be wrong.

@karpathy
Owner

Just pushed a very similar commit, ty.

@karpathy karpathy closed this Aug 22, 2023
@rdentato rdentato deleted the patch-less-qsort-in-encode branch August 22, 2023 07:01