
Speedup topp by sorting less elements for each token. #270

Closed
wants to merge 5 commits

Conversation

rdentato
Contributor

@rdentato rdentato commented Aug 10, 2023

I set up an arbitrary threshold to ignore probabilities lower than 0.1%.
I don't know if there is another, more specific, criterion for this, but reducing the number of elements to sort seems the way to go to get back to the former speed.
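For illustration, a minimal sketch of the idea (not the patch itself; the `ProbIndex` pair and helper names here are illustrative, and the 0.1% cutoff is the arbitrary threshold mentioned above):

```c
#include <stdlib.h>

typedef struct { float prob; int index; } ProbIndex;

/* Sort descending by probability. */
static int compare_desc(const void *a, const void *b) {
    float pa = ((const ProbIndex *)a)->prob;
    float pb = ((const ProbIndex *)b)->prob;
    return (pa < pb) - (pa > pb);
}

/* Copy only candidates at or above `cutoff` into probindex, then sort
   that (much smaller) set. Returns how many tokens survived the filter. */
int filter_and_sort(const float *probs, int n, float cutoff, ProbIndex *probindex) {
    int n0 = 0;
    for (int i = 0; i < n; i++) {
        if (probs[i] >= cutoff) {
            probindex[n0].prob = probs[i];
            probindex[n0].index = i;
            n0++;
        }
    }
    qsort(probindex, n0, sizeof(ProbIndex), compare_desc);
    return n0;
}
```

One caveat with a fixed cutoff: if the surviving probability mass ends up below topp, the caller would need to fall back to the full set.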

@xefoci7612

xefoci7612 commented Aug 10, 2023

@rdentato what about adding top-k sampling and chain it before top-p ?

  1. We have also top-k sampling among the options

  2. If we chain it before top-p, speed is almost fully recovered. I have tested in my repo with k = 64 (which is quite a high number): you end up with a bigger set than top-p in >95% of cases even with top-p = 0.9, i.e. in >95% of cases the result is guaranteed to be the same.

  3. We keep the top-p implementation to stick to the correct definition of top-p sampling; the user explicitly adds top-k if they want it.

P.S.: The idea is not mine; I have read that chaining is possible.

Anyhow, your patch is nice.

@rdentato
Contributor Author

I'm not sure how top-k and top-p would interact. I mean, to get the top-k you need the probabilities sorted, right?
My patch tries to keep the set of probabilities to sort smaller.
Looking at the logs of my tests (though I don't know how general they are), there are few tokens with high probability and many, many more tokens with low probability.

But I understand that @karpathy wants to be sure before accepting PR, let's see what he thinks about it.

@xefoci7612

xefoci7612 commented Aug 10, 2023

> I'm not sure how top-k and top-p would interact, I mean to get the top-k you should have the probabilities sorted right?

Actually, selection runs in O(n): you just need to iterate the array once. Intuitively, if k = 3, you pick the first 3 elements, then iterate across the array from the fourth element on, and whenever a new element is larger than the smallest of the current 3, it replaces it.
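A sketch of that selection loop (a hypothetical helper, not code from either repo). Strictly speaking the inner min-scan makes it O(n*k), which is effectively O(n) for a small fixed k; a heap would give O(n log k):

```c
/* Track the indices of the k largest probabilities in a single pass.
   Each step scans the k current candidates for their minimum, replacing
   it whenever a larger element is encountered. */
void top_k_indices(const float *probs, int n, int k, int *out_idx) {
    for (int i = 0; i < k; i++) out_idx[i] = i;        /* seed with first k */
    for (int i = k; i < n; i++) {
        int min_j = 0;                                 /* smallest of the k */
        for (int j = 1; j < k; j++)
            if (probs[out_idx[j]] < probs[out_idx[min_j]]) min_j = j;
        if (probs[i] > probs[out_idx[min_j]]) out_idx[min_j] = i;
    }
}
```

Note that the k survivors come out unordered; only this much smaller set would then need sorting for the chained top-p step.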

> My patch tries to keep the set of probabilities to sort smaller. Looking at the logs of my tests (but I don't know how general they are) there are few tokens with high probability and many, many more tokens with low probability.
>
> But I understand that @karpathy wants to be sure before accepting the PR, let's see what he thinks about it.

If @karpathy does not want to add top-k, then your patch is definitely simpler, though some testing would be needed to pick a sensible threshold.

@kroggen
Contributor

kroggen commented Aug 11, 2023

Another option is to parallelize the sorting, either with OpenMP or pthreads.

Here is an example:

https://mcbeukman.medium.com/parallel-quicksort-using-openmp-9d18d7468cac

The loop before qsort can also be parallelized with OpenMP.
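For that loop the OpenMP version is a one-line pragma (a sketch, assuming the loop in question is the (prob, index) initialization before the sort; compile with -fopenmp, otherwise the pragma is ignored and the loop runs serially):

```c
typedef struct { float prob; int index; } ProbIndex;

/* Fill the (prob, index) pairs; iterations are independent, so the
   loop parallelizes trivially across threads. */
void build_probindex(const float *probs, int n, ProbIndex *probindex) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        probindex[i].prob = probs[i];
        probindex[i].index = i;
    }
}
```

For a 32000-entry vocabulary the copy is cheap, so the sort itself (as in the linked parallel quicksort article) is where most of the parallel gain would come from.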

@karpathy
Owner

I'll sleep on this but I think I'll merge it. Thank you. Does 1e-5 make it so that speed with and without topp is more equal?

@rdentato
Contributor Author

I noticed a degradation of performance starting at 1e-8. I would stay around 1e-5 or 1e-6.
This could be made parametric with a command-line option, say -e 1e-4, at least to allow experimenting with different values.

@kroggen kroggen mentioned this pull request Aug 12, 2023
@rdentato
Contributor Author

Following the indication of PR #276 and PR #274, I changed the limit from 1e-5 to (1 - topp) / (n - 1).
For a vocab size of 32000, this is equal to 3.12e-6 and I can still see the benefits.
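The adaptive cutoff is cheap to compute per step. A sketch (the justification, per the discussion in those PRs, is that a token below this probability cannot appear in the top-p nucleus):

```c
/* Cutoff below which a token cannot be part of the top-p set:
   with topp = 0.9 and a 32000-token vocabulary this gives
   (1 - 0.9) / 31999, approximately 3.12e-6. */
double topp_cutoff(double topp, int n) {
    return (1.0 - topp) / (n - 1);
}
```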

@karpathy
Owner

Merged #276 ty

@karpathy karpathy closed this Aug 14, 2023
@rdentato rdentato deleted the patch-topp-optimization branch August 14, 2023 06:45
xefoci7612 pushed a commit to xefoci7612/baby-llama2.cpp that referenced this pull request Aug 26, 2023