Speedup topp by sorting less elements for each token. #270
@rdentato what about adding top-k sampling and chaining it before top-p?
P.S.: The idea is not mine; I have read that chaining is possible. Anyhow, your patch is nice.
I'm not sure how top-k and top-p would interact. To get the top-k you need the probabilities sorted, right? But I understand that @karpathy wants to be sure before accepting the PR, so let's see what he thinks about it.
Actually, selection runs in O(n) for a small fixed K: you only need to iterate over the array once. Intuitively, if K=3, you take the first 3 elements, then iterate over the rest of the array from the fourth element on, replacing the smallest of the 3 (k_min) whenever a new element > k_min is found.
If @karpathy does not want to add top-k, then your patch is definitely simpler; some testing would eventually be needed to pick a sensible threshold.
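The single-pass selection described in this comment could be sketched roughly as follows. This is only an illustration, not code from the PR: `TopKEntry` and `select_topk` are hypothetical names. Note that the minimum is rescanned after each replacement, so the cost is O(n * k), which behaves like O(n) only while k stays small and fixed.

```c
#include <assert.h>

// Hypothetical pair type for this sketch: a probability and its token index.
typedef struct {
    float prob;
    int index;
} TopKEntry;

// Fill out[0..k-1] with the k largest entries of probs[0..n-1], in no
// particular order. No sorting of the full array is required.
void select_topk(const float *probs, int n, int k, TopKEntry *out) {
    assert(k > 0 && k <= n);

    // Seed the result with the first k elements and remember the smallest.
    int min_slot = 0;
    for (int i = 0; i < k; i++) {
        out[i].prob = probs[i];
        out[i].index = i;
        if (out[i].prob < out[min_slot].prob) min_slot = i;
    }

    // Scan the rest once; evict the current minimum when a larger value appears.
    for (int i = k; i < n; i++) {
        if (probs[i] > out[min_slot].prob) {
            out[min_slot].prob = probs[i];
            out[min_slot].index = i;
            // Find the new minimum among the k kept entries.
            min_slot = 0;
            for (int j = 1; j < k; j++) {
                if (out[j].prob < out[min_slot].prob) min_slot = j;
            }
        }
    }
}
```

Top-p could then be applied to just the k kept entries, so the sort would only ever see k elements. For larger k, a min-heap would bring the cost down to O(n log k).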
Another option is to parallelize the sorting, with either OpenMP or pthreads. Here is an example: https://mcbeukman.medium.com/parallel-quicksort-using-openmp-9d18d7468cac The loop before qsort can also be parallelized with OpenMP.
I'll sleep on this but I think I'll merge it. Thank you. Does 1e-5 bring the results with and without topp closer to equal?
I noticed a performance degradation starting at 1e-8. I would stay around 1e-5 or 1e-6.
Following the suggestions in PR #276 and PR #274, I changed the limit from 1e-5 to (1-topp)/(n-1).
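For readers following along, here is a minimal sketch of how a cutoff of (1-topp)/(n-1) might be used to shrink the array before sorting. It is not the merged code: `ProbIndex`, `compare_desc`, and `sample_topp` are names assumed for illustration, and `coin` is assumed to be a uniform random number in [0, 1). One way to see why the bound is safe: a token enters the nucleus only if the tokens ranked above it sum to at most topp, so it and everything ranked below it carry at least 1 - topp of the mass. Unless it is ranked first, that is at most n - 1 tokens, none larger than it, so its probability is at least (1-topp)/(n-1).

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical pair type for this sketch: a probability and its token index.
typedef struct {
    float prob;
    int index;
} ProbIndex;

// qsort comparator: larger probabilities first.
static int compare_desc(const void *a, const void *b) {
    const ProbIndex *x = a;
    const ProbIndex *y = b;
    if (x->prob > y->prob) return -1;
    if (x->prob < y->prob) return 1;
    return 0;
}

// Top-p (nucleus) sampling that sorts only the candidates above the cutoff.
// scratch must have room for n entries; assumes 1/n <= topp < 1.
int sample_topp(const float *probs, int n, float topp, ProbIndex *scratch, float coin) {
    assert(n > 1);
    const float cutoff = (1.0f - topp) / (float)(n - 1);

    // Keep only tokens that can possibly be part of the nucleus.
    int n0 = 0;
    for (int i = 0; i < n; i++) {
        if (probs[i] >= cutoff) {
            scratch[n0].prob = probs[i];
            scratch[n0].index = i;
            n0++;
        }
    }
    assert(n0 > 0);  // holds under the topp >= 1/n assumption
    qsort(scratch, (size_t)n0, sizeof(ProbIndex), compare_desc);

    // Find the shortest prefix whose cumulative probability exceeds topp.
    float cumulative = 0.0f;
    int last = n0 - 1;
    for (int i = 0; i < n0; i++) {
        cumulative += scratch[i].prob;
        if (cumulative > topp) {
            last = i;
            break;
        }
    }

    // Sample from the truncated distribution, rescaled by its total mass.
    float r = coin * cumulative;
    float cdf = 0.0f;
    for (int i = 0; i <= last; i++) {
        cdf += scratch[i].prob;
        if (r < cdf) return scratch[i].index;
    }
    return scratch[last].index;  // guard against rounding error
}
```

With a vocabulary of tens of thousands of tokens and topp around 0.9, most probabilities typically fall below the cutoff, so `qsort` sees only a small fraction of the array.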
Merged #276, ty
I set an arbitrary threshold to ignore probabilities lower than 0.1%.
I don't know if there is another, more specific criterion for this, but reducing the number of elements to sort seems the way to go to get back to the former speed.