
Speedup topp by sorting less elements for each token. #270

Closed
wants to merge 5 commits

Conversation

rdentato
Contributor

@rdentato rdentato commented Aug 10, 2023

I set up an arbitrary threshold to ignore probabilities lower than 0.1%.
I don't know if there is another, more specific, criterion for this, but reducing the number of elements to sort seems the way to go to get back to the former speed.
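For illustration, a minimal sketch of the idea (not the patch itself; the `ProbIndex` pair and helper names here are illustrative, and the 0.1% cutoff is the arbitrary threshold mentioned above):

```c
#include <stdlib.h>

typedef struct { float prob; int index; } ProbIndex;

/* Sort descending by probability. */
static int compare_desc(const void *a, const void *b) {
    float pa = ((const ProbIndex *)a)->prob;
    float pb = ((const ProbIndex *)b)->prob;
    return (pa < pb) - (pa > pb);
}

/* Copy only candidates at or above `cutoff` into probindex, then sort
   that (much smaller) set. Returns how many tokens survived the filter. */
int filter_and_sort(const float *probs, int n, float cutoff, ProbIndex *probindex) {
    int n0 = 0;
    for (int i = 0; i < n; i++) {
        if (probs[i] >= cutoff) {
            probindex[n0].prob = probs[i];
            probindex[n0].index = i;
            n0++;
        }
    }
    qsort(probindex, n0, sizeof(ProbIndex), compare_desc);
    return n0;
}
```

One caveat with a fixed cutoff: if the surviving probability mass ends up below topp, the caller would need to fall back to the full set.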

@xefoci7612

xefoci7612 commented Aug 10, 2023

@rdentato what about adding top-k sampling and chain it before top-p ?

  1. We have also top-k sampling among the options

  2. If we chain it before top-p, speed is almost fully recovered. I have tested in my repo with k = 64 (which is quite a high number): you end up with a bigger set than top-p in >95% of cases even with top-p = 0.9, i.e. in >95% of cases the result is guaranteed to be the same.

  3. We keep the top-p implementation to stick to the correct definition of top-p sampling; the user explicitly adds top-k if they want it.

P.S.: The idea is not mine; I have read that chaining is possible.

Anyhow, your patch is nice.

@rdentato
Contributor Author

I'm not sure how top-k and top-p would interact. I mean, to get the top-k you need the probabilities sorted, right?
My patch tries to keep the set of probabilities to sort smaller.
Looking at the logs of my tests (though I don't know how general they are), there are few tokens with high probability and many, many more tokens with low probability.

But I understand that @karpathy wants to be sure before accepting PR, let's see what he thinks about it.

@xefoci7612

xefoci7612 commented Aug 10, 2023

> I'm not sure how top-k and top-p would interact, I mean to get the top-k you should have the probabilities sorted right?

Actually, selection runs in O(n): you just need to iterate the array once. Intuitively, if k = 3, you pick the first 3 elements, then iterate across the array from the fourth element on, and whenever a new element is larger than the smallest of the current 3, it replaces it.
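A sketch of that selection loop (a hypothetical helper, not code from either repo). Strictly speaking the inner min-scan makes it O(n*k), which is effectively O(n) for a small fixed k; a heap would give O(n log k):

```c
/* Track the indices of the k largest probabilities in a single pass.
   Each step scans the k current candidates for their minimum, replacing
   it whenever a larger element is encountered. */
void top_k_indices(const float *probs, int n, int k, int *out_idx) {
    for (int i = 0; i < k; i++) out_idx[i] = i;        /* seed with first k */
    for (int i = k; i < n; i++) {
        int min_j = 0;                                 /* smallest of the k */
        for (int j = 1; j < k; j++)
            if (probs[out_idx[j]] < probs[out_idx[min_j]]) min_j = j;
        if (probs[i] > probs[out_idx[min_j]]) out_idx[min_j] = i;
    }
}
```

Note that the k survivors come out unordered; only this much smaller set would then need sorting for the chained top-p step.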

> My patch tries to keep the set of probabilities to sort smaller. Looking at the logs of my tests (but I don't know how general they are) there are few tokens with high probability and many, many more tokens with low probability.
>
> But I understand that @karpathy wants to be sure before accepting the PR, let's see what he thinks about it.

If @karpathy does not want to add top-k, then your patch is definitely simpler, though some testing would be needed to pick a sensible threshold.

@kroggen
Contributor

kroggen commented Aug 11, 2023

Another option is to parallelize the sorting, either with OpenMP or pthreads.

Here is an example:

https://mcbeukman.medium.com/parallel-quicksort-using-openmp-9d18d7468cac

The loop before qsort can also be parallelized with OpenMP.
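For that loop the OpenMP version is a one-line pragma (a sketch, assuming the loop in question is the (prob, index) initialization before the sort; compile with -fopenmp, otherwise the pragma is ignored and the loop runs serially):

```c
typedef struct { float prob; int index; } ProbIndex;

/* Fill the (prob, index) pairs; iterations are independent, so the
   loop parallelizes trivially across threads. */
void build_probindex(const float *probs, int n, ProbIndex *probindex) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        probindex[i].prob = probs[i];
        probindex[i].index = i;
    }
}
```

For a 32000-entry vocabulary the copy is cheap, so the sort itself (as in the linked parallel quicksort article) is where most of the parallel gain would come from.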

@karpathy
Owner

I'll sleep on this but I think I'll merge it. Thank you. Does 1e-5 make it so that speed with and without topp is more equal?

@rdentato
Contributor Author

I noticed a degradation of performance starting at 1e-8. I would stay around 1e-5 or 1e-6.
This could be made parametric with a command-line option, say -e 1e-4, at least to allow experimenting with different values.

@kroggen kroggen mentioned this pull request Aug 12, 2023
@rdentato
Contributor Author

Following the indication of PR #276 and PR #274, I changed the limit from 1e-5 to (1 - topp) / (n - 1).
For a vocab size of 32000, this is equal to 3.12e-6 and I can still see the benefits.
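The adaptive cutoff is cheap to compute per step. A sketch (the justification, per the discussion in those PRs, is that a token below this probability cannot appear in the top-p nucleus):

```c
/* Cutoff below which a token cannot be part of the top-p set:
   with topp = 0.9 and a 32000-token vocabulary this gives
   (1 - 0.9) / 31999, approximately 3.12e-6. */
double topp_cutoff(double topp, int n) {
    return (1.0 - topp) / (n - 1);
}
```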

@karpathy
Owner

Merged #276 ty

@karpathy karpathy closed this Aug 14, 2023
@rdentato rdentato deleted the patch-topp-optimization branch August 14, 2023 06:45
xefoci7612 pushed a commit to xefoci7612/baby-llama2.cpp that referenced this pull request Aug 26, 2023