Quick sort: Replace recursion with custom stack, small improvements #84

Open: goldsteinn wants to merge 1 commit into main

goldsteinn

Instead of recursing, maintain an explicit stack that holds the low/high bounds of the next region to sort.
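
For readers unfamiliar with the technique, here is a minimal sketch of a quicksort driven by an explicit stack of low/high bounds. It is not the code from this PR: the names `region_t`, `QS_STACK_MAX`, `partition_int`, and `quick_sort_iterative` are hypothetical, it sorts plain `int` arrays, and the partition step is a deliberately simple placeholder. The larger side is pushed and the smaller side is processed next, which keeps the stack depth within about log2(n) entries.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration, not the PR's code. */
typedef struct {
    size_t lo; /* first index of the region */
    size_t hi; /* last index of the region (inclusive) */
} region_t;

/* One slot per bit of size_t is enough when the larger side is always
   the one that gets pushed. */
#define QS_STACK_MAX (sizeof(size_t) * CHAR_BIT)

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Simple Lomuto partition around the middle element (placeholder).
   Returns the final index of the pivot. */
static size_t partition_int(int *v, size_t lo, size_t hi)
{
    size_t mid = lo + (hi - lo) / 2;
    size_t store = lo;
    size_t i;
    int pivot;

    swap_int(&v[mid], &v[hi]);
    pivot = v[hi];
    for (i = lo; i < hi; i++) {
        if (v[i] < pivot) {
            swap_int(&v[i], &v[store]);
            store++;
        }
    }
    swap_int(&v[store], &v[hi]);
    return store;
}

void quick_sort_iterative(int *v, size_t n)
{
    region_t stack[QS_STACK_MAX];
    size_t top = 0;
    size_t lo, hi, p;

    if (n < 2) return;
    lo = 0;
    hi = n - 1;
    for (;;) {
        while (lo < hi) {
            p = partition_int(v, lo, hi);
            if (p - lo < hi - p) {
                /* Left side is smaller: defer the right side. */
                if (p + 1 < hi) {
                    assert(top < QS_STACK_MAX);
                    stack[top].lo = p + 1;
                    stack[top].hi = hi;
                    top++;
                }
                if (p == lo) break; /* left side is empty */
                hi = p - 1;
            } else {
                /* Right side is smaller or equal: defer the left side. */
                if (p > lo + 1) {
                    assert(top < QS_STACK_MAX);
                    stack[top].lo = lo;
                    stack[top].hi = p - 1;
                    top++;
                }
                lo = p + 1;
            }
        }
        if (top == 0) break; /* nothing deferred: done */
        top--;
        lo = stack[top].lo;
        hi = stack[top].hi;
    }
}
```

Calling `quick_sort_iterative(values, count)` sorts `values` in ascending order without any function-call recursion; regions of fewer than two elements are never pushed.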

Also tune some of the logic a bit.
- Simpler (and faster) median + setup for partition
- Remove some unnecessary branches in hot control flow.
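
As an illustration of what a median step that doubles as partition setup can look like, here is a generic median-of-three sketch. It is an assumption about the general technique, not the PR's implementation, and `median_of_three` is a hypothetical name. It orders the first, middle, and last elements in place, so the median ends up in the middle slot and the two ends can act as sentinels for a two-pointer partition scan.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Hypothetical helper: orders v[lo], v[mid], v[hi] in place so that
   v[lo] <= v[mid] <= v[hi], then returns mid (the median's index). */
size_t median_of_three(int *v, size_t lo, size_t hi)
{
    size_t mid;

    assert(lo <= hi);
    mid = lo + (hi - lo) / 2; /* written this way to avoid overflow in lo + hi */
    if (v[mid] < v[lo]) swap_int(&v[mid], &v[lo]);
    if (v[hi] < v[lo]) swap_int(&v[hi], &v[lo]);   /* v[lo] is now the minimum */
    if (v[hi] < v[mid]) swap_int(&v[hi], &v[mid]); /* v[hi] is now the maximum */
    return mid;
}
```

After the call, a partition loop can scan inward from `lo + 1` and `hi - 1` without bounds checks, because `v[lo]` is no larger than the pivot and `v[hi]` is no smaller.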

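Results in a roughly 5-10% perf improvement on the project benchmarks (full run data is in the comment below). Each entry lists this branch first, then master; the ratio is new / old, so lower is better: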

4027959.8 / 4258144.7 -> 0.9459
Quick_sort 100000 x86_64 249 4027959.8 ns/op
Quick_sort 100000 x86_64 235 4258144.7 ns/op

Running tests with random numbers: 902582.0 / 940650.0 -> 0.9595
sort.h quick_sort - ok, 902582.0 usec
sort.h quick_sort - ok, 940650.0 usec

Running tests with same number: 8986.0 / 9059.0 -> 0.9919
sort.h quick_sort - ok, 8986.0 usec
sort.h quick_sort - ok, 9059.0 usec

Running tests with sorted numbers: 148790.0 / 160015.0 -> 0.9299
sort.h quick_sort - ok, 148790.0 usec
sort.h quick_sort - ok, 160015.0 usec

Running tests with sorted blocks of length 10: 872430.0 / 915431.0 -> 0.953
sort.h quick_sort - ok, 872430.0 usec
sort.h quick_sort - ok, 915431.0 usec

Running tests with sorted blocks of length 100: 751763.0 / 791987.0 -> 0.9492
sort.h quick_sort - ok, 751763.0 usec
sort.h quick_sort - ok, 791987.0 usec

Running tests with sorted blocks of length 10000: 461118.0 / 514853.0 -> 0.8956
sort.h quick_sort - ok, 461118.0 usec
sort.h quick_sort - ok, 514853.0 usec

Running tests with swapped size/2 pairs: 812161.0 / 854230.0 -> 0.9508
sort.h quick_sort - ok, 812161.0 usec
sort.h quick_sort - ok, 854230.0 usec

Running tests with swapped size/8 pairs: 522638.0 / 575848.0 -> 0.9076
sort.h quick_sort - ok, 522638.0 usec
sort.h quick_sort - ok, 575848.0 usec

Running tests with known evil data: 146601.0 / 196450.0 -> 0.7463
sort.h quick_sort - ok, 146601.0 usec
sort.h quick_sort - ok, 196450.0 usec

So roughly a 5-10% improvement in most cases; the outliers are no change for same-number and a 25% improvement for "evil data".
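
For reference, the ratios above can be reproduced with a few lines of C. This is only a sketch of the arithmetic; `speed_ratio` and `percent_improvement` are hypothetical helper names, not part of the project.

```c
#include <assert.h>

/* Ratio of new time to old time; values below 1.0 mean the new code is faster. */
double speed_ratio(double new_time, double old_time)
{
    assert(old_time > 0.0);
    return new_time / old_time;
}

/* Improvement as a percentage of the old time. */
double percent_improvement(double new_time, double old_time)
{
    return (1.0 - speed_ratio(new_time, old_time)) * 100.0;
}
```

For the "known evil data" case, `speed_ratio(146601.0, 196450.0)` is about 0.7463, which corresponds to an improvement of roughly 25%.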


goldsteinn commented Dec 12, 2022

Results from running make clean && make on this branch vs origin/master.

$> git checkout this-pr; make clean && taskset -c 0 make
cc -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format demo.c -o demo
cc -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format stresstest.c -o stresstest
cc -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format multidemo.c -o multidemo
cc -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format benchmark.c -o benchmark
./benchmark | tee benchmark.txt
Qsort 100000 x86_64                       140        7148857.1 ns/op
Binary_insertion_sort 100000 x86_64         2      645257500.0 ns/op
Bitonic_sort 100000 x86_64                  2      645854500.0 ns/op
Quick_sort 100000 x86_64                  249        4027959.8 ns/op
Merge_sort 100000 x86_64                  184        5446347.8 ns/op
Heap_sort 100000 x86_64                   155        6457664.5 ns/op
Shell_sort 100000 x86_64                  136        7369198.5 ns/op
Tim_sort 100000 x86_64                    165        6071715.2 ns/op
Merge_sort_in_place 100000 x86_64         197        5088426.4 ns/op
Grail_sort 100000 x86_64                  161        6218453.4 ns/op
Sqrt_sort 100000 x86_64                   186        5397403.2 ns/op
Rec_stable_sort 100000 x86_64              55       18193363.6 ns/op
Grail_sort_dyn_buffer 100000 x86_64       174        5762546.0 ns/op
./stresstest
       selection sort -- stable
binary insertion sort -- stable
          bubble sort -- stable
           quick sort -- UNSTABLE
           merge sort -- stable
            heap sort -- UNSTABLE
           shell sort -- UNSTABLE
             tim sort -- stable
merge (in-place) sort -- UNSTABLE
           grail sort -- stable
            sqrt sort -- stable
      rec stable sort -- stable
grail sort dyn byffer -- stable
-------
Running tests with random numbers:
-------
stdlib qsort                  - ok,  1393484.0 usec
sort.h quick_sort             - ok,   902582.0 usec
sort.h merge_sort             - ok,  1070078.0 usec
sort.h heap_sort              - ok,  1224040.0 usec
sort.h shell_sort             - ok,  1386712.0 usec
sort.h tim_sort               - ok,  1211521.0 usec
sort.h merge_sort_in_place    - ok,  1023660.0 usec
sort.h grail_sort             - ok,  1264447.0 usec
sort.h sqrt_sort              - ok,  1069604.0 usec
sort.h rec_stable_sort        - ok,  3460456.0 usec
sort.h grail_sort_dyn_buffer  - ok,  1164119.0 usec
-------
Running tests with same number:
-------
stdlib qsort                  - ok,   284886.0 usec
sort.h quick_sort             - ok,     8986.0 usec
sort.h merge_sort             - ok,   140154.0 usec
sort.h heap_sort              - ok,    32446.0 usec
sort.h shell_sort             - ok,   145532.0 usec
sort.h tim_sort               - ok,     4769.0 usec
sort.h merge_sort_in_place    - ok,    19246.0 usec
sort.h grail_sort             - ok,   148814.0 usec
sort.h sqrt_sort              - ok,   160428.0 usec
sort.h rec_stable_sort        - ok,   127074.0 usec
sort.h grail_sort_dyn_buffer  - ok,   148203.0 usec
-------
Running tests with sorted numbers:
-------
stdlib qsort                  - ok,   317956.0 usec
sort.h quick_sort             - ok,   148790.0 usec
sort.h merge_sort             - ok,   138919.0 usec
sort.h heap_sort              - ok,   705305.0 usec
sort.h shell_sort             - ok,   144461.0 usec
sort.h tim_sort               - ok,     4852.0 usec
sort.h merge_sort_in_place    - ok,    19673.0 usec
sort.h grail_sort             - ok,   226494.0 usec
sort.h sqrt_sort              - ok,   147886.0 usec
sort.h rec_stable_sort        - ok,   252852.0 usec
sort.h grail_sort_dyn_buffer  - ok,   155313.0 usec
-------
Running tests with sorted blocks of length 10:
-------
stdlib qsort                  - ok,  1227247.0 usec
sort.h quick_sort             - ok,   872430.0 usec
sort.h merge_sort             - ok,   947458.0 usec
sort.h heap_sort              - ok,  1191566.0 usec
sort.h shell_sort             - ok,  1298468.0 usec
sort.h tim_sort               - ok,  1082021.0 usec
sort.h merge_sort_in_place    - ok,   928320.0 usec
sort.h grail_sort             - ok,  1174457.0 usec
sort.h sqrt_sort              - ok,   977433.0 usec
sort.h rec_stable_sort        - ok,  3343152.0 usec
sort.h grail_sort_dyn_buffer  - ok,  1076379.0 usec
-------
Running tests with sorted blocks of length 100:
-------
stdlib qsort                  - ok,   948427.0 usec
sort.h quick_sort             - ok,   751763.0 usec
sort.h merge_sort             - ok,   695269.0 usec
sort.h heap_sort              - ok,  1156169.0 usec
sort.h shell_sort             - ok,  1079533.0 usec
sort.h tim_sort               - ok,   604579.0 usec
sort.h merge_sort_in_place    - ok,   737317.0 usec
sort.h grail_sort             - ok,   931553.0 usec
sort.h sqrt_sort              - ok,   750577.0 usec
sort.h rec_stable_sort        - ok,  2810819.0 usec
sort.h grail_sort_dyn_buffer  - ok,   842407.0 usec
-------
Running tests with sorted blocks of length 10000:
-------
stdlib qsort                  - ok,   433842.0 usec
sort.h quick_sort             - ok,   461118.0 usec
sort.h merge_sort             - ok,   243061.0 usec
sort.h heap_sort              - ok,   885431.0 usec
sort.h shell_sort             - ok,   398555.0 usec
sort.h tim_sort               - ok,   113289.0 usec
sort.h merge_sort_in_place    - ok,   258593.0 usec
sort.h grail_sort             - ok,   400430.0 usec
sort.h sqrt_sort              - ok,   279578.0 usec
sort.h rec_stable_sort        - ok,   817718.0 usec
sort.h grail_sort_dyn_buffer  - ok,   324363.0 usec
-------
Running tests with swapped size/2 pairs:
-------
stdlib qsort                  - ok,  1185254.0 usec
sort.h quick_sort             - ok,   812161.0 usec
sort.h merge_sort             - ok,   878181.0 usec
sort.h heap_sort              - ok,  1152633.0 usec
sort.h shell_sort             - ok,  1363910.0 usec
sort.h tim_sort               - ok,   979855.0 usec
sort.h merge_sort_in_place    - ok,   858841.0 usec
sort.h grail_sort             - ok,  1047927.0 usec
sort.h sqrt_sort              - ok,   870839.0 usec
sort.h rec_stable_sort        - ok,  3049760.0 usec
sort.h grail_sort_dyn_buffer  - ok,   962422.0 usec
-------
Running tests with swapped size/8 pairs:
-------
stdlib qsort                  - ok,   730191.0 usec
sort.h quick_sort             - ok,   522638.0 usec
sort.h merge_sort             - ok,   461843.0 usec
sort.h heap_sort              - ok,   886073.0 usec
sort.h shell_sort             - ok,  1235583.0 usec
sort.h tim_sort               - ok,   492259.0 usec
sort.h merge_sort_in_place    - ok,   505915.0 usec
sort.h grail_sort             - ok,   633954.0 usec
sort.h sqrt_sort              - ok,   505792.0 usec
sort.h rec_stable_sort        - ok,  1633678.0 usec
sort.h grail_sort_dyn_buffer  - ok,   558622.0 usec
-------
Running tests with known evil data:
-------
stdlib qsort                  - ok,   338826.0 usec
sort.h quick_sort             - ok,   146601.0 usec
sort.h merge_sort             - ok,   161545.0 usec
sort.h heap_sort              - ok,   693945.0 usec
sort.h shell_sort             - ok,   150226.0 usec
sort.h tim_sort               - ok,   131000.0 usec
sort.h merge_sort_in_place    - ok,    16270.0 usec
sort.h grail_sort             - ok,   225898.0 usec
sort.h sqrt_sort              - ok,   147242.0 usec
sort.h rec_stable_sort        - ok,   249025.0 usec
sort.h grail_sort_dyn_buffer  - ok,   154344.0 usec

$> git checkout master; make clean && taskset -c 0 make
cc -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format demo.c -o demo
cc -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format stresstest.c -o stresstest
cc -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format multidemo.c -o multidemo
cc -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format benchmark.c -o benchmark
./benchmark | tee benchmark.txt
Qsort 100000 x86_64                       145        6931013.8 ns/op
Binary_insertion_sort 100000 x86_64         2      642513000.0 ns/op
Bitonic_sort 100000 x86_64                  2      642168500.0 ns/op
Quick_sort 100000 x86_64                  235        4258144.7 ns/op
Merge_sort 100000 x86_64                  178        5626556.2 ns/op
Heap_sort 100000 x86_64                   159        6321735.8 ns/op
Shell_sort 100000 x86_64                  130        7730269.2 ns/op
Tim_sort 100000 x86_64                    164        6115536.6 ns/op
Merge_sort_in_place 100000 x86_64         196        5111959.2 ns/op
Grail_sort 100000 x86_64                  162        6187240.7 ns/op
Sqrt_sort 100000 x86_64                   186        5381634.4 ns/op
Rec_stable_sort 100000 x86_64              57       17745824.6 ns/op
Grail_sort_dyn_buffer 100000 x86_64       175        5729914.3 ns/op
./stresstest
       selection sort -- stable
binary insertion sort -- stable
          bubble sort -- stable
           quick sort -- UNSTABLE
           merge sort -- stable
            heap sort -- UNSTABLE
           shell sort -- UNSTABLE
             tim sort -- stable
merge (in-place) sort -- UNSTABLE
           grail sort -- stable
            sqrt sort -- stable
      rec stable sort -- stable
grail sort dyn byffer -- stable
-------
Running tests with random numbers:
-------
stdlib qsort                  - ok,  1379273.0 usec
sort.h quick_sort             - ok,   940650.0 usec
sort.h merge_sort             - ok,  1072472.0 usec
sort.h heap_sort              - ok,  1233216.0 usec
sort.h shell_sort             - ok,  1379869.0 usec
sort.h tim_sort               - ok,  1198070.0 usec
sort.h merge_sort_in_place    - ok,  1004441.0 usec
sort.h grail_sort             - ok,  1248589.0 usec
sort.h sqrt_sort              - ok,  1069499.0 usec
sort.h rec_stable_sort        - ok,  3498502.0 usec
sort.h grail_sort_dyn_buffer  - ok,  1161418.0 usec
-------
Running tests with same number:
-------
stdlib qsort                  - ok,   284395.0 usec
sort.h quick_sort             - ok,     9059.0 usec
sort.h merge_sort             - ok,   137849.0 usec
sort.h heap_sort              - ok,    32171.0 usec
sort.h shell_sort             - ok,   144430.0 usec
sort.h tim_sort               - ok,     6943.0 usec
sort.h merge_sort_in_place    - ok,    20060.0 usec
sort.h grail_sort             - ok,   146077.0 usec
sort.h sqrt_sort              - ok,   161937.0 usec
sort.h rec_stable_sort        - ok,   141100.0 usec
sort.h grail_sort_dyn_buffer  - ok,   146543.0 usec
-------
Running tests with sorted numbers:
-------
stdlib qsort                  - ok,   315113.0 usec
sort.h quick_sort             - ok,   160015.0 usec
sort.h merge_sort             - ok,   138050.0 usec
sort.h heap_sort              - ok,   744952.0 usec
sort.h shell_sort             - ok,   143388.0 usec
sort.h tim_sort               - ok,     6964.0 usec
sort.h merge_sort_in_place    - ok,    20219.0 usec
sort.h grail_sort             - ok,   229629.0 usec
sort.h sqrt_sort              - ok,   150886.0 usec
sort.h rec_stable_sort        - ok,   266165.0 usec
sort.h grail_sort_dyn_buffer  - ok,   157536.0 usec
-------
Running tests with sorted blocks of length 10:
-------
stdlib qsort                  - ok,  1222918.0 usec
sort.h quick_sort             - ok,   915431.0 usec
sort.h merge_sort             - ok,   941694.0 usec
sort.h heap_sort              - ok,  1202870.0 usec
sort.h shell_sort             - ok,  1287957.0 usec
sort.h tim_sort               - ok,  1085309.0 usec
sort.h merge_sort_in_place    - ok,   918509.0 usec
sort.h grail_sort             - ok,  1168306.0 usec
sort.h sqrt_sort              - ok,   986384.0 usec
sort.h rec_stable_sort        - ok,  3368499.0 usec
sort.h grail_sort_dyn_buffer  - ok,  1081310.0 usec
-------
Running tests with sorted blocks of length 100:
-------
stdlib qsort                  - ok,   947558.0 usec
sort.h quick_sort             - ok,   791987.0 usec
sort.h merge_sort             - ok,   698161.0 usec
sort.h heap_sort              - ok,  1160865.0 usec
sort.h shell_sort             - ok,  1074345.0 usec
sort.h tim_sort               - ok,   607297.0 usec
sort.h merge_sort_in_place    - ok,   729034.0 usec
sort.h grail_sort             - ok,   940122.0 usec
sort.h sqrt_sort              - ok,   768022.0 usec
sort.h rec_stable_sort        - ok,  2841462.0 usec
sort.h grail_sort_dyn_buffer  - ok,   859127.0 usec
-------
Running tests with sorted blocks of length 10000:
-------
stdlib qsort                  - ok,   432742.0 usec
sort.h quick_sort             - ok,   514853.0 usec
sort.h merge_sort             - ok,   242865.0 usec
sort.h heap_sort              - ok,   966050.0 usec
sort.h shell_sort             - ok,   393624.0 usec
sort.h tim_sort               - ok,   115919.0 usec
sort.h merge_sort_in_place    - ok,   259527.0 usec
sort.h grail_sort             - ok,   403945.0 usec
sort.h sqrt_sort              - ok,   287455.0 usec
sort.h rec_stable_sort        - ok,   838015.0 usec
sort.h grail_sort_dyn_buffer  - ok,   332536.0 usec
-------
Running tests with swapped size/2 pairs:
-------
stdlib qsort                  - ok,  1182378.0 usec
sort.h quick_sort             - ok,   854230.0 usec
sort.h merge_sort             - ok,   886641.0 usec
sort.h heap_sort              - ok,  1165720.0 usec
sort.h shell_sort             - ok,  1359984.0 usec
sort.h tim_sort               - ok,   974459.0 usec
sort.h merge_sort_in_place    - ok,   851682.0 usec
sort.h grail_sort             - ok,  1040230.0 usec
sort.h sqrt_sort              - ok,   877039.0 usec
sort.h rec_stable_sort        - ok,  3084658.0 usec
sort.h grail_sort_dyn_buffer  - ok,   968540.0 usec
-------
Running tests with swapped size/8 pairs:
-------
stdlib qsort                  - ok,   727391.0 usec
sort.h quick_sort             - ok,   575848.0 usec
sort.h merge_sort             - ok,   462145.0 usec
sort.h heap_sort              - ok,   952564.0 usec
sort.h shell_sort             - ok,  1230290.0 usec
sort.h tim_sort               - ok,   486656.0 usec
sort.h merge_sort_in_place    - ok,   501671.0 usec
sort.h grail_sort             - ok,   628346.0 usec
sort.h sqrt_sort              - ok,   513851.0 usec
sort.h rec_stable_sort        - ok,  1657128.0 usec
sort.h grail_sort_dyn_buffer  - ok,   564601.0 usec
-------
Running tests with known evil data:
-------
stdlib qsort                  - ok,   337554.0 usec
sort.h quick_sort             - ok,   196450.0 usec
sort.h merge_sort             - ok,   160073.0 usec
sort.h heap_sort              - ok,   746918.0 usec
sort.h shell_sort             - ok,   147422.0 usec
sort.h tim_sort               - ok,   128985.0 usec
sort.h merge_sort_in_place    - ok,    19598.0 usec
sort.h grail_sort             - ok,   229641.0 usec
sort.h sqrt_sort              - ok,   149852.0 usec
sort.h rec_stable_sort        - ok,   267598.0 usec
sort.h grail_sort_dyn_buffer  - ok,   158364.0 usec
