Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of random.randbelow by providing a C implementation #126149

Closed
eendebakpt opened this issue Oct 29, 2024 · 6 comments
Closed
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage type-feature A feature request or enhancement

Comments

@eendebakpt
Copy link
Contributor

eendebakpt commented Oct 29, 2024

Feature or enhancement

Proposal:

We can improve performance of random.randbelow (and methods depending on that functionality such as random.shuffle) by creating a C implementation of _random._randbelow_with_getrandbits. A quick prototype with a straightforward port of the Python code shows we can gain a factor 2 or better in performance for many cases.

Benchmark:

rng.getrandbits(8) 59.75599982775748 [ns]

rng._randbelow_with_getrandbits(4) 230.81 [ns]
rng._randbelow_with_getrandbits_c(4) 83.28 [ns]

rng._randbelow_with_getrandbits(10) 208.89 [ns]
rng._randbelow_with_getrandbits_c(10) 83.92 [ns]

rng._randbelow_with_getrandbits(123) 179.30 [ns]
rng._randbelow_with_getrandbits_c(123) 60.23 [ns]

rng._randbelow_with_getrandbits(129) 225.87 [ns]
rng._randbelow_with_getrandbits_c(129) 79.82 [ns]

rng._randbelow_with_getrandbits(314) 225.01 [ns]
rng._randbelow_with_getrandbits_c(314) 112.94 [ns]

rng._randbelow_with_getrandbits(10**20) 324.53 [ns]
rng._randbelow_with_getrandbits_c(10**20) 135.60 [ns]

rng._randbelow_with_getrandbits(10**250) 1233.00 [ns]
rng._randbelow_with_getrandbits_c(10**250) 1048.73 [ns]

shuffle(x) 442.07 [us]
shuffle_c(x) 204.40 [us]

Prototype: main...eendebakpt:randbelow_c

Benchmark script

import random
import _random
import timeit

rng = random.Random()

number=100_000
dt=timeit.timeit('rng.getrandbits(8)', globals=globals(), number=number)
print(f'rng.getrandbits(8) {1e9*dt/number} [ns]')
print()


for a in [4, 10, 123, 129, 314, 10**20, 10**250]:
    if a > 1e8:
        number=100
    else:
        number=100_000
        
    astr = a
    if a==10**250:
        astr ='10**250'
    if a==10**20:
        astr ='10**20'
    dt=timeit.timeit(f'rng._randbelow_with_getrandbits({astr})', globals=globals(), number=number)
    print(f'rng._randbelow_with_getrandbits({astr}) {1e9*dt/number:.2f} [ns]')
    
    if hasattr(rng, '_randbelow_with_getrandbits_c'):
        dt=timeit.timeit(f'rng._randbelow_with_getrandbits_c({astr})', globals=globals(), number=number)
        print(f'rng._randbelow_with_getrandbits_c({astr}) {1e9*dt/number:.2f} [ns]')
    
    print()


def shuffle(x):
    """Shuffle list x in place, and return None."""

    randbelow = rng._randbelow_with_getrandbits
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(i + 1)
        x[i], x[j] = x[j], x[i]

number=1000
x = list(range(1600))
dt=timeit.timeit('shuffle(x)', globals=globals(), number=number)
print(f'shuffle(x) {1e6*dt/number:.2f} [us]')

if hasattr(rng, '_randbelow_with_getrandbits_c'):

    def shuffle_c(x):
        """Shuffle list x in place, and return None."""
    
        randbelow = rng._randbelow_with_getrandbits_c
        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = randbelow(i + 1)
            x[i], x[j] = x[j], x[i]
    
    number=1000
    x = list(range(1600))
    dt=timeit.timeit('shuffle_c(x)', globals=globals(), number=number)
    print(f'shuffle_c(x) {1e6*dt/number:.2f} [us]')

@rhettinger In your opinion: is the speed-up from the C implementation worthwhile when compared to the additional complexity of the C implementation? If so I will cleanup the code and make a PR

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

@eendebakpt eendebakpt added the type-feature A feature request or enhancement label Oct 29, 2024
@Eclips4 Eclips4 added the performance Performance or resource usage label Oct 29, 2024
@rhettinger
Copy link
Contributor

-1 We've long resisted recoding the random module in C.

@picnixz
Copy link
Contributor

picnixz commented Oct 30, 2024

-1 We've long resisted recoding the random module in C.

Considering this answer, I assume that this issue is likely to be closed. However, I'm wondering: what was the reason of not recoding it in C? is it to avoid additional work and / or hard-to-spot corner cases? (or maybe performances were not sufficiently important?)

@picnixz picnixz added the extension-modules C modules in the Modules dir label Oct 30, 2024
@rhettinger
Copy link
Contributor

We could code every Python module in C and make it faster. But we do like and prefer our own language which users can read and is easy for us to maintain.

The module does have a C foundation, Random.random and Random.getrandbits. Those are plenty fast. The rest of the onion is pure Python that users can read, understand, and easily construct their own variants.

@JelleZijlstra
Copy link
Member

Keeping the code in Python also makes it possible for future optimizations such as work on the JIT to make this code faster "for free".

@rhettinger
Copy link
Contributor

I was just about to add that note as well. Pure python is getting faster and faster.

@picnixz
Copy link
Contributor

picnixz commented Oct 30, 2024

Closing as not planned then!

@picnixz picnixz closed this as not planned Won't fix, can't repro, duplicate, stale Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants