gh-131269: Avoid binding functions in random.py #131270
Conversation
This speeds up calls like `random.randint()` by about 10% and also avoids some scaling bottlenecks in the free threading build.
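For illustration only, here is a simplified sketch (not the actual diff) of the two patterns under discussion, modeled loosely on the module's rejection-sampling helper; the method names `randbelow_prebound` and `randbelow_direct` are made up for this example:

```python
import random

class Example(random.Random):
    def randbelow_prebound(self, n):
        # Pre-bound style: the bound method object is created once and
        # the attribute lookup is hoisted out of the retry loop.
        getrandbits = self.getrandbits
        k = n.bit_length()
        r = getrandbits(k)
        while r >= n:
            r = getrandbits(k)
        return r

    def randbelow_direct(self, n):
        # Direct style: look the attribute up on each call, which avoids
        # creating and holding a temporary bound-method object.
        k = n.bit_length()
        r = self.getrandbits(k)
        while r >= n:
            r = self.getrandbits(k)
        return r
```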
There are also some other places where we did `from os import urandom as _urandom` and used `_urandom` afterwards. Should this also be changed, or is this still a real optimisation? (Namely, replace `_urandom` by `os.urandom`?)
When I last looked at this a few months ago, the speed-up was questionable (it varied quite a bit across builds, compilers, and operating systems). The interpreter implementation is in rapid flux. For a long time, the pre-binding was faster. Then the LOAD_ATTR optimizations pulled ahead, but there is no reason to think that advantage will persist. (We had a similar situation where LOAD_GLOBAL briefly became as fast as LOAD_FAST, which seemed to invalidate previous efforts to use local variables.) I would rather not "chase the interpreter" into a local minimum, and would keep the current logic intact until the interpreter stabilizes.

Experience with PyPy shows that prebinding can maintain its advantage even with JITted code. Ultimately, I expect that the current code would win out because it hoists some of the lookup logic outside of the loop. Right now, it adds a STORE_FAST opcode, but the cost of that will drop to almost free as the impending JIT work moves forward.

Can you elaborate more on the "free-threading bottlenecks"? When I last looked at this while working on KDE for the statistics module, I found that prebinding wasn't the problem. Instead, what was needed was separate instances of `Random`.

Addenda: IIRC this was also discussed in the forums this year, and Guido was against sweeping through and replacing bound methods in existing stable code.

Maintainer's note: I personally find the calls to …
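A rough way to compare the two call styles being discussed is a small `timeit` sketch like the one below; as the comment above notes, the results vary widely across CPython versions, builds, compilers, and operating systems, so treat any numbers as indicative only:

```python
# Micro-benchmark sketch: pre-bound method vs. attribute call in a loop.
import random
import timeit

rng = random.Random(12345)

def prebound(loops=100_000):
    rnd = rng.random              # hoist the bound method out of the loop
    total = 0.0
    for _ in range(loops):
        total += rnd()
    return total

def attribute(loops=100_000):
    total = 0.0
    for _ in range(loops):
        total += rng.random()     # attribute lookup + method bind on each call
    return total

print("pre-bound:", timeit.timeit(prebound, number=20))
print("attribute:", timeit.timeit(attribute, number=20))
```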
Hi @rhettinger - here's an example that demonstrates the free threading bottleneck, adapted from Paul Moore's program: montecarlo.py. It's nearly twice as fast with this change. It already uses separate instances of `Random`. The underlying bottleneck is due to two issues:
I think it's possible to change the implementation to avoid both of these issues, but that will be complicated and require new techniques. The single-threaded improvement is mostly related to avoiding the creation of the temporary method object and the bookkeeping that involves. I've seen an improvement for Python versions from 3.9 to 3.14.

I agree that with a sufficiently advanced JIT the existing code may perform the same as the new code, but I don't think we're there yet. I also agree that we don't want to make sweeping changes across the code base. I think this change is pretty small and targeted, and the new code is still idiomatic Python.
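The linked montecarlo.py is not reproduced here; as a rough stand-in, the kind of workload described (each thread using its own `Random` instance, all names below hypothetical) looks something like this sketch, where the per-call `rng.random()` attribute access is the part the discussion above attributes the free-threading scaling cost to:

```python
# Simplified per-thread Monte Carlo estimate of pi (illustrative only,
# not Paul Moore's actual program). Each thread gets its own Random
# instance, yet every rng.random() call still creates a temporary
# bound-method object.
import random
import threading

def estimate_pi(samples, results, index):
    rng = random.Random(index)            # per-thread generator
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    results[index] = 4.0 * inside / samples

def run(threads=4, samples=250_000):
    results = [0.0] * threads
    workers = [
        threading.Thread(target=estimate_pi, args=(samples, results, i))
        for i in range(threads)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return sum(results) / threads

if __name__ == "__main__":
    print("pi estimate:", run())
```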
How about making only the edit to the randbelow methods (as proposed in the issue) and not sweeping through the rest of the module? I'm really uncomfortable with the latter. Unless there is compelling urgency for making all of these edits, we can discuss those at the sprints.

Also, ISTM the monte carlo benchmark is a toy example and we shouldn't tune to it. Benchmark chasing is rarely advisable, and even more so at a time when the interpreter implementation is undergoing so many changes.
I've limited the changes to …