Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix uniqid() performances #18232

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Fix uniqid() performances #18232

wants to merge 5 commits into from

Conversation

manu0401
Copy link
Contributor

@manu0401 manu0401 commented Apr 3, 2025

If available, use uuidgen() system call instead of a loop on gettimeofday(), that improves performances lot.

If available, use uuidgen() system call instead of a loop on
gettimeofday(), that improves performances lot.
@manu0401 manu0401 requested a review from bukka as a code owner April 3, 2025 00:43
@manu0401 manu0401 changed the title On some systems, uniqid() is critically slow. Fix uniqid() performances Apr 3, 2025
@@ -53,6 +87,24 @@ PHP_FUNCTION(uniqid)
Z_PARAM_BOOL(more_entropy)
ZEND_PARSE_PARAMETERS_END();

#ifdef __NetBSD__
struct uuid uuid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused here. uuidgen is on FreeBSD/NetBSD (other oses have different apis) which you detect at configure time. Why the particular NetBSD code path is needed ? we should fall into the code above I think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I do not really get the code duplication. But not sure it s worth the efforts to be honest considering later comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this be even executed - it's in elif section. Basically:

#ifdef HAVE_UUIDGEN
...
#elif HAVE_GETTIMEOFDAY
...
#ifdef HAVE_UUIDGEN

That doesn't make any sense to me. Am I missing anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I committed it a bit too early, I should I removed that section that was used in earlier tests. It is fixed now.

@devnexen
Copy link
Member

devnexen commented Apr 3, 2025

Now that I remember, cpython used to use native calls for uuid generations, came back and forth to fix undesired behavior differences and ultimately decided to do one unique implementation in python.

before adding the uuidgen() test in configure
@TimWolla TimWolla self-requested a review April 3, 2025 06:53
@nielsdos
Copy link
Member

nielsdos commented Apr 3, 2025

Honestly, this function is so horrible that I wonder if we can't just replace the implementation with something entirely different but sane, given that deprecating it didn't pass.
I wonder if anyone relies on the documented behaviour that it's time based.

@TimWolla
Copy link
Member

TimWolla commented Apr 3, 2025

I wonder if anyone relies on the documented behaviour that it's time based.

As a weaker constraint that is also guaranteed by the time-based guarantee: The output is monotonically increasing, which is convenient for databases, because new values will not be distributed all over the place.


As I've also explained in my deprecation proposal (https://wiki.php.net/rfc/deprecations_php_8_4#deprecate_uniqid): Alternatives are already available that are better in every regard (except for “succinctness” perhaps) and changing the implementation will result in breaking changes for at least some of the users. Deprecating the function without intent of removal would have been the least impactful solution. Trying to make small incremental changes to uniqid() is just polishing a turd.

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the general remarks regarding the state of uniqid(), there are also issues with the proposed implementation.

In any case, I'm requesting an RFC being written to change the uniqid() implementation, due to the breaking changes with regard to documented behavior.

Comment on lines 97 to 100
if (more_entropy) {
n[1] &= 0xffffff;
uniqid = strpprintf(0, "%s%08x%06x.%08x", prefix, n[0], n[1], n[2]);
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output format is incompatible with the non-uuidgen output. It also leaks parts of the hosts MAC address and also doesn't actually provide “more entropy”, since the mac address is fixed on a single node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually compatible, despite the fact I do not understand how gettimeofday+more entropy produces 14+8 characters, where I would expect 13+8 reading the code. Check it on your own:
$ php -r 'printf("%s\n", uniqid("", 1));'
67ee7f958a5b34.94453710

For the MAC address, I understand it is not included in UUID anymore:
$ uuidgen
bf0ff2ae-cf7a-46b1-bd2e-c3577268e311
$ uuidgen
39638058-e887-41a5-97d1-6451b0622763

But I can understand it may be platform dependent, and we must avoid using it in case it could contains the MAC.

manu0401 added 3 commits April 3, 2025 14:24
on some platforms. Reuse the more_entropy from the gettimeofday()
version instead.

Refactor to avoid code duplication.
@manu0401
Copy link
Contributor Author

manu0401 commented Apr 3, 2025

Some numbers to show how badly slow the original implementation can be, here on a NetBSD 10.0 amd64 Xen domU. First the original version, second the uuidgen() flavor. Note that this is 100 iterations versus 1M.

$ time php -r 'for ($i = 0; $i < 100; $i++) uniqid();'
19.10s real 6.13s user 6.39s system

$ time php -r 'for ($i = 0; $i < 1000000; $i++) uniqid();'
3.38s real 1.73s user 1.65s system

@TimWolla
Copy link
Member

TimWolla commented Apr 4, 2025

My understanding is that Xen (historically?) had a particularly slow clock implementation. For Linux I believe this has been fixed since.

I'm sympathetic to uniqid() being slow in your case, but when a single call to uniqid() takes 200ms before seeing that the clock changed since the last call, then I'd say that PHP is not at fault here. In that time you could perform an entire HTTP request/response cycle across the big pond.

From your email address I'm seeing you are a NetBSD committer. Is NetBSD actually performing a syscall for each call to gettimeofday() or does it also have an implementation like VDSO on Linux? How many gettimeofday() calls are actually performed in your benchmark? Is it 100, 1000, 10000, more? Also, did the process spend 6 seconds sleeping? The user and system times do not add up to 19.

As I said before, changing uniqid() to no longer match the documented behavior of incorporating the time is a breaking change that would require an RFC.

The uuidgen() function appears to return UUIDv4 in your case, which are fully random. Since PHP 8.2, PHP provides a randomness API with pluggable engines, including fast userland engines such as xoshiro256**. The random_bytes() function using a CSPRNG is available since PHP 7.0. As a more practical solution, I recommend updating your code to explicitly make use of an RNG if you do not need the guarantees that uniqid() provides you.

@manu0401
Copy link
Contributor Author

manu0401 commented Apr 4, 2025

I'm sympathetic to uniqid() being slow in your case, but when a single call to uniqid() takes 200ms before seeing that the clock changed since the last call, then I'd say that PHP is not at fault here.

Indeed it is not, but it could help. There are many applications relying on uniqid(), and if you search the web, you find other people reporting performance problems with uniqid() on various setups.

For the NetBSD case, that was tracked down to Xen VM with multiple processor and using the clockinterrupt timecounter. In that setup, around 100k calls to gettimeofday() return the same time until it gets increased. This is certainly a bug that deserves to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants