Skip to content

Use win32 glob implementation on all platforms #18164

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

NattyNarwhal
Copy link
Member

@NattyNarwhal NattyNarwhal commented Mar 27, 2025

Due to inconsistencies across platforms (see GH-15564), it's probably better to have a consistent glob implementation everywhere. We already have to have one for Win32 (based on the OpenBSD implementation), so we can also use it on Unix too.

Some concerns:

  • Using the system glob implementation isn't wired up yet; just a matter if binding ifdefs and gunk in the build system. Using the system glob implementation is covered by --enable-system-glob, which is non-default.
  • HAVE_GLOB is removed from code; should only be used for the previously mentioned system glob implementation if we decide to use that.
  • GLOB_ONLYDIR is a GNU extension, not implemented by the OpenBSD implementation (nor is it by macOS, other BSDs, musl, etc.). However, it's handled by callers due to the fact the GNU implementation is flawed, so we don't need to implement it in the glob implementation, especially if we want to be compatible with non-GNU. We just need to define the flag. This also removes an unused define as well.

This may need an RFC. It may also be prudent to merge GH-17433.

@NattyNarwhal
Copy link
Member Author

cc @Girgias

@Girgias
Copy link
Member

Girgias commented Mar 27, 2025

This is basically just moving stuff from Win32/ to main/ and defining the symbols as php right? This looks rather sensible just on its own, not sure if it needs an RFC, but maybe a quick discussion on internals is sufficient? I would imagine most people would agree that having the same behaviour regardless of platform is a win.

@NattyNarwhal
Copy link
Member Author

This is basically just moving stuff from Win32/ to main/ and defining the symbols as php right?

Correct, plus some cleanup of redundant definitions.

This looks rather sensible just on its own, not sure if it needs an RFC, but maybe a quick discussion on internals is sufficient? I would imagine most people would agree that having the same behaviour regardless of platform is a win.

This sounds good, I'll start there.

@NattyNarwhal NattyNarwhal marked this pull request as ready for review April 16, 2025 13:23
In preparation to make the Win32 reimplementation the standard
cross-platform one. Currently, it doesn't do that and just passes
through the original glob implementation. We could consider also having
an option to use the standard glob for systems that have a sufficient
one.
Kind of broken. We're namespacing the function and struct, but not yet
the GLOB_* defines. There are a lot of places callers check if i.e.
NOMATCH is defined that would likely become redundant.

Currently it also has php_glob and #defines glob php_glob (etc.) - I
suspect doing the opposite and changing the callers would make more
sense, just doing MVP to geet it to build (even if it fails tests).
Have not tested yet. the big things are:

- Should be invisible to userland PHP code.
- A lot of :%s/GLOB_/PHP_GLOB_/g; the diff can be noisy as a result,
  especially in comments.
- Prefixes everything with PHP_ to avoid conflicts with system glob in
  case it gets included transitively.
- A lot of weird shared definitions that were sprawled out to other
  headers are now included in php_glob.h.
- A lot of (but not yet all cases) of HAVE_GLOB are removed, since we
  can always fall back to php_glob.
- Using the system glob is not wired up yet; it'll need more shim
  ifdefs for each flag type than just glob_t/glob/globfree defs.
This is a GNU extension, but we don't need to implement it, as the GNU
implementation is flawed enough that callers have to manually filter it
anyways; just provide a stub definition for the constant.

We could consideer implementing this properly later. For now, fixes the
basic glob constant tests.
We now always have a glob implementation that works. HAVE_GLOB should
only be used to check if we have a system implementation, for if we
decide to wrap the system implementation instead.
Ideally temporary until phpGH-17433.
@NattyNarwhal
Copy link
Member Author

I haven't heard any feedback from internals@, so this seems uncontroversial so far. I've rebased this onto latest master now, so it should be ready to review.

@nielsdos nielsdos self-requested a review April 16, 2025 21:26
Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, just a question and a nit

@@ -518,9 +518,7 @@ static const func_info_t func_infos[] = {
F1("getcwd", MAY_BE_STRING|MAY_BE_FALSE),
F1("readdir", MAY_BE_STRING|MAY_BE_FALSE),
F1("scandir", MAY_BE_ARRAY|MAY_BE_ARRAY_KEY_LONG|MAY_BE_ARRAY_OF_STRING|MAY_BE_FALSE),
#if defined(HAVE_GLOB)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If system glob doesn't exist, and you disable using built-in glob, then what happens? I see you said something about this in the OP but I didn't quite follow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent should be to fall back to the PHP implementation if it's not available. This means we don't have to have a bunch of #ifdef HAVE_GLOB anymore. I believe I got the ifdefs right for that, but if not, let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I see, thanks for confirming.

@@ -721,13 +717,11 @@ static void spl_filesystem_object_construct(INTERNAL_FUNCTION_PARAMETERS, zend_l

/* spl_filesystem_dir_open() may emit an E_WARNING */
zend_replace_error_handling(EH_THROW, spl_ce_UnexpectedValueException, &error_handling);
#ifdef HAVE_GLOB
if (SPL_HAS_FLAG(ctor_flags, DIT_CTOR_GLOB) && !zend_string_starts_with_literal(path, "glob://")) {
path = zend_strpprintf(0, "glob://%s", ZSTR_VAL(path));
spl_filesystem_dir_open(intern, path);
zend_string_release(path);
} else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While at it, reformat this to be } else { ?

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit and question, but LGTM

Comment on lines +143 to +163
#define PHP_GLOB_APPEND 0x0001 /* Append to output from previous call. */
#define PHP_GLOB_DOOFFS 0x0002 /* Use gl_offs. */
#define PHP_GLOB_ERR 0x0004 /* Return on error. */
#define PHP_GLOB_MARK 0x0008 /* Append / to matching directories. */
#define PHP_GLOB_NOCHECK 0x0010 /* Return pattern itself if nothing matches. */
#define PHP_GLOB_NOSORT 0x0020 /* Don't sort. */
#define PHP_GLOB_NOESCAPE 0x1000 /* Disable backslash escaping. */

#define PHP_GLOB_NOSPACE (-1) /* Malloc call failed. */
#define PHP_GLOB_ABORTED (-2) /* Unignored error. */
#define PHP_GLOB_NOMATCH (-3) /* No match and PHP_GLOB_NOCHECK not set. */
#define PHP_GLOB_NOSYS (-4) /* Function not supported. */

#define PHP_GLOB_ALTDIRFUNC 0x0040 /* Use alternately specified directory funcs. */
#define PHP_GLOB_BRACE 0x0080 /* Expand braces ala csh. */
#define PHP_GLOB_MAGCHAR 0x0100 /* Pattern had globbing characters. */
#define PHP_GLOB_NOMAGIC 0x0200 /* PHP_GLOB_NOCHECK without magic chars (csh). */
#define PHP_GLOB_QUOTE 0x0400 /* Quote special chars with \. */
#define PHP_GLOB_TILDE 0x0800 /* Expand tilde names from the passwd file. */
#define PHP_GLOB_LIMIT 0x2000 /* Limit pattern match output to ARG_MAX */
#define PHP_GLOB_KEEPSTAT 0x4000 /* Retain stat data for paths in gl_statv. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do we care that this is aligned or not? In any case it probably should use spaces instead of tabs (unless it is to reduce diff with upstream)

Comment on lines +60 to +61
#if defined(HAVE_GLOB) && defined(PHP_SYSTEM_GLOB)
#else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this if/else

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the already-said remarks, this LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants