GH-116380: Speed up glob.glob() by removing some system calls #116392

Open
wants to merge 80 commits into
base: main
Changes from 78 commits
db3c620
GH-116380: Make `glob.glob()` twice as fast
barneygale Mar 5, 2024
9e1f059
Use `os.listdir()` if we don't need to check entry type.
barneygale Mar 5, 2024
10432df
A few small speedups.
barneygale Mar 6, 2024
7e389e2
Simplify prefix removal
barneygale Mar 6, 2024
8680a0a
Re-implement `glob0()`, `glob1()`, and `has_magic()`.
barneygale Mar 6, 2024
3bf3124
Fix errant `StopIteration`.
barneygale Mar 6, 2024
f8fb992
Skip compiling pattern for consecutive `**` segments.
barneygale Mar 6, 2024
50ef080
Clarify regex/path building in literal and recursive selectors.
barneygale Mar 6, 2024
ccefacd
Simplify code to ignore root_dir.
barneygale Mar 6, 2024
fa951f6
Fix possible Windows separator issue.
barneygale Mar 6, 2024
0aec12c
Address some review feedback.
barneygale Mar 6, 2024
72691ba
Use assignment expressions in a couple of places
barneygale Mar 6, 2024
c58dd21
Replace lambda with `operator.not_`.
barneygale Mar 6, 2024
c361ec9
Merge branch 'main' into gh-116380
barneygale Mar 6, 2024
22b30db
Speed up `_add_trailing_slash()`
barneygale Mar 6, 2024
83b70bd
Speed up `select_literal()`
barneygale Mar 7, 2024
1d32d14
Speed up `select_recursive()`
barneygale Mar 7, 2024
1e5aacc
Merge branch 'main' into gh-116380
barneygale Mar 17, 2024
a038bb8
Merge branch 'main' into gh-116380
barneygale Mar 18, 2024
f1440a9
Cache compiled patterns rather than selectors.
barneygale Mar 18, 2024
9c64643
Remove a bit of code duplication.
barneygale Mar 18, 2024
b0e8ba6
Fix stray newline
barneygale Mar 19, 2024
1b1233e
Merge branch 'main' into gh-116380
barneygale Mar 22, 2024
0e02ec5
Remove tests for glob0 and glob1
barneygale Mar 28, 2024
be4865e
Add a bunch of comments explaining the more subtle parts.
barneygale Mar 29, 2024
203e8ef
Merge branch 'main' into gh-116380
barneygale Apr 1, 2024
13355a0
Clarify variable naming in iglob()
barneygale Apr 3, 2024
2e5cebd
Use keyword arguments to pass True/False/None literals, for clarity.
barneygale Apr 4, 2024
5eba2eb
Speed up recursive globbing very slightly
barneygale Apr 4, 2024
b0a99b7
Merge branch 'main' into gh-116380
barneygale Apr 5, 2024
ad0ece8
Implement recursive wildcards with a stack
barneygale Apr 5, 2024
cafe9be
Add argument defaults, simplify code slightly.
barneygale Apr 5, 2024
301d922
Also make rel_path optional
barneygale Apr 5, 2024
beb2507
Optimise _add_trailing_slash
barneygale Apr 5, 2024
312c73a
Remove use of os.listdir() -- doesn't generalise
barneygale Apr 6, 2024
ae820e2
Add `_Globber` class; prepare for merger with pathlib globbing.
barneygale Apr 6, 2024
dcfe11d
Unify with pathlib implementation \o/
barneygale Apr 6, 2024
123a0f6
Use literal selector only if no case sensitivity preference is given.
barneygale Apr 6, 2024
0ed7b9c
Fix a few tests
barneygale Apr 6, 2024
aceb85f
Fix a few more tests.
barneygale Apr 6, 2024
b04de9d
Merge commit '689ada79150f28b0053fa6c1fb646b75ab2cc200' into gh-116380
barneygale Apr 10, 2024
3eb2d19
Merge branch 'main' into gh-116380
barneygale Apr 10, 2024
8a15db0
Fix select() argument order.
barneygale Apr 10, 2024
7eb3e61
Merge branch 'main' into gh-116380
barneygale Apr 12, 2024
316ea56
Merge branch 'main' into gh-116380
barneygale May 3, 2024
2018027
Support `include_hidden` and `dir_fd` in `pathlib._glob`.
barneygale May 3, 2024
2f21626
Fix stray newline
barneygale May 3, 2024
339df68
Update Lib/pathlib/_glob.py
barneygale May 4, 2024
28aa95f
Fix docs
barneygale May 4, 2024
abcb1f8
Test for unique results
barneygale May 4, 2024
71387a6
Spacing
barneygale May 4, 2024
de22de6
Merge branch 'main' into gh-116380
barneygale May 5, 2024
8b08374
Merge branch 'main' into gh-116380
barneygale May 7, 2024
54efa7c
Merge branch 'main' into gh-116380
barneygale May 8, 2024
cf11922
Update whatsnew
barneygale May 8, 2024
6710924
Merge branch 'main' into gh-116380
barneygale May 14, 2024
a547cd2
Merge branch 'main' into gh-116380
barneygale May 31, 2024
14ae438
Close file descriptors when `recursive_selector` is finalized.
barneygale May 31, 2024
69d7a86
Make `iglob()` a generator.
barneygale May 31, 2024
3b84a1d
Make `_iglob()` a generator.
barneygale May 31, 2024
f9f9a8d
Make `_relative_glob()` a generator.
barneygale May 31, 2024
24a9ee4
Simplify skipping empty string
barneygale May 31, 2024
d05d58d
Merge branch 'main' into gh-116380
barneygale Jun 4, 2024
27c463e
Merge branch 'main' into gh-116380
barneygale Jun 7, 2024
a94f2a7
Make `_GlobberBase` fully abstract.
barneygale Jun 7, 2024
d19bb89
Address review feedback
barneygale Jun 9, 2024
1677588
Typo fix
barneygale Jun 9, 2024
539f044
Speed up pattern parsing.
barneygale Jun 9, 2024
70a1b42
Add test for globbing above recursion limit.
barneygale Jun 12, 2024
1560712
Merge branch 'main' into gh-116380
barneygale Aug 26, 2024
099e86e
Apply suggestions from code review
barneygale Sep 1, 2024
ee76faf
Test that `iglob().close()` closes file descriptors.
barneygale Sep 1, 2024
4cf8a4d
Address some review feedback
barneygale Sep 1, 2024
8a118a7
Merge branch 'main' into gh-116380
barneygale Oct 27, 2024
3ad9367
Address more review comments
barneygale Oct 27, 2024
66af33d
Drop parse_entry
barneygale Oct 27, 2024
ce74ef1
Address review feedback
barneygale Oct 28, 2024
a69a060
Add comment.
barneygale Oct 28, 2024
a10a1e0
Merge branch 'main' into gh-116380
barneygale Nov 1, 2024
e2e0a2e
Merge branch 'main' into gh-116380
barneygale Nov 8, 2024
18 changes: 10 additions & 8 deletions Doc/library/glob.rst
@@ -75,10 +75,6 @@ The :mod:`glob` module defines the following functions:
Using the "``**``" pattern in large directory trees may consume
an inordinate amount of time.

-.. note::
-This function may return duplicate path names if *pathname*
-contains multiple "``**``" patterns and *recursive* is true.

.. versionchanged:: 3.5
Support for recursive globs using "``**``".

@@ -88,6 +84,11 @@ The :mod:`glob` module defines the following functions:
.. versionchanged:: 3.11
Added the *include_hidden* parameter.

+.. versionchanged:: 3.14
+Matching path names are returned only once. In previous versions, this
+function may return duplicate path names if *pathname* contains multiple
+"``**``" patterns and *recursive* is true.


.. function:: iglob(pathname, *, root_dir=None, dir_fd=None, recursive=False, \
include_hidden=False)
@@ -98,10 +99,6 @@ The :mod:`glob` module defines the following functions:
.. audit-event:: glob.glob pathname,recursive glob.iglob
.. audit-event:: glob.glob/2 pathname,recursive,root_dir,dir_fd glob.iglob

-.. note::
-This function may return duplicate path names if *pathname*
-contains multiple "``**``" patterns and *recursive* is true.

.. versionchanged:: 3.5
Support for recursive globs using "``**``".

@@ -111,6 +108,11 @@ The :mod:`glob` module defines the following functions:
.. versionchanged:: 3.11
Added the *include_hidden* parameter.

+.. versionchanged:: 3.14
+Matching path names are yielded only once. In previous versions, this
+function may yield duplicate path names if *pathname* contains multiple
+"``**``" patterns and *recursive* is true.


.. function:: escape(pathname)

9 changes: 9 additions & 0 deletions Doc/whatsnew/3.14.rst
@@ -460,6 +460,15 @@ asyncio
reduces memory usage.
(Contributed by Kumar Aditya in :gh:`107803`.)


+glob
+----
+
+* Reduce the number of system calls in :func:`glob.glob` and :func:`~glob.iglob`,
+thereby improving the speed of globbing operations by 20-80%.
+(Contributed by Barney Gale in :gh:`116380`.)


Deprecated
==========
