
re is much slower than cpython #445

Open
masklinn opened this issue Oct 25, 2024 · 6 comments
@masklinn

I've been adding Graal support to a classifier-type project naively based on applying a bunch of regexes to an input. While it works on Graal, the regex application is quite slow: about 4x slower than CPython, while using 4 times the CPU.

Here's a repro script and attendant data (basically a cut-down version of the naive classifier implementation): script.zip

timings:

> python3.12 --version
Python 3.12.6
> time python3.12 run.py
75158 lines in 11.7s
	156.1 us/line
python3.12 run.py  11.77s user 0.05s system 97% cpu 12.172 total
> graalpy --version
GraalPy 3.11.7 (Oracle GraalVM Native 24.1.0)
> time graalpy run.py
75158 lines in 48.2s
	640.7 us/line
graalpy run.py  192.00s user 1.00s system 394% cpu 48.976 total

This is on a 10-core M1 Pro. Using cpusampler I confirmed that essentially all the "user" time is in _sre:

 _search   ||  43580ms  98.6% ||   43580ms  98.6% || <frozen graalpy._sre>~1:0
@msimacek
Contributor

Regexes are their own compilation unit: their DFAs are JIT compiled and optimized separately, just like Python functions. You have 628 regexes, which takes a long time to compile. I got a better time (still worse than CPython) with JIT compilation for regexes turned off (--experimental-options --engine.CompileOnly='~tregex re'), so it seems the JIT compilation takes too long to pay off here. I'll ask our regex experts if we can do something about it.

@msimacek
Contributor

So I talked to one of the devs of our regex engine. The main problem is that your regexes use a lot of x{n,m} patterns with large m. Those currently don't allow generating a reasonable DFA, so the regex engine has to use a slower fallback execution model. There's currently a person working on supporting these patterns in the faster execution model, so it should get better in the next release.
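For illustration (my own toy example, not from the thread): a large bounded repetition and its unbounded relaxation agree on any input whose run is shorter than the bound, which is why rewriting one into the other tends to be safe for line-oriented classifiers like this one:

```python
import re

# Toy example: {1,4096} and + agree whenever the matched run is
# shorter than 4096 characters, the common case for single lines.
bounded = re.compile(r"[A-Za-z]{1,4096}")
relaxed = re.compile(r"[A-Za-z]+")

line = "Mozilla compatible agent"
assert bounded.search(line).group() == relaxed.search(line).group()  # "Mozilla"
```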

@masklinn
Author

masklinn commented Oct 25, 2024

Ah, I see. I didn't know Graal used a DFA internally; that explains things. I've had the same issue in Rust, as regex is also automata-based, and the way the regexes are written in the core data file also caused trouble there (mostly memory; runtime did suffer a bit, but not to the same extent).

I improved things by transforming the bounded repetitions back into unbounded ones before compilation; I'll see if I can do that for Graal.

Although, from my understanding, the source dataset uses bounded repetitions to limit the risk of catastrophic backtracking in backtracking regex engines (like CPython's own). Is there a flag exposed somewhere which indicates whether a regex uses backtracking or finite automata, so I only perform the rewriting when a DFA is in use?

@msimacek
Contributor

> Although from my understanding the source dataset did that to limit risks of catastrophic backtracking in backtracking regex engines (like cpython's own), is there a flag exposed somewhere which indicates whether a regex uses backtracking or finite automata, to ensure I only perform rewriting when using a DFA?

Currently, no. Our regex engine seems to have a property for that, but we currently don't expose it on the Python Pattern object. (_sre.tregex_compile(pattern, _sre._METHOD_SEARCH, False).isBacktracking works, but it's a total hack that might break any time)
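For what it's worth, that hack could be wrapped defensively so it degrades to "unknown" on CPython, or if the internal API changes (my own sketch, relying only on the hook quoted above):

```python
def is_backtracking(pattern: str):
    """True/False on GraalPy via the internal hook above, None elsewhere."""
    try:
        import _sre
        return bool(
            _sre.tregex_compile(pattern, _sre._METHOD_SEARCH, False).isBacktracking
        )
    except (ImportError, AttributeError):
        return None  # not GraalPy, or the unsupported hook changed
```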

@masklinn
Author

> Currently, no. Our regex engine seems to have a property for that, but we currently don't expose it on the Python Pattern object. (_sre.tregex_compile(pattern, _sre._METHOD_SEARCH, False).isBacktracking works, but it's a total hack that might break any time)

OK, I'll go with an implementation check then, at least for the time being (assuming the rewriting plan pans out).
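A minimal version of that implementation check, assuming `sys.implementation.name` (which CPython reports as "cpython" and GraalPy, as far as I can tell, as "graalpy"):

```python
import sys

# Only rewrite patterns on the automata-based implementation; CPython's
# sre backtracks, so the bounded forms are kept there as a safety net.
def should_simplify() -> bool:
    return sys.implementation.name == "graalpy"
```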

@masklinn
Author

masklinn commented Oct 28, 2024

@msimacek with a "simplifier" added to the script, the timings do improve by about 25% on this machine, but that's still quite far behind CPython.

> graalpy run.py -c
75158 lines in 35.8s
	476.8 us/line
graalpy run.py -c  142.91s user 0.93s system 391% cpu 36.729 total

The very minor runtime improvement you noticed with regex JIT compilation turned off does seem to hold here (I get about 1 second less with the options you provided), though the main benefit is CPU consumption dropping from ~400% to ~100%. Somewhat oddly, it doesn't seem to change memory consumption much, if at all, even though that was by far the biggest effect on both rust-regex and re2.

Here's the simplification routine I implemented:

import re

# Bounded repetitions {0,N} or {1,N} with a large (3+ digit) upper bound
REPETITION_PATTERN = re.compile(r"\{(0|1)\s*,\s*\d{3,}\}")
# \d or \w, either inside a character class (group 1) or bare (group 2)
CLASS_PATTERN = re.compile(
    r"""
\[[^]]*\\(d|w)[^]]*\]
|
\\(d|w)
""",
    re.VERBOSE,
)


def class_replacer(m: re.Match[str]) -> str:
    # Inside a class the brackets are already there; bare occurrences
    # need to become bracketed classes of their own.
    d, w = ("0-9", "A-Za-z0-9_") if m[1] else ("[0-9]", "[A-Za-z0-9_]")
    return m[0].replace(r"\d", d).replace(r"\w", w)


def fa_simplifier(pattern: str) -> str:
    # {0,N} -> *, {1,N} -> +
    pattern = REPETITION_PATTERN.sub(lambda m: "*" if m[1] == "0" else "+", pattern)
    return CLASS_PATTERN.sub(class_replacer, pattern)

It basically converts the ranges whose upper bound has 3 or more digits to unbounded repetitions, and replaces the Perl-style character classes with explicit enumerations (that might be completely irrelevant for tregex; it's useful for rust-regex, whose Perl-style character classes are full Unicode and thus expand to a large amount of state).
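For example, the repetition rule alone (restated here so the snippet runs standalone) rewrites a pattern like this:

```python
import re

# Same rule as in fa_simplifier above: {0,N} -> *, {1,N} -> + when N has 3+ digits.
REPETITION_PATTERN = re.compile(r"\{(0|1)\s*,\s*\d{3,}\}")

def simplify_repetitions(pattern: str) -> str:
    return REPETITION_PATTERN.sub(lambda m: "*" if m[1] == "0" else "+", pattern)

print(simplify_repetitions(r"[a-z]{1,4096}/\d{0,255}"))  # [a-z]+/\d*
```

Small bounds like `{2,5}` are left untouched, so patterns that genuinely need a tight bound keep it.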
