Description
What version of regex are you using?
If it isn't the latest version, then please upgrade and check whether the bug
is still present.
I'm using the latest version of regex-syntax, 0.8.8, on the Rust Playground as of today.
Describe the bug at a high level.
The regex_syntax::escape() function does not escape whitespace characters, so they are silently ignored when the escaped string is used in verbose mode. The documentation states:
The string returned may be safely used as a literal in a regular expression.
I think the less surprising behavior would be to escape whitespace characters as well.
What are the steps to reproduce the behavior?
See the following code (Playground):
use regex_syntax; // 0.8.8
use regex::Regex;

fn main() {
    let pattern = format!("^(?x:{lit})$", lit = regex_syntax::escape(" "));
    let regex = Regex::new(&pattern).unwrap();
    let m = regex.captures(" ").unwrap(); // haystack is one literal space
    // the unwrap() above panics: in verbose mode the unescaped space in the
    // pattern is ignored, so it does not match the space in the haystack
    dbg!(m.get_match().as_str(), m.get_match().range());
}
What is the actual behavior?
As noted in the code comment, the program panics. With verbose mode enabled, the unescaped space in the pattern does not match the space in the haystack, which is surprising:
Compiling playground v0.0.1 (/playground)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.82s
Running `target/debug/playground`
thread 'main' (13) panicked at src/main.rs:7:33:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
What is the expected behavior?
I would expect escaping to take the effect of possible flags (e.g. ?x) into account and escape whitespace as well. For example, the following function accomplishes this (Playground):
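use std::fmt::Write; // brings write! on String into scope

fn regex_escape(s: &str) -> String {
    let mut result = String::with_capacity(s.len());
    let mut last_idx = 0;
    for (i, p) in s.match_indices(regex_syntax::is_escapeable_character) {
        // copy the run of ordinary characters before this match; match_indices
        // yields char-boundary offsets, so plain slicing is safe here
        result.push_str(&s[last_idx..i]);
        last_idx = i + p.len();
        // emit the matched character as a \u{...} escape, e.g. \u{20} for a space
        write!(result, "{}", p.escape_unicode()).unwrap();
    }
    // copy whatever follows the last match
    result.push_str(&s[last_idx..]);
    dbg!(result)
}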
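A narrower alternative, sketched here under the assumption that whitespace is the only gap: leave regex_syntax::escape() as it is and post-process its output, rewriting each whitespace character as a \u{...} escape. The helper name escape_whitespace is hypothetical, and I'm assuming that verbose mode ignores exactly the characters for which char::is_whitespace returns true. It uses only the standard library, so it can be pasted next to an existing escape() call.

```rust
/// Hypothetical helper: takes a pattern that has already been escaped
/// (e.g. by regex_syntax::escape) and rewrites every whitespace character
/// as a \u{...} escape so that it survives verbose mode.
/// Assumption: verbose mode skips exactly what char::is_whitespace accepts.
fn escape_whitespace(escaped: &str) -> String {
    let mut out = String::with_capacity(escaped.len());
    for c in escaped.chars() {
        if c.is_whitespace() {
            // char::escape_unicode yields the characters of e.g. \u{20}
            out.extend(c.escape_unicode());
        } else {
            // everything else, including existing backslash escapes, is kept
            out.push(c);
        }
    }
    out
}
```

For instance, escape_whitespace("a b") returns a\u{20}b, and the reproduction above would build its pattern with escape_whitespace(&regex_syntax::escape(" ")).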