Report nested character classes as invalid #210

@apismensky

Hyperscan should report the following nested character class constructs as invalid (or unsupported). The examples are taken from the Java Pattern documentation, https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html:

[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
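
To make the request concrete, here is a rough sketch of the kind of pre-check a caller could run before handing a pattern to the compiler. It is not Hyperscan code: the class `NestedClassCheck` and the method `hasNestedClass` are hypothetical names invented for illustration. It assumes that POSIX classes such as `[[:alpha:]]` are legitimate and should not be flagged, and it deliberately ignores `\Q...\E` quoting and a literal `]` placed first in a class.

```java
/** Hypothetical pre-check for Java-style nested classes; not part of Hyperscan. */
class NestedClassCheck {

    /**
     * Returns true if the pattern contains an unescaped '[' (other than a
     * POSIX class like [:alpha:]) or an "&&" inside a character class.
     */
    static boolean hasNestedClass(String pattern) {
        boolean inClass = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                i++;                 // skip the escaped character
                continue;
            }
            boolean hasNext = i + 1 < pattern.length();
            if (!inClass) {
                if (c == '[') {
                    inClass = true;
                    if (hasNext && pattern.charAt(i + 1) == '^') {
                        i++;         // skip the negation marker
                    }
                }
                continue;
            }
            // From here on we are inside a character class.
            if (c == '[') {
                if (hasNext && pattern.charAt(i + 1) == ':') {
                    int end = pattern.indexOf(":]", i + 2);
                    if (end >= 0) {
                        i = end + 1; // jump past the POSIX class
                        continue;
                    }
                }
                return true;         // nested class, e.g. [a-d[m-p]]
            }
            if (c == '&' && hasNext && pattern.charAt(i + 1) == '&') {
                return true;         // intersection, e.g. [a-z&&[def]]
            }
            if (c == ']') {
                inClass = false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] patterns = {
            "[a-d[m-p]]", "[a-z&&[def]]", "[a-z&&[^bc]]", "[a-z&&[^m-p]]", "[a-dm-p]"
        };
        for (String p : patterns) {
            System.out.println(p + " -> nested: " + hasNestedClass(p));
        }
    }
}
```

A check like this would flag all four constructs above while leaving the flattened equivalents such as `[a-dm-p]` alone.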

Right now validation succeeds, but the scan returns false negatives in the following scenarios (the PREFILTER flag is on):

String regex = "[a-d[m-p]]";
String input = "a";

String regex = "[a-z&&[def]]";
String input = "d";

String regex = "[a-z&&[^bc]]";
String input = "a";

String regex = "[a-z&&[^m-p]]";
String input = "a";

They all match in Java 8 and Rust, according to https://regex101.com/
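
For reference, the Java side of this can be reproduced with plain `java.util.regex`, without Hyperscan involved. The sketch below uses `find()` on the assumption that it is the closest analogue to a scan; the class name `NestedClassJavaDemo` and the helper `matches` are made up for this example.

```java
import java.util.regex.Pattern;

/** Checks the four patterns from this report against java.util.regex only. */
class NestedClassJavaDemo {

    /** Returns true if the regex is found anywhere in the input. */
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Each row is {regex, input}, copied from the scenarios above.
        String[][] cases = {
            {"[a-d[m-p]]", "a"},
            {"[a-z&&[def]]", "d"},
            {"[a-z&&[^bc]]", "a"},
            {"[a-z&&[^m-p]]", "a"},
        };
        for (String[] c : cases) {
            System.out.printf("%-16s on \"%s\": %b%n", c[0], c[1], matches(c[0], c[1]));
        }
    }
}
```

Running the same regex and input pairs through Hyperscan is what produces the false negatives described above.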

Labels: bug (Something isn't working)