Anti-Pattern Matching support in Rule Base Matcher #7588
-
I don't have the final say, but I think the potential behavior of "anti-patterns" is going to be too variable for this to be a good feature to include in the core library. How is overlap defined, what does ... You can also do this with one matcher and use the match IDs to filter the types of matches. I suspect this would be slightly faster, since the matcher only has to run over the document once.
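A minimal sketch of that single-matcher approach, assuming spaCy v3's `Matcher` and half-open `(start, end)` token spans. The rule names `NUT` and `NUT_OK` and the helpers `drop_anti_matches` and `find_nut_typos` are made up for illustration; treating any shared token as an "overlap" is just one possible definition.

```python
from typing import Iterable, List, Set, Tuple

# (rule name, start token index, end token index); end is exclusive
LabeledMatch = Tuple[str, int, int]


def drop_anti_matches(
    matches: Iterable[LabeledMatch], anti_labels: Set[str]
) -> List[LabeledMatch]:
    """Keep matches from ordinary rules that overlap no anti-rule match."""
    matches = list(matches)
    anti_spans = [(s, e) for label, s, e in matches if label in anti_labels]
    kept = []
    for label, start, end in matches:
        if label in anti_labels:
            continue
        # Two half-open spans overlap when each starts before the other ends.
        if any(start < a_end and a_start < end for a_start, a_end in anti_spans):
            continue
        kept.append((label, start, end))
    return kept


def find_nut_typos(text: str) -> List[str]:
    """Hypothetical demo of the idea; requires spaCy v3 to be installed."""
    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.blank("en")
    matcher = Matcher(nlp.vocab)
    # Ordinary rule: every "nut" is a candidate misspelling of "not".
    matcher.add("NUT", [[{"LOWER": "nut"}]])
    # Anti-rule (assumed name): contexts where "nut" is legitimate.
    matcher.add(
        "NUT_OK",
        [
            [{"LOWER": "nut"}, {"LOWER": {"IN": ["job", "case"]}}],
            [{"LOWER": "french"}, {"LOWER": "nut"}],
        ],
    )
    doc = nlp(text)  # the matcher runs over the document only once
    labeled = [(nlp.vocab.strings[mid], s, e) for mid, s, e in matcher(doc)]
    return [doc[s:e].text for _, s, e in drop_anti_matches(labeled, {"NUT_OK"})]
```

The filtering step is plain Python over the match tuples, so the overlap rule can be swapped out (for example, requiring full containment) without touching the patterns.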
-
If anyone else sees this discussion, here is an example of how to implement @delzac's idea (removing overlaps with the same rule ID): https://gist.github.com/jordi-reinsma/2de3ad79ced025772bf9517f93614e93
-
Some patterns are better defined by their exceptions (i.e. anti-patterns), which spaCy doesn't natively support. Native support for anti-pattern matching would promote more readable pattern construction.
Take this example, where we are looking for potential accidental misspellings of "not" as "nut". We might accept "nut job", "nut case" and "french nut", but not "nut sure", "nut true" and "try nut to drop this".
Without using anti-patterns, we might write the following pattern to look for the misspelling:
The above pattern is confusing to read.
It would be much more readable if we use anti-patterns like so.
Currently, users have to write a lot of boilerplate code to use anti-patterns. Typically like so,
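A minimal sketch of what such boilerplate can look like, assuming spaCy v3 and two separate `Matcher` instances: one for the candidate pattern and one for its exceptions. The function names `match_with_exceptions` and `build_nut_matchers` and the rule names `NUT` and `NUT_OK` are hypothetical, and "overlap" is assumed to mean sharing at least one token.

```python
from typing import Callable, List, Tuple

# (match_id, start, end), the tuple shape spaCy's Matcher returns
RawMatch = Tuple[int, int, int]


def match_with_exceptions(
    doc, matcher: Callable, anti_matcher: Callable
) -> List[RawMatch]:
    """Run both matchers over doc and drop matches overlapping an anti-match."""
    anti_spans = [(start, end) for _, start, end in anti_matcher(doc)]
    return [
        (match_id, start, end)
        for match_id, start, end in matcher(doc)
        # Half-open spans overlap when each starts before the other ends.
        if not any(start < a_end and a_start < end for a_start, a_end in anti_spans)
    ]


def build_nut_matchers(vocab):
    """Hypothetical setup for the "nut" example; requires spaCy v3."""
    from spacy.matcher import Matcher

    # Candidate rule: any "nut" might be a misspelled "not".
    matcher = Matcher(vocab)
    matcher.add("NUT", [[{"LOWER": "nut"}]])

    # Exceptions: contexts where "nut" is the intended word.
    anti_matcher = Matcher(vocab)
    anti_matcher.add(
        "NUT_OK",
        [
            [{"LOWER": "nut"}, {"LOWER": {"IN": ["job", "case"]}}],
            [{"LOWER": "french"}, {"LOWER": "nut"}],
        ],
    )
    return matcher, anti_matcher
```

With spaCy installed, usage would be along the lines of `matcher, anti = build_nut_matchers(nlp.vocab)` followed by `match_with_exceptions(nlp(text), matcher, anti)`. Note that this runs two matchers over the same document.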
I would like to propose the following API for spaCy to natively support anti-patterns:
Would be happy to raise a PR for this feature if the maintainers are agreeable to it! :)