Description
The sentences pipe (eds.sentences) behaves inconsistently: depending on how a date is written, it splits the same text into either one or two sentences.
Example:
text1 = "10.10.2010 : RCP"  # -> 2 sentences: ["10.10.2010 :", "RCP"]
text2 = "10/10/2010 : RCP"  # -> 1 sentence
How to reproduce the bug
import edsnlp
import edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.dates())

text1 = "10.10.2010 : RCP"  # -> 2 sentences: ["10.10.2010 :", "RCP"]
text2 = "10/10/2010 : RCP"  # -> 1 sentence

doc1 = nlp(text1)
doc2 = nlp(text2)

# Print the sentence segmentation of each document
print([sent.text for sent in doc1.sents])
print([sent.text for sent in doc2.sents])
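Until the segmentation is fixed, one possible stopgap is to rewrite dotted dates into the slash form before calling the pipeline, since the report observes that "10/10/2010 : RCP" stays in a single sentence. The sketch below is a hypothetical helper, not part of edsnlp, and assumes dates follow a DD.MM.YYYY-like pattern. Because each "." is swapped for exactly one "/", the string length is unchanged, so character offsets of entities found on the rewritten text still point at the same positions in the original. Note that the pattern can also match non-date strings such as version numbers ("3.10.12").

```python
import re

# Hypothetical helper (not an edsnlp API): matches dotted dates such as
# "10.10.2010" or "1.2.2010". It may also match version-like strings.
DOTTED_DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{2,4})\b")


def slash_dotted_dates(text: str) -> str:
    """Return `text` with DD.MM.YYYY dates rewritten as DD/MM/YYYY.

    Each "." is replaced by a single "/", so len(result) == len(text)
    and character offsets are preserved one-to-one.
    """
    return DOTTED_DATE.sub(r"\1/\2/\3", text)


if __name__ == "__main__":
    original = "10.10.2010 : RCP"
    rewritten = slash_dotted_dates(original)
    print(rewritten)  # 10/10/2010 : RCP
    # Same length, so spans found on `rewritten` map directly onto `original`.
    assert len(rewritten) == len(original)
```

With the reproduction pipeline above, this would be used as nlp(slash_dotted_dates(text1)); whether it avoids the split in every case is an assumption based only on the two examples in this report.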