Problem detecting sentences #404

@aricohen93

Description

The eds.sentences pipe behaves inconsistently: depending on how a date is written, the same text is split into one or two sentences. A date written with dots produces two sentences, while the same date written with slashes produces one.

Example:

text1 = "10.10.2010 : RCP" ## >> 2 sentences: [10.10.2010 :, RCP]
text2 = "10/10/2010 : RCP" ## >> 1 sentence

How to reproduce the bug

import edsnlp.pipes as eds
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.dates())


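text1 = "10.10.2010 : RCP" ## >> 2 sentences: [10.10.2010 :, RCP]
text2 = "10/10/2010 : RCP" ## >> 1 sentence

doc1 = nlp(text1)
doc2 = nlp(text2)

# List the sentences detected in each document
print([sent.text for sent in doc1.sents])
print([sent.text for sent in doc2.sents])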
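
To compare several date formats side by side, a small helper along the following lines may be convenient. This is only a sketch: `sentence_texts` and `compare_splits` are hypothetical names, not part of edsnlp, and the code assumes only that calling the pipeline returns a spaCy-style document exposing `.sents`.

```python
from typing import Callable, Dict, Iterable, List


def sentence_texts(nlp: Callable, text: str) -> List[str]:
    """Run the pipeline on `text` and return the text of each detected sentence."""
    doc = nlp(text)
    # Assumption: `doc.sents` yields span-like objects with a `.text` attribute
    return [sent.text for sent in doc.sents]


def compare_splits(nlp: Callable, texts: Iterable[str]) -> Dict[str, List[str]]:
    """Map each input text to the list of sentences the pipeline found in it."""
    return {text: sentence_texts(nlp, text) for text in texts}
```

With the pipeline built above, `compare_splits(nlp, [text1, text2])` would show both splits in a single dictionary, which makes it easier to add further variants such as "10-10-2010 : RCP" to the comparison.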
