
[BUG]: nlp-sentencize wrongly breaks sentences in quotation marks #3017

Open
2 tasks done
Pupix opened this issue Oct 18, 2024 · 1 comment · May be fixed by #3089
Labels
Bug Something isn't working.

Comments


Pupix commented Oct 18, 2024

Description

As the title says.

Here are some quick examples:

var sentencize = require( '@stdlib/nlp/sentencize' );

console.log(sentencize('I said "Look out" right before he banged his head'));
> [ 'I said "Look out" right before he banged his head' ] // This is correct

console.log(sentencize('I said "Look out!" right before he banged his head'));
> ['I said "Look out!"', 'right before he banged his head'] // This should be one sentence

From looking at the code, it seems to be doing exactly what it's told, but the result doesn't seem quite right.
The relevant logic: if the current token is a suffix (i.e., a closing ") and the previous token is one of the punctuation marks ., !, or ?, then split.
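To make the described rule concrete, here is a simplified, hypothetical reconstruction. It is not the stdlib source: the name naiveSentencize and the regular expressions are assumptions, and it works on whitespace-separated words rather than on the library's tokens. It reproduces both results from the examples above.

```javascript
'use strict';

// Hypothetical, simplified reconstruction of the rule described above.
// NOT the stdlib implementation: it operates on whitespace-separated words.
var TERMINAL = /[.!?]$/;             // word ends with . ! or ?
var TERMINAL_THEN_QUOTE = /[.!?]"$/; // closing " immediately preceded by . ! or ?

function naiveSentencize( str ) {
    var words = str.split( /\s+/ ).filter( Boolean );
    var sentences = [];
    var current = [];
    var i;
    for ( i = 0; i < words.length; i++ ) {
        current.push( words[ i ] );
        // Split whenever a word ends with terminal punctuation, or with a
        // closing quote directly after terminal punctuation.
        if ( TERMINAL.test( words[ i ] ) || TERMINAL_THEN_QUOTE.test( words[ i ] ) ) {
            sentences.push( current.join( ' ' ) );
            current = [];
        }
    }
    if ( current.length > 0 ) {
        sentences.push( current.join( ' ' ) );
    }
    return sentences;
}

console.log( naiveSentencize( 'I said "Look out" right before he banged his head' ) );
// => [ 'I said "Look out" right before he banged his head' ]
console.log( naiveSentencize( 'I said "Look out!" right before he banged his head' ) );
// => [ 'I said "Look out!"', 'right before he banged his head' ]
```

The second call splits because the word out!" matches the "suffix after terminal punctuation" case, regardless of what follows it.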

Related Issues

#3013

Questions

No.

Demo

No response

Reproduction

console.log(sentencize('I said "Look out!" right before he banged his head'));
> ['I said "Look out!"', 'right before he banged his head']

Expected Results

['I said "Look out!" right before he banged his head']

Actual Results

['I said "Look out!"', 'right before he banged his head']

Version

0.2.2

Environments

Node.js

Browser Version

No response

Node.js / npm Version

v22.9.0

Platform

Windows 11

Checklist

  • Read and understood the Code of Conduct.
  • Searched for existing issues and pull requests.
@kgryte kgryte added the Bug Something isn't working. label Oct 18, 2024
@kgryte kgryte changed the title nlp-sentencize wrongly breaks sentences in quotation marks [Bug]: nlp-sentencize wrongly breaks sentences in quotation marks Oct 18, 2024
@kgryte kgryte changed the title [Bug]: nlp-sentencize wrongly breaks sentences in quotation marks [BUG]: nlp-sentencize wrongly breaks sentences in quotation marks Oct 18, 2024

Srayash commented Oct 18, 2024

The tool is likely splitting on punctuation marks: it seems to apply the rule for a sentence that ends with one of those marks, which doesn't hold in cases like this one.

The logic could be updated to check whether the punctuation mark (!, ., or ?) is inside a quotation.
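One possible way to act on this suggestion, sketched under stated assumptions: the function quoteAwareSentencize and its regular expressions are hypothetical and are not taken from stdlib or from any linked pull request. Because ignoring punctuation inside quotes altogether would merge sentences such as 'He yelled "Stop!" Then he left.', the sketch adds an assumed capitalization heuristic: terminal punctuation followed by a closing quote ends the sentence only if the next word starts with an uppercase letter (or the input ends).

```javascript
'use strict';

// Hypothetical sketch, not the stdlib implementation or the linked PR.
// Works on whitespace-separated words rather than real tokens.
var TERMINAL = /[.!?]$/;           // . ! or ? outside a quotation
var TERMINAL_IN_QUOTE = /[.!?]"$/; // . ! or ? immediately before a closing "
var STARTS_UPPER = /^["']?[A-Z]/;  // next word looks like a sentence start

// Decide whether `word` ends a sentence, given the following word (if any).
function isBoundary( word, next ) {
    if ( TERMINAL.test( word ) ) {
        // Terminal punctuation outside a quotation always ends the sentence.
        return true;
    }
    if ( TERMINAL_IN_QUOTE.test( word ) ) {
        // Punctuation inside a quotation ends the sentence only at the end of
        // the input or when the next word starts with an uppercase letter
        // (an assumed heuristic).
        return next === undefined || STARTS_UPPER.test( next );
    }
    return false;
}

function quoteAwareSentencize( str ) {
    var words = str.split( /\s+/ ).filter( Boolean );
    var sentences = [];
    var current = [];
    var i;
    for ( i = 0; i < words.length; i++ ) {
        current.push( words[ i ] );
        if ( isBoundary( words[ i ], words[ i + 1 ] ) ) {
            sentences.push( current.join( ' ' ) );
            current = [];
        }
    }
    if ( current.length > 0 ) {
        sentences.push( current.join( ' ' ) );
    }
    return sentences;
}

console.log( quoteAwareSentencize( 'I said "Look out!" right before he banged his head' ) );
// => [ 'I said "Look out!" right before he banged his head' ]
console.log( quoteAwareSentencize( 'He yelled "Stop!" Then he left.' ) );
// => [ 'He yelled "Stop!"', 'Then he left.' ]
```

The heuristic is not perfect: a quoted exclamation followed by a capitalized name, as in '"Look out!" Bob shouted', would still be split. Handling such cases would likely need more context than capitalization alone.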

@MVARUNREDDY8203 MVARUNREDDY8203 linked a pull request Nov 10, 2024 that will close this issue