remove single quotes around words while preserving apostrophes #159

ejdweck · 2018-11-04T23:09:43Z

I was using the sentiment library and noticed that, when I ran analysis on headlines containing single quotes, the quoted words were not tokenized properly.

For example, for the news headline from cnn.com that reads:

Abrams: Trump is 'wrong,' I am qualified to be Georgia's governor

The quoted word 'wrong' should be tokenized as wrong, without the surrounding quotes.

In its current state, the library strips double quotes from around words but not single quotes. My guess is that this is to preserve apostrophes: if you add a ' to the .replace regex, all single quotes are removed, apostrophes included.
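
As a rough illustration of the idea (a sketch only, not necessarily the logic in this PR or the library's actual tokenizer), single quotes can be stripped only where they sit at a word boundary, so that an apostrophe between letters survives. The `tokenize` function and its punctuation list below are hypothetical stand-ins written for this example.

```javascript
// Hypothetical stand-in tokenizer, written for illustration only.
function tokenize(input) {
    return input
        .toLowerCase()
        // Replace common punctuation (including double quotes) with spaces.
        .replace(/[.,\/#!?$%\^&\*;:{}=_`"~()]/g, ' ')
        // Drop single quotes that open a word (start of string or after whitespace).
        .replace(/(^|\s)'+/g, '$1')
        // Drop single quotes that close a word (before whitespace or end of string).
        .replace(/'+(?=\s|$)/g, '')
        // Collapse runs of whitespace, then split into tokens.
        .replace(/\s+/g, ' ')
        .trim()
        .split(' ');
}

var tokens = tokenize("Abrams: Trump is 'wrong', I am qualified to be Georgia's governor");
console.log(tokens);
```

With this approach the token list contains wrong without quotes, while georgia's keeps its apostrophe. One known limitation of the sketch: a trailing possessive apostrophe (as in dogs') is also stripped, because it looks the same as a closing quote.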

Here is some code to reproduce the error:

const Sentiment = require('sentiment');
const sentiment = new Sentiment();

// The same headline with no quotes, single quotes, and double quotes around "wrong"
const noQuotes = "Abrams: Trump is wrong, I am qualified to be Georgia's governor";
const singleQuotes = "Abrams: Trump is 'wrong', I am qualified to be Georgia's governor";
const doubleQuotes = "Abrams: Trump is \"wrong,\" I am qualified to be Georgia's governor";

const noQuotesResult = sentiment.analyze(noQuotes);
const doubleQuotesResult = sentiment.analyze(doubleQuotes);
const singleQuotesResult = sentiment.analyze(singleQuotes);

// Logged in this order: no quotes, double quotes, single quotes
console.log(noQuotesResult);
console.log(doubleQuotesResult);
console.log(singleQuotesResult);

Output (no quotes, double quotes, single quotes). Only the single-quoted version fails to score the word:

{ score: -2,
  comparative: -0.18181818181818182,
  tokens:
   [ 'abrams',
     'trump',
     'is',
     'wrong',
     'i',
     'am',
     'qualified',
     'to',
     'be',
     'georgia\'s',
     'governor' ],
  words: [ 'wrong' ],
  positive: [],
  negative: [ 'wrong' ] }
{ score: -2,
  comparative: -0.18181818181818182,
  tokens:
   [ 'abrams',
     'trump',
     'is',
     'wrong',
     'i',
     'am',
     'qualified',
     'to',
     'be',
     'georgia\'s',
     'governor' ],
  words: [ 'wrong' ],
  positive: [],
  negative: [ 'wrong' ] }
{ score: 0,
  comparative: 0,
  tokens:
   [ 'abrams',
     'trump',
     'is',
     '\'wrong\'',
     'i',
     'am',
     'qualified',
     'to',
     'be',
     'georgia\'s',
     'governor' ],
  words: [],
  positive: [],
  negative: [] }

added logic to remove single quotes around words while preserving apostrophes + 3 unit tests to verify code works as expected.

355237b

thisandagain added the pr - needs review label Aug 22, 2019
