Quoted header values containing commas and comprised of the same string aren't able to be parsed. #1052

Open
@alliefitter

Description

This is a pretty weird corner case, so let me know if y'all need more detail. Given a sheet with headers named "Bar, Baz" and "Spam, Baz", Papa splits the header row on commas, treats the fragment Baz" as a duplicate header, and appends _1 to the second instance of it in headerMap. Then, while seemingly attempting to remediate duplicates, it turns the second header value into "Spam, Baz"_1, which seems to break field parsing later on. The following script...

import Papa from 'papaparse'

console.log(Papa.parse('Foo,"Bar, Baz","Spam, Baz",Some,Other,Headers\n1,2,3,4,5,6', { header: true }))

... will print...

{
  data: [],
  errors: [
    {
      type: 'Quotes',
      code: 'InvalidQuotes',
      message: 'Trailing quote on quoted field is malformed',
      row: 0,
      index: 16
    },
    {
      type: 'Quotes',
      code: 'MissingQuotes',
      message: 'Quoted field unterminated',
      row: 0,
      index: 16
    }
  ],
  meta: {
    delimiter: ',',
    linebreak: '\n',
    aborted: false,
    truncated: false,
    cursor: 57,
    fields: [
      'Foo',
      'Bar, Baz',
      'Spam, Baz"_1,Some,Other,Headers\n1,2,3,4,5,6'
    ]
  }
}
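
To make the suspected mechanism concrete, here is a minimal sketch of what seems to be happening. It is not Papa's actual code: naiveRenameDuplicates is a hypothetical helper that only mimics the behavior described above, assuming the raw header line is split on the delimiter without regard to quotes and that repeated pieces get a numeric suffix.

```javascript
// Hypothetical sketch, NOT Papa Parse's real implementation. It assumes the
// raw header line is split on the delimiter with no awareness of quotes, and
// that any repeated piece is renamed with a numeric suffix.
function naiveRenameDuplicates(headerLine, delimiter = ',') {
  const seen = new Map();
  const pieces = headerLine.split(delimiter).map((piece) => {
    const count = seen.get(piece) || 0;
    seen.set(piece, count + 1);
    // The first occurrence is kept as-is; later ones become piece_1, piece_2, ...
    return count > 0 ? `${piece}_${count}` : piece;
  });
  return pieces.join(delimiter);
}

const headerLine = 'Foo,"Bar, Baz","Spam, Baz",Some,Other,Headers';
const renamed = naiveRenameDuplicates(headerLine);

// The suffix lands after the closing quote of the third header.
console.log(renamed);
// Foo,"Bar, Baz","Spam, Baz"_1,Some,Other,Headers
```

With the suffix sitting after the closing quote, the third field no longer ends in a quote followed by a delimiter, which would be consistent with the InvalidQuotes and MissingQuotes errors above.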

I was going to submit a PR, but the code is a bit difficult to follow. If this will take some time for y'all to get to, just comment here, and I can spend some time on a PR.
