@thomas-boucher

See the description of the issue in #2345.

This proposal avoids allocating a new string containing the current row for each invalid field when the user hasn't defined the "bad data" callback.
As an example, this reduced the time to parse a 3.5 MB file with 20k lines from 16 s to 61 ms (see the issue for an example of how to reproduce this).
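
The idea can be illustrated with a minimal sketch. It is written in Python rather than the library's own language, and it is not the library's actual code: `FieldScanner`, `report_invalid_field`, and `rows_materialized` are hypothetical names used only to show the pattern of building the row string when, and only when, a callback exists to receive it.

```python
from typing import Callable, List, Optional


class FieldScanner:
    """Hypothetical scanner illustrating lazy row materialization."""

    def __init__(self, on_bad_data: Optional[Callable[[str], None]] = None):
        self.on_bad_data = on_bad_data
        # Counts how many times the row string was actually built.
        self.rows_materialized = 0

    def _row_text(self, chars: List[str], start: int, end: int) -> str:
        # This is the expensive step: it allocates a new string.
        self.rows_materialized += 1
        return "".join(chars[start:end])

    def report_invalid_field(self, chars: List[str], start: int, end: int) -> None:
        # Skip the allocation entirely when nobody will consume the row.
        if self.on_bad_data is None:
            return
        self.on_bad_data(self._row_text(chars, start, end))
```

Without a callback, reporting an invalid field costs a single `None` check; the row string is only built when a callback was supplied.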

I didn't add tests because the callback is already tested.

To go further and improve performance even when the callback is defined, I think the RawRecord could be filled once, when a line is considered ended (and only if the callback is defined), to avoid a new allocation for each field.
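
A rough sketch of that follow-up idea, again in Python and under stated assumptions: `is_invalid` is a toy validity rule, and `finish_line` and `build_raw_record` are hypothetical names, not part of the library. The point is only that the raw record is built at most once per line, however many fields in that line are invalid.

```python
from typing import Callable, List, Optional


def is_invalid(field: str) -> bool:
    # Toy rule (an assumption): a quote character inside a field that is
    # not fully wrapped in quotes counts as bad data.
    quoted = len(field) >= 2 and field[0] == '"' and field[-1] == '"'
    return '"' in field and not quoted


def finish_line(
    fields: List[str],
    build_raw_record: Callable[[], str],
    on_bad_data: Optional[Callable[[int, str], None]] = None,
) -> List[int]:
    """Called once when a line ends; returns the indexes of invalid fields."""
    bad_indexes = [i for i, f in enumerate(fields) if is_invalid(f)]
    if bad_indexes and on_bad_data is not None:
        # One allocation per line, shared by every invalid field in it.
        raw = build_raw_record()
        for i in bad_indexes:
            on_bad_data(i, raw)
    return bad_indexes
```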

@thomas-boucher
Author

Hello @JoshClose, can I help move this forward? Thanks!

@JoshClose
Owner

I rewrote the parser from scratch and am currently integrating it back into the system, which may make this invalid. I'll leave this here as a reminder to check and make sure the same thing doesn't happen again.

@thomas-boucher
Author

> I rewrote the parser from scratch and am currently integrating it back into the system, which may make this invalid. I'll leave this here as a reminder to check and make sure the same thing doesn't happen again.

Thanks @JoshClose, do you have a rough idea of when the new implementation will be ready? Otherwise, would it make sense to merge this in a minor update of the current code, since it seems very low risk?
Thanks for the support.
