Add keep parameter to dump to copy invalid UTF-8 bytes as-is #4555
base: develop
🔴 Amalgamation check failed! 🔴 The source code has not been amalgamated. @nlohmann
* 🐛 set parents after insert call
* 🚨 fix warning
* Add implementation to retrieve start and end positions of json during parse
* Add more unit tests and add start/stop parsing for arrays
* Add raw value for all types
* Add more tests and fix compiler warning
* Amalgamate
* Fix CLang GCC warnings
* Fix error in build
* Style using astyle 3.1
* Fix whitespace changes
* revert
* more whitespace reverts
* Address PR comments
* Fix failing issues
* More whitespace reverts
* Address remaining PR comments
* Address comments
* Switch to using custom base class instead of default basic_json
* Adding a basic using for a json using the new base class. Also address PR comments and fix CI failures
* Address decltype comments
* Diagnostic positions macro (#4)
Co-authored-by: Sush Shringarputale <[email protected]>
* Fix missed include deletion
* Add docs and address other PR comments (#5)
* Add docs and address other PR comments
---------
Co-authored-by: Sush Shringarputale <[email protected]>
* Address new PR comments and fix CI tests for documentation
* Update documentation based on feedback (#6)
---------
Co-authored-by: Sush Shringarputale <[email protected]>
* Address std::size_t and other comments
* Fix new CI issues
* Fix lcov
* Improve lcov case with update to handle_diagnostic_positions call for discarded values
* Fix indentation of LCOV_EXCL_STOP comments
* fix amalgamation astyle issue
---------
Co-authored-by: Sush Shringarputale <[email protected]>
It looks good to me. I thought you would do json_string === s_kept, but you took its substring at position 1 instead. I guess dump gives you an extra character at the beginning or something?
Yes, dump adds quotes.
// copy string as-is if error handler is set to keep, and we don't want to ensure ASCII
if (error_handler == error_handler_t::keep && !ensure_ascii)
{
    o->write_characters(s.data(), s.size());
Just for me to understand, how would this behave exactly? If there is a 0xdc byte, for example, will it be escaped as a \334 octal string (or \xdc hex, or similar)?
I think the important thing is not to break the JSON format.
And what about other characters that are valid UTF-8, like \b or \t, which are handled below: how will they be dumped in this case?
I have limited access these days (I'm on mobile), and I don't know exactly what ensure_ascii is for or what its default value is. If you could provide a hint, it would be helpful.
Thank you
Oh, you're right! Just copying the input to the output is wrong here, because valid characters like LF that must be escaped to \n would not be escaped, and the resulting JSON would be invalid. Thanks for noting. I will fix this.
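To make the concern concrete, here is a minimal standalone sketch of what a "keep" mode probably needs to do: still escape the characters JSON requires (quotes, backslashes, control characters below 0x20), while copying every other byte, including invalid UTF-8 such as 0xdc, unchanged. The function name escape_keep_bytes is hypothetical; this is not the library's serializer and not necessarily how the PR ends up fixing it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of nlohmann/json): escape what JSON
// requires and copy all other bytes, valid UTF-8 or not, as-is.
std::string escape_keep_bytes(const std::string& s)
{
    std::string out;
    out.reserve(s.size());
    for (const char ch : s)
    {
        const auto byte = static_cast<unsigned char>(ch);
        switch (byte)
        {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\b': out += "\\b";  break;
            case '\f': out += "\\f";  break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (byte < 0x20)
                {
                    // remaining control characters become \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned>(byte));
                    out += buf;
                }
                else
                {
                    // printable ASCII and all bytes >= 0x80 are kept
                    out += ch;
                }
                break;
        }
    }
    return out;
}
```

With this approach an LF in the input still comes out as \n, so the output stays parseable as a JSON string apart from the invalid UTF-8 bytes the caller explicitly asked to keep.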
Co-authored-by: gentooise <[email protected]>
Add an enumerator keep to error_handler_t to allow keeping the input as-is in case of UTF-8 errors.
Fixes #4552