Add keep parameter to dump to copy invalid UTF-8 bytes as-is #4555

nlohmann · 2024-12-18T08:55:13Z

Add an enumerator keep to error_handler_t that keeps the input as-is in case of UTF-8 errors.

Fixes #4552

github-actions · 2024-12-18T09:40:48Z

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @nlohmann
Please read and follow the Contribution Guidelines.

coveralls · 2024-12-18T09:41:15Z

coverage: 99.639%. remained the same
when pulling 4ab98c3 on issue4552-ignore
into 663058e on develop.

* Add implementation to retrieve start and end positions of json during parse
* Add more unit tests and add start/stop parsing for arrays
* Add raw value for all types
* Add more tests and fix compiler warning
* Amalgamate
* Fix CLang GCC warnings
* Fix error in build
* Style using astyle 3.1
* Fix whitespace changes
* revert
* more whitespace reverts
* Address PR comments
* Fix failing issues
* More whitespace reverts
* Address remaining PR comments
* Address comments
* Switch to using custom base class instead of default basic_json
* Adding a basic using for a json using the new base class. Also address PR comments and fix CI failures
* Address decltype comments
* Diagnostic positions macro (#4)
  Co-authored-by: Sush Shringarputale <[email protected]>
* Fix missed include deletion
* Add docs and address other PR comments (#5)
  * Add docs and address other PR comments
  ---------
  Co-authored-by: Sush Shringarputale <[email protected]>
* Address new PR comments and fix CI tests for documentation
* Update documentation based on feedback (#6)
  ---------
  Co-authored-by: Sush Shringarputale <[email protected]>
* Address std::size_t and other comments
* Fix new CI issues
* Fix lcov
* Improve lcov case with update to handle_diagnostic_positions call for discarded values
* Fix indentation of LCOV_EXCL_STOP comments
* fix amalgamation astyle issue
---------
Co-authored-by: Sush Shringarputale <[email protected]>

jordan-hoang · 2024-12-23T06:36:55Z

It looks good to me. I thought you would compare json_string == s_kept, but you took its substring starting at position 1 instead. I guess dump gives you an extra character at the beginning or something?

nlohmann · 2024-12-23T06:58:59Z

Yes, dump adds quotes.
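To illustrate why the test compares a substring rather than the whole dumped value, here is a minimal sketch. It does not call the library: `quote` is a hypothetical stand-in for the surrounding double quotes that a dumped JSON string carries (escaping is deliberately left out), and `strip_quotes` is a hypothetical helper for the comparison.

```cpp
#include <string>

// Hypothetical stand-in (not the library's dump): a serialized JSON string
// value is the content wrapped in double quotes. Escaping is omitted here.
std::string quote(const std::string& s)
{
    return '"' + s + '"';
}

// Hypothetical helper: drop the leading and trailing quote so the payload
// can be compared against the original input.
std::string strip_quotes(const std::string& dumped)
{
    if (dumped.size() < 2)
    {
        return std::string();
    }
    return dumped.substr(1, dumped.size() - 2);
}
```

With these helpers, `quote("abc")` differs from `"abc"` by the two quote characters, while `strip_quotes(quote("abc"))` matches the input again, which is the reason a direct equality check against the dumped string would fail.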

docs/mkdocs/docs/api/basic_json/dump.md

gentooise · 2024-12-23T08:28:43Z

include/nlohmann/detail/output/serializer.hpp

+        // copy string as-is if error handler is set to keep, and we don't want to ensure ASCII
+        if (error_handler == error_handler_t::keep && !ensure_ascii)
+        {
+            o->write_characters(s.data(), s.size());


Just for me to understand, how would this behave exactly? If there is a 0xdc byte for example, it will be escaped as \334 octal string (or \xdc hex, or similar)?

I think the important thing is to not break the json format.

And also what about other UTF-8 accepted chars? Like \b or \t handled below: how will they be dumped in this case?

I have limited access these days (from mobile), and I don't know exactly the purpose of ensure_ascii and its default value. If you could provide some hint it would be helpful.

Thank you

nlohmann · 2024-12-23

Oh, you're right! Just copying the input to the output is wrong here, because valid characters like LF that must be escaped to \n would not be escaped and the resulting JSON would be invalid. Thanks for noting. I will fix this.
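The problem raised in this thread can be sketched as follows. This is not the code from serializer.hpp and not the eventual fix; `dump_string_keep` is a hypothetical helper that assumes "keep" means copying every byte at or above 0x20 unchanged (including invalid UTF-8 such as a lone 0xdc) while still escaping the characters that JSON requires to be escaped. It performs no UTF-8 validation and ignores ensure_ascii.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not the library's serializer: writes `s` as a JSON
// string literal. Quotes, backslashes and control characters are escaped;
// all other bytes, valid UTF-8 or not, are copied unchanged.
std::string dump_string_keep(const std::string& s)
{
    std::string out = "\"";
    for (const char c : s)
    {
        const auto byte = static_cast<unsigned char>(c);
        switch (byte)
        {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\b': out += "\\b";  break;
            case '\f': out += "\\f";  break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (byte < 0x20)
                {
                    // remaining control characters become \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned int>(byte));
                    out += buf;
                }
                else
                {
                    // printable ASCII and any byte >= 0x80 is kept as-is
                    out += c;
                }
                break;
        }
    }
    out += '"';
    return out;
}
```

In this sketch a line feed in the input is still written as the two characters `\n`, so the output stays parseable, whereas a 0xdc byte passes through untouched. A plain `write_characters(s.data(), s.size())` would skip the escaping branch entirely, which is the defect pointed out above.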

Co-authored-by: gentooise <[email protected]>

github-actions · 2025-02-14T00:11:08Z

This pull request has been marked as stale because it has had no activity for 30 days. While we won’t close it automatically, we encourage you to update or comment if it is still relevant. Keeping pull requests active and up-to-date helps us review and merge changes more efficiently. Thank you for your contributions!

gregmarr · 2025-02-14T18:59:52Z

@nlohmann Does this still need work?

nlohmann · 2025-02-14T19:16:44Z

Yes. And clarification.

🚧 WIP for #4552

4d67e12

github-actions bot added M tests labels Dec 18, 2024

nlohmann and others added 7 commits December 18, 2024 17:46

Set parents after insert call (#4537)

851584e

* 🐛 set parents after insert call * 🚨 fix warning

Suppress modernize-use-integer-sign-comparison (#4558)

3665dab

🎨 fix format

1a76a2c

Bump actions/upload-artifact from 4.4.3 to 4.5.0 (#4557)

3db5cc4

Add ONLY_SERIALIZE for NLOHMANN_DEFINE_DERIVED_TYPE_* macros (#4562)

a27a5b5

🚧 first implementation for keep

a2d828c

github-actions bot added documentation L CI CMake and removed M labels Dec 20, 2024

nlohmann added 2 commits December 20, 2024 16:36

🚨 fix warning

7d2a83b

Merge branch 'develop' into issue4552-ignore

a6a06b7

github-actions bot added M and removed L CI CMake labels Dec 22, 2024

🚧 add support for ensure_ascii

3cd5025

github-actions bot added L and removed M labels Dec 22, 2024

nlohmann added 3 commits December 22, 2024 13:30

🚨 fix warnings

15ff370

🎨 format code

e9876d9

📝 clean up

b167096

nlohmann changed the title ~~WIP for #4552~~ Add keep parameter to dump to copy invalid UTF-8 bytes as-is Dec 22, 2024

nlohmann marked this pull request as ready for review December 22, 2024 13:21

nlohmann added the review needed It would be great if someone could review the proposed changes. label Dec 22, 2024

nlohmann mentioned this pull request Dec 22, 2024

UTF-8 invalid characters are not always ignored when dumping with error_handler_t::ignore #4552

Open

2 tasks

🚨 fix warnings

493d1e4

gentooise reviewed Dec 23, 2024

Update docs/mkdocs/docs/api/basic_json/dump.md

4ab98c3

Co-authored-by: gentooise <[email protected]>

nlohmann removed the review needed It would be great if someone could review the proposed changes. label Dec 23, 2024

nlohmann marked this pull request as draft December 23, 2024 11:41

github-actions bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Feb 14, 2025
