BJData optimized binary array type #4513

Draft: nebkat wants to merge 2 commits into nlohmann:develop from nebkat:develop
@nebkat commented Nov 25, 2024

See NeuroJSON/bjdata#6 for further information.

Introduces a dedicated B marker for bytes.

This is used as the strong type marker in optimized array format to encode binary data such that it can also be decoded back to binary data (instead of wrongly decoding as an integer array).
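To make the byte layout concrete, here is a minimal sketch of a draft 3 optimized byte array: `[`, then `$B` (every element is a byte), then `#` and a count, then the raw bytes with no per-element markers. It is a standalone helper, not the library's writer. The name `encode_bytes_draft3` is hypothetical, and for brevity the count is always written with the `U` (uint8) marker, which limits it to 255 bytes. Under the draft 2 behaviour the only difference would be `U` instead of `B` after `$`, which is why a decoder cannot tell binary apart from an ordinary uint8 array.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the nlohmann/json API): encodes raw bytes as a
// BJData optimized array using the draft 3 `B` (byte) strong type marker.
// Simplification: the count is always encoded as uint8, so at most 255 bytes.
std::vector<std::uint8_t> encode_bytes_draft3(const std::vector<std::uint8_t>& bytes)
{
    if (bytes.size() > 255)
    {
        throw std::length_error("sketch only supports uint8 counts");
    }
    std::vector<std::uint8_t> out = {
        '[',       // array start
        '$', 'B',  // strong type marker: every element is a byte
        '#', 'U',  // element count follows, encoded as uint8
        static_cast<std::uint8_t>(bytes.size())
    };
    // Optimized format: elements follow with no per-item type markers.
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;  // no closing ']' is written when a count is given
}
```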

Draft while awaiting the release of BJData draft 3.

Would a `legacy_binary` flag be desirable to keep supporting the draft 2 behaviour of encoding binary data as uint8 typed arrays?


Pull request checklist

Read the Contribution Guidelines for detailed information.

  • Changes are described in the pull request, or an existing issue is referenced.
  • The test suite compiles and runs without error.
  • Code coverage is 100%. Test cases can be added by editing the test suite.
  • The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header files single_include/nlohmann/json.hpp and single_include/nlohmann/json_fwd.hpp. The whole process is described here.

@coveralls commented Nov 25, 2024

Coverage Status: coverage remained the same at 99.649% when pulling 7f12cd6 on nebkat:develop into a006a7a on nlohmann:develop.


🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @nebkat
Please read and follow the Contribution Guidelines.

@nlohmann (Owner)

Please run `make amalgamate` with AStyle 3.1.

@nebkat force-pushed the develop branch 5 times, most recently from 516cf0b to d9b1035 on December 5, 2024 01:52
@nebkat (Author) commented Dec 5, 2024

Added a `draft3_binary` parameter to `to_bjdata`, defaulting to `false`, along with tests for both versions.

`from_bjdata` previously had no way to decode uint8 typed arrays back into binary, so it simply gains the ability to parse `B` byte arrays as binary, without needing a new parameter.
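On the reading side, the type marker after `$` is enough to decide the result: `B` can be returned as binary, while `U` stays an integer array, which is why no parameter is needed. The following is a minimal sketch, not `from_bjdata`'s actual parser. The names `Decoded` and `decode_optimized_array` are hypothetical, only the `[$<type>#U<count>` layout is handled, and it requires C++17 for `std::variant`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical result type: a binary payload or a plain integer array.
using Binary = std::vector<std::uint8_t>;
using IntArray = std::vector<int>;
using Decoded = std::variant<Binary, IntArray>;

// Decodes '[' '$' <type> '#' 'U' <count> <elements...> with <type> 'B' or 'U'.
Decoded decode_optimized_array(const std::vector<std::uint8_t>& in)
{
    if (in.size() < 6 || in[0] != '[' || in[1] != '$' || in[3] != '#' || in[4] != 'U')
    {
        throw std::invalid_argument("layout not supported by this sketch");
    }
    const std::uint8_t type = in[2];
    const std::size_t count = in[5];
    if (in.size() < 6 + count)
    {
        throw std::invalid_argument("truncated input");
    }
    const auto first = in.begin() + 6;
    const auto last = first + static_cast<std::ptrdiff_t>(count);
    if (type == 'B')
    {
        // Draft 3 byte marker: round-trips back to binary.
        return Binary(first, last);
    }
    if (type == 'U')
    {
        // uint8 typed array: decoded as ordinary integers.
        return IntArray(first, last);
    }
    throw std::invalid_argument("unsupported type marker");
}
```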

Would it make sense to introduce a to_bjdata_draft3() function and deprecate to_bjdata() to encourage use of the newer version?

@nlohmann (Owner) commented Dec 5, 2024

Thanks a lot! Feels much better without a breaking change!

Regarding the draft version: would it make sense to add an `int` parameter `draft` to the function instead of the boolean, defaulting it to 2 and allowing it to be set to 3? Then we have some room for the future there.
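A minimal sketch of the integer `draft` idea, assuming the binary type marker is the only thing that differs between drafts 2 and 3. `binary_type_marker` is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical: select the strong type marker for binary values from an
// integer draft number instead of a boolean flag. Draft 2 keeps the uint8
// typed array ('U'); draft 3 uses the dedicated byte marker ('B').
char binary_type_marker(int draft = 2)
{
    switch (draft)
    {
        case 2:
            return 'U';
        case 3:
            return 'B';
        default:
            // Unknown drafts are rejected rather than silently mapped.
            throw std::invalid_argument("unsupported BJData draft");
    }
}
```

Compared with a `bool`, an integer leaves room for later drafts, and values the writer does not understand can be rejected explicitly.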

Regarding deprecation: adding a new function to the API just for the sake of deprecating another feels odd. I think we should instead deprecate the default value of the proposed draft parameter and force clients to choose a value in the future.
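C++ has no way to mark only a default argument as deprecated. One way to get the same effect is an assumption here, not necessarily how the library would do it: drop the default, and keep the old call form as a separate `[[deprecated]]` overload (C++14) that forwards the previous default. `binary_marker_for` is a hypothetical stand-in for `to_bjdata`.

```cpp
#include <cassert>

// Hypothetical stand-in for to_bjdata: the draft must now be passed explicitly.
// Draft 3 and later use the 'B' byte marker for binary; draft 2 uses 'U'.
char binary_marker_for(int draft)
{
    return draft >= 3 ? 'B' : 'U';
}

// The old call form (relying on the default) survives as a deprecated overload
// that forwards the former default, draft 2. Existing callers still compile but
// get a deprecation warning (e.g. GCC/Clang -Wdeprecated-declarations).
[[deprecated("pass the BJData draft explicitly")]]
char binary_marker_for()
{
    return binary_marker_for(2);
}
```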

What do you think?

@nebkat (Author) commented Dec 5, 2024

I like it; I hadn't thought of that. Much better solution!

So perhaps something like this?

```cpp
/// how to encode BJData
enum class bjdata_version_t
{
    draft2 JSON_HEDLEY_DEPRECATED_FOR(3.12.0, draft3),
    draft3,
};

...

static std::vector<std::uint8_t> to_bjdata(const basic_json& j,
        const bool use_size = false,
        const bool use_type = false,
        const bjdata_version_t version = bjdata_version_t::draft2)
{
    std::vector<std::uint8_t> result;
    to_bjdata(j, result, use_size, use_type, version);
    return result;
}

// warning: 'nlohmann::json_abi_v3_11_3::detail::bjdata_version_t::draft2' is deprecated: Since 3.12.0; use draft3 [-Wdeprecated-declarations]
// 4315 |             const bjdata_version_t version = bjdata_version_t::draft2)
```

Could technically even put ubjson as a version as they are so close and use it internally.

@nlohmann (Owner)

> Could technically even put ubjson as a version as they are so close and use it internally.

You mean using BJData as a parameter for UBJSON output/input?

@nebkat (Author) commented Dec 10, 2024

Yes, internally.

In `json/include/nlohmann/detail/output/binary_writer.hpp`:

```diff
    // json/include/nlohmann/detail/output/binary_writer.hpp
    void write_ubjson(const BasicJsonType& j, const bool use_count,
                      const bool use_type, const bool add_prefix = true,
-                      const bool use_bjdata = false)
+                      const bjdata_version_t version = bjdata_version_t::ubjson) // or "draft0", which could be argued is equivalent
    {
```

Then we can do:

In `include/nlohmann/json.hpp`:

```diff
    /// @brief create a UBJSON serialization of a given JSON value
    /// @sa https://json.nlohmann.me/api/basic_json/to_ubjson/
    static void to_ubjson(const basic_json& j, detail::output_adapter<std::uint8_t> o,
                          const bool use_size = false, const bool use_type = false)
    {
-        binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type);
+        binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type, true, bjdata_version_t::ubjson); // or rely on the default as above
    }

    /// @brief create a BJData serialization of a given JSON value
    /// @sa https://json.nlohmann.me/api/basic_json/to_bjdata/
    static void to_bjdata(const basic_json& j, detail::output_adapter<std::uint8_t> o,
-                         const bool use_size = false, const bool use_type = false)
+                         const bool use_size = false, const bool use_type = false, const bjdata_version_t version = bjdata_version_t::draft2)
    {
-        binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type, true, true);
+        binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type, true, version);
    }
```

This would cut down on the number of `if (use_bjdata && bjdata_version == ...)` checks, replacing them with a plain `if (version == ...)`.
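A small sketch of how a single version value could replace the `use_bjdata` plus draft combinations. It assumes an enum that orders `ubjson` before the BJData drafts, as in the proposal. The enum is placed in a hypothetical `sketch` namespace because an `ubjson` enumerator is not part of the library, and the helper names are illustrative only. Scoped enums support relational operators between values of the same type, so "draft 3 or later" becomes one comparison.

```cpp
#include <cassert>

namespace sketch
{
// Hypothetical enum following the proposal: UBJSON is the oldest "version",
// so one writer could serve both formats.
enum class bjdata_version_t
{
    ubjson,
    draft2,
    draft3,
};

// One comparison replaces `use_bjdata && bjdata_version == ...` combinations.
// UBJSON and BJData draft 2 both write binary as a uint8 typed array ('U').
inline char binary_type_marker_for(bjdata_version_t version)
{
    return version >= bjdata_version_t::draft3 ? 'B' : 'U';
}

// Illustrative gate for BJData-only features: anything newer than UBJSON.
inline bool is_bjdata(bjdata_version_t version)
{
    return version != bjdata_version_t::ubjson;
}
} // namespace sketch
```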

@nlohmann (Owner)

Yes, but what I meant was: could this, in the long run, be exposed to users? As in: deprecating `to_bjdata` in favor of `to_ubjson` with a version parameter.

@nebkat (Author) commented Dec 10, 2024

Ah, right. Yes, but if anything I would deprecate `to_ubjson` in favour of `to_bjdata`, considering UBJSON has not seen any meaningful updates in ~8 years. If we can play our part in moving people towards BJData, that seems like a positive thing in the long run! It is the closest thing we have to a 1-to-1 binary mapping of JSON without the extra fluff.

@nlohmann (Owner)

Well, I think of BJData as a dialect of UBJSON, so I don't think replacing the UBJSON functions with the BJData ones is a good idea.

3 participants