FlatBuffers 64 for C++ #7935

Merged
merged 62 commits into from
May 9, 2023

dbaileychess
Collaborator

This introduces 64-bit FlatBuffers.

This allows buffers to exceed the 2 GiB limit imposed by the addressable range of uoffset_t (aka uint32_t). It adds uoffset64_t (aka uint64_t) as a possible offset backing type, which makes the addressable range much larger.

Overview

The buffer is now conceptually two regions of contiguous memory:

[           binary           ]
[32-bit region][64-bit region]

Historically, the 32-bit region was the whole of the FlatBuffer. All 32-bit offsets (Offset) are relative to the end of the 32-bit region, and thus can only address objects within that region. The new 64-bit offset (Offset64) is relative to the end of the 64-bit region (or conceptually the tail of the whole buffer), so Offset64 can address any object within the buffer.

This leads to an important concept for using 64-bit FlatBuffers:

All 64-bit offsets MUST be serialized to the binary first, before adding any 32-bit offsets

Attempting otherwise will trigger an assertion failure.
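
The ordering rule can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyBuilder`, `Add64`, and `Add32` are hypothetical names, not part of the FlatBuffers API, and the model tracks sizes rather than bytes. Like the real builder it is assumed to grow from the tail of the buffer toward the front, which is why the 64-bit region has to be written first. It returns `false` where the real builder would assert.

```cpp
#include <cstdint>

// 2 GiB, following the limit quoted in the description above.
constexpr std::uint64_t kMax32BitRegion = 1ull << 31;

// Toy model (not the FlatBuffers API): tracks region sizes only.
class ToyBuilder {
 public:
  // Records `size` bytes of 64-bit-addressed data. Fails once any 32-bit
  // data has been written, because the 64-bit region sits at the tail and
  // the buffer is built back to front.
  bool Add64(std::uint64_t size) {
    if (size32_ != 0) return false;
    size64_ += size;
    return true;
  }

  // Records `size` bytes of 32-bit-addressed data. Fails if the 32-bit
  // region would grow past what a 32-bit offset is assumed to span.
  bool Add32(std::uint64_t size) {
    if (size32_ + size > kMax32BitRegion) return false;
    size32_ += size;
    return true;
  }

  std::uint64_t size32() const { return size32_; }
  std::uint64_t size64() const { return size64_; }
  std::uint64_t total() const { return size32_ + size64_; }

 private:
  std::uint64_t size32_ = 0;  // bytes in the 32-bit region
  std::uint64_t size64_ = 0;  // bytes in the 64-bit region
};
```

In this model, calling `Add64` for a 3 GiB payload and then `Add32` for the table data succeeds, while any `Add64` after the first `Add32` is rejected.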

Schema

Two new attributes are added:

  • offset64

    This will generate methods to produce Offset64 return types. This can be used on strings and vectors.

  • vector64

    This implies offset64 but also widens the length field of a vector from 32 bits to 64 bits. This allows you to store a single vector that is larger than 2 GiB. It also works with the nested_flatbuffer attribute.

Code Generation

Only C++ code generation is supported at the moment.

For the most part, this code is semantically similar to the 32-bit version; it just switches out some types. It also requires use of the new FlatBufferBuilder64 to handle the larger buffer.

Builder

There is a new FlatBufferBuilder64 that is used to build these large buffers. It has various methods to create the supported 64-bit enabled types: vectors and strings (sorry, no tables yet).

Semantically the builder is the same as the 32-bit one (it's just a templated version of it), so the API and flow of building will be identical. The only difference is the inclusion of more template parameters to dictate the use of Offset64 or Offset.

For example there are now:

FlatBufferBuilder64 builder;
Offset<T> offset = builder.CreateVector(T t); // create a normal 32-bit vector with a 32-bit length field.
Offset64<T> offset = builder.CreateVector<Offset64>(T t); // create a 64-bit-addressed vector with a 32-bit length field.
Offset64<T> offset = builder.CreateVector<Offset64, Vector64>(T t); // create a 64-bit-addressed vector with a 64-bit length field.

// Same with strings
Offset<String> offset = builder.CreateString("hi");
Offset64<String> offset = builder.CreateString<Offset64>("hi");
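
As a rough illustration of how a template parameter can select the returned offset type, here is a self-contained sketch. All of the types below (`Offset`, `Offset64`, `String`, `ToyStringBuilder`) are stand-ins defined locally for the example; they only mimic the shape of the calls above and are not the real FlatBuffers definitions.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Local stand-ins; only the width of the stored offset matters here.
struct String {};

template<typename T> struct Offset {
  using offset_type = std::uint32_t;
  offset_type o;
};

template<typename T> struct Offset64 {
  using offset_type = std::uint64_t;
  offset_type o;
};

class ToyStringBuilder {
 public:
  // OffsetT defaults to the 32-bit Offset; callers opt in to Offset64 by
  // naming it explicitly, as in CreateString<Offset64>("hi").
  template<template<typename> class OffsetT = Offset>
  OffsetT<String> CreateString(const std::string &s) {
    using RawT = typename OffsetT<String>::offset_type;
    written_ += s.size() + 1;  // payload plus null terminator
    return OffsetT<String>{ static_cast<RawT>(written_) };
  }

 private:
  std::uint64_t written_ = 0;  // running byte count, used as a fake offset
};
```

With this sketch, `builder.CreateString("hi")` yields an `Offset<String>` and `builder.CreateString<Offset64>("hi")` yields an `Offset64<String>`, so the call site alone decides which offset width is used.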

Accessing

Accessing a 64-bit FlatBuffer is almost identical to accessing a 32-bit one. Only the returned types differ slightly for vectors: there is now a Vector64<T> (which is just a Vector<T, uoffset64_t>). It has an identical API and just operates on a different length type.
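
To make the "same API, different length type" idea concrete, here is a minimal sketch of a read-only view that is parameterized on the type of its length field. `VectorView` and `VectorView64` are hypothetical names for this example, not the real `flatbuffers::Vector`; the sketch ignores alignment and padding and assumes a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy view over a serialized vector: a length field of type SizeT followed
// directly by the elements. Assumes a little-endian host.
template<typename T, typename SizeT = std::uint32_t>
class VectorView {
 public:
  explicit VectorView(const std::uint8_t *data) : data_(data) {}

  // Number of elements, read from the leading length field.
  SizeT size() const {
    SizeT n;
    std::memcpy(&n, data_, sizeof(n));
    return n;
  }

  // Element i, located just past the length field. No bounds checking.
  T Get(SizeT i) const {
    T value;
    std::memcpy(&value, data_ + sizeof(SizeT) + i * sizeof(T), sizeof(T));
    return value;
  }

 private:
  const std::uint8_t *data_;
};

// Mirrors the relationship described above: same API, wider length type.
template<typename T>
using VectorView64 = VectorView<T, std::uint64_t>;
```

Code written against `size()` and `Get()` works unchanged with either alias; only the width of the length field (4 bytes versus 8 bytes) differs.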

Compatibility

Adding either offset64 or vector64 to an existing field is an evolution error (it would fail the compatibility check), so backwards compatibility is preserved, as any 64-bit field would have to be a new field.

Implementation Notes

Here is what an annotated binary looks like for an example schema that uses various 64-bit fields.

Primarily the difference is the support for UOffset64 in the table definition. The associated vtable doesn't need special treatment since it natively supported various offsets.

root_table (RootTable):
  +0x1C | 14 00 00 00             | SOffset32  | 0x00000014 (20) Loc: 0x08          | offset to vtable
  +0x20 | D0 00 00 00 00 00 00 00 | UOffset64  | 0x00000000000000D0 (208) Loc: 0xF0 | offset to field `far_vector` (vector)
  +0x28 | 00 00 00 00             | uint8_t[4] | ....                               | padding
  +0x2C | D2 04 00 00             | uint32_t   | 0x000004D2 (1234)                  | table field `a` (Int)
  +0x30 | 8C 00 00 00 00 00 00 00 | UOffset64  | 0x000000000000008C (140) Loc: 0xBC | offset to field `far_string` (string)
  +0x38 | 00 00 00 00             | uint8_t[4] | ....                               | padding
  +0x3C | 40 00 00 00             | UOffset32  | 0x00000040 (64) Loc: 0x7C          | offset to field `near_string` (string)
  +0x40 | 70 00 00 00 00 00 00 00 | UOffset64  | 0x0000000000000070 (112) Loc: 0xB0 | offset to field `big_vector` (vector64)
  +0x48 | 08 00 00 00 00 00 00 00 | UOffset64  | 0x0000000000000008 (8) Loc: 0x50   | offset to field `big_struct_vector` (vector64)
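
The `Loc:` values in the dump above follow from simple arithmetic: the position of the offset field plus the value stored in it. The sketch below reproduces that calculation; `ReadLE` and `ResolveOffset` are helper names made up for this example, not FlatBuffers functions.

```cpp
#include <cstddef>
#include <cstdint>

// Reads a little-endian unsigned integer of `width` bytes (4 or 8 here).
inline std::uint64_t ReadLE(const std::uint8_t *p, std::size_t width) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < width; ++i) {
    v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  }
  return v;
}

// Location of the referenced object: the position of the offset field plus
// the value stored in it, matching the "Loc:" column of the dump.
inline std::uint64_t ResolveOffset(std::uint64_t field_pos,
                                   std::uint64_t stored) {
  return field_pos + stored;
}
```

For example, the `far_vector` field sits at 0x20 and stores 0xD0, so `ResolveOffset(0x20, 0xD0)` gives 0xF0, and the 32-bit `near_string` field at 0x3C storing 0x40 resolves to 0x7C, both as shown in the dump.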

The other interesting case is the vector64 that now supports a uint64_t length field:

vector64 (RootTable.big_vector):
  +0xB0 | 04 00 00 00 00 00 00 00 | uint64_t   | 0x0000000000000004 (4)             | length of vector (# items)
  +0xB8 | 05                      | uint8_t    | 0x05 (5)                           | value[0]
  <2 regions omitted>
  +0xBB | 08                      | uint8_t    | 0x08 (8)                           | value[3]

Fixes: #7537

Collaborator

@aardappel aardappel left a comment

This is awesome! Appreciate the great care you've taken to not disrupt the 32-bit ecosystem much :) Overall complexity surprisingly low too. Very neat!

@battre
Member

battre commented May 11, 2023

FYI, the tests are now really slow...

time ./flatbuffers_unittests
ALL TESTS PASSED

real    1m24.133s
user    1m23.377s
sys     0m0.752s

@aardappel
Collaborator

@battre @dbaileychess odd, they used to be really fast, worth seeing which test that is..

@battre
Member

battre commented May 11, 2023

I attached a debugger because I thought that the tests were in an infinite loop. When I triggered a break, I was in the resize here:

    std::vector<uint8_t> big_data;
    big_data.resize(big_vector_size);

@dbaileychess
Collaborator Author

Yeah, I actually made a giant buffer to test and it takes a while to make. Let me fix it so the average case doesn't have to do it.
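
One generic way to keep an expensive allocation out of the default test path is to make the large size opt-in. The sketch below only illustrates that idea; `BigVectorSizeForTest`, `MakeTestPayload`, and the `run_large_tests` flag are hypothetical and are not taken from the FlatBuffers test suite or from the actual fix. It assumes a 64-bit `size_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: picks the payload size for a large-buffer test. The
// multi-GiB size is used only when the caller opts in; the default keeps
// the common test run cheap.
inline std::size_t BigVectorSizeForTest(bool run_large_tests) {
  const std::uint64_t large = (1ull << 31) + 1024;  // just over 2 GiB
  const std::uint64_t small = 1024;
  return static_cast<std::size_t>(run_large_tests ? large : small);
}

// Builds the zero-filled payload; the resize is the slow step when the
// large size is selected.
inline std::vector<std::uint8_t> MakeTestPayload(bool run_large_tests) {
  std::vector<std::uint8_t> data;
  data.resize(BigVectorSizeForTest(run_large_tests));
  return data;
}
```

A default run would call `MakeTestPayload(false)`, and only a dedicated large-buffer run would pass `true`.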

@dbaileychess
Collaborator Author

Fixed in 66e9d98

Before:

[I] derekbailey@lysine ~/P/d/flatbuffers (master)> time ./flattests
ALL TESTS PASSED

________________________________________________________
Executed in    4.44 secs    fish           external
   usr time    1.31 secs    1.35 millis    1.31 secs
   sys time    3.11 secs    0.40 millis    3.11 secs

After:

[I] derekbailey@lysine ~/P/d/flatbuffers (master)> time ./flattests
ALL TESTS PASSED

________________________________________________________
Executed in  121.91 millis    fish           external
   usr time   93.57 millis    1.43 millis   92.14 millis
   sys time   28.39 millis    0.35 millis   28.04 millis

I guess my machine was beefy enough that going from 100 ms to 4 s wasn't bad enough for me to notice.

@battre
Member

battre commented May 11, 2023

Hm... My machine that needed 1m24.133s has 128 virtual cores and 512 GB of RAM :-) - But it's a virtual machine and I compiled it within a Chrome checkout. I wonder whether Chrome has special parameters for the memory allocator...

@battre
Member

battre commented May 11, 2023

That helped here as well!

time ./flatbuffers_unittests
ALL TESTS PASSED

real    0m0.236s
user    0m0.228s
sys     0m0.008s

Thank you.

@dbaileychess dbaileychess deleted the flatbuffers-64 branch May 11, 2023 19:17
jochenparm pushed a commit to jochenparm/flatbuffers that referenced this pull request Oct 29, 2024
* First working hack of adding 64-bit. Don't judge :)

* Made vector_downward work on 64 bit types

* vector_downward uses size_t, added offset64 to reflection

* cleaned up adding offset64 in parser

* Add C++ testing skeleton for 64-bit

* working test for CreateVector64

* working >2 GiB buffers

* support for large strings

* simplified CreateString<> to just provide the offset type

* generalize CreateVector template

* update test_64.afb due to upstream format change

* Added Vector64 type, which is just an alias for vector ATM

* Switch to Offset64 for Vector64

* Update for reflection bfbs output change

* Starting to add support for vector64 type in C++

* made a generic CreateVector that can handle different offsets and vector types

* Support for 32-vector with 64-addressing

* Vector64 basic builder + tests working

* basic support for json vector64 support

* renamed fields in test_64bit.fbs to better reflect their use

* working C++ vector64 builder

* Apply --annotate-sparse-vector to 64-bit tests

* Enable Vector64 for --annotate-sparse-vectors

* Merged from upstream

* Add `near_string` field for testing 32-bit offsets alongside

* keep track of where the 32-bit and 64-bit regions are for flatbufferbuilder

* move template<> outside class body for GCC

* update run.sh to build and run tests

* basic assertion for adding 64-bit offset at the wrong time

* started to separate `FlatBufferBuilder` into two classes, 1 64-bit aware, the other not

* add test for nested flatbuffer vector64, fix bug in alignment of big vectors

* fixed CreateDirect method by iterating by Offset64 first

* internal refactoring of flatbufferbuilder

* block not supported languages in the parser from using 64-bit

* evolution tests for adding a vector64 field

* conformity tests for adding/removing offset64 attributes

* ensure test is for a big buffer

* add parser error tests for `offset64` and `vector64` attributes

* add missing static that GCC only complains about

* remove stdint-uintn.h header that gets automatically added

* move 64-bit CalculateOffset internal

* fixed return size of EndVector

* various fixes on windows

* add SizeT to vector_downward

* minimize range of size changes in vector and builder

* reworked how tracking if 64-offsets are added

* Add ReturnT to EndVector

* small cleanups

* remove need for second Array definition

* combine IndirectHelpers into one definition

* started support for vector of struct

* Support for 32/64-vectors of structs + Offset64

* small cleanups

* add verification for vector64

* add sized prefix for 64-bit buffers

* add fuzzer for 64-bit

* add example of adding many vectors using a wrapper table

* run the new -bfbs-gen-embed logic on the 64-bit tests

* remove run.sh and fix cmakelist issue

* fixed bazel rules

* fixed some PR comments

* add 64-bit tests to cmakelist
Labels
c++ codegen Involving generating code from schema java json php python
Development

Successfully merging this pull request may close these issues.

Possible design for 64-bit sized buffer support in FlatBuffers
3 participants