FlatBuffers 64 for C++ #7935
This is awesome! I appreciate the great care you've taken not to disrupt the 32-bit ecosystem much :) Overall complexity is surprisingly low too. Very neat!
FYI, the tests are now really slow...
@battre @dbaileychess Odd, they used to be really fast. It's worth seeing which test that is.
I attached a debugger because I thought that the tests were in an infinite loop. When I triggered a break, I was in the
Yeah, I made a giant buffer for testing, and it takes a while to build. Let me fix it so the average case doesn't have to do it.
Fixed in 66e9d98.
Before:
After:
I guess my machine was beefy enough that going from 100 ms to 4 s wasn't bad enough for me to notice.
Hm... My machine that needed 1m24.133s has 128 virtual cores and 512 GB of RAM :-) But it's a virtual machine, and I compiled it within a Chrome checkout. I wonder whether Chrome has special parameters for the memory allocator...
That helped here as well!
Thank you.
* First working hack of adding 64-bit. Don't judge :)
* Made vector_downward work on 64 bit types
* vector_downward uses size_t, added offset64 to reflection
* cleaned up adding offset64 in parser
* Add C++ testing skeleton for 64-bit
* working test for CreateVector64
* working >2 GiB buffers
* support for large strings
* simplified CreateString<> to just provide the offset type
* generalize CreateVector template
* update test_64.afb due to upstream format change
* Added Vector64 type, which is just an alias for vector ATM
* Switch to Offset64 for Vector64
* Update for reflection bfbs output change
* Starting to add support for vector64 type in C++
* made a generic CreateVector that can handle different offsets and vector types
* Support for 32-vector with 64-addressing
* Vector64 basic builder + tests working
* basic support for json vector64 support
* renamed fields in test_64bit.fbs to better reflect their use
* working C++ vector64 builder
* Apply --annotate-sparse-vector to 64-bit tests
* Enable Vector64 for --annotate-sparse-vectors
* Merged from upstream
* Add `near_string` field for testing 32-bit offsets alongside
* keep track of where the 32-bit and 64-bit regions are for flatbufferbuilder
* move template<> outside class body for GCC
* update run.sh to build and run tests
* basic assertion for adding 64-bit offset at the wrong time
* started to separate `FlatBufferBuilder` into two classes, 1 64-bit aware, the other not
* add test for nested flatbuffer vector64, fix bug in alignment of big vectors
* fixed CreateDirect method by iterating by Offset64 first
* internal refactoring of flatbufferbuilder
* block not supported languages in the parser from using 64-bit
* evolution tests for adding a vector64 field
* conformity tests for adding/removing offset64 attributes
* ensure test is for a big buffer
* add parser error tests for `offset64` and `vector64` attributes
* add missing static that GCC only complains about
* remove stdint-uintn.h header that gets automatically added
* move 64-bit CalculateOffset internal
* fixed return size of EndVector
* various fixes on windows
* add SizeT to vector_downward
* minimze range of size changes in vector and builder
* reworked how tracking if 64-offsets are added
* Add ReturnT to EndVector
* small cleanups
* remove need for second Array definition
* combine IndirectHelpers into one definition
* started support for vector of struct
* Support for 32/64-vectors of structs + Offset64
* small cleanups
* add verification for vector64
* add sized prefix for 64-bit buffers
* add fuzzer for 64-bit
* add example of adding many vectors using a wrapper table
* run the new -bfbs-gen-embed logic on the 64-bit tests
* remove run.sh and fix cmakelist issue
* fixed bazel rules
* fixed some PR comments
* add 64-bit tests to cmakelist
This introduces 64-bit FlatBuffers.
This allows buffers to exceed the 2 GiB limit imposed by the addressable range of the `uoffset_t` (aka `uint32_t`) currently used. This adds `uoffset64_t` (aka `uint64_t`) as a possible offset backing type, allowing a much larger addressable range.
Overview
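To put the limits above in numbers, here is a small compile-time sketch. It assumes the 32-bit cap is the largest signed 32-bit value (2^31 - 1 bytes, hence "2 GiB" rather than the 4 GiB an unsigned `uint32_t` could span) and that the 64-bit cap is defined the same way with 64-bit integers. The constant and function names are hypothetical, not FlatBuffers identifiers.

```cpp
#include <cstdint>
#include <limits>

// Assumed cap for 32-bit buffers: the largest signed 32-bit value, 2^31 - 1.
// This is why the limit is "2 GiB" rather than the 4 GiB a uint32_t could span.
constexpr uint64_t kMax32BitBufferSize =
    static_cast<uint64_t>(std::numeric_limits<int32_t>::max());

// Assumed cap for 64-bit buffers, defined the same way with 64-bit integers.
constexpr uint64_t kMax64BitBufferSize =
    static_cast<uint64_t>(std::numeric_limits<int64_t>::max());

// True if a buffer of `size` bytes cannot be described with 32-bit offsets.
constexpr bool Needs64BitOffsets(uint64_t size) {
  return size > kMax32BitBufferSize;
}

static_assert(kMax32BitBufferSize == (uint64_t{1} << 31) - 1, "2 GiB - 1");
static_assert(!Needs64BitOffsets(uint64_t{1} << 30), "1 GiB fits");
static_assert(Needs64BitOffsets(uint64_t{3} << 30), "3 GiB does not");
```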
The buffer is now conceptually two regions of contiguous memory:
Here, the 32-bit region is what historically made up the whole FlatBuffer. All 32-bit offsets (`Offset`) are relative to the end of the 32-bit region, and thus can only address objects within that region. The new 64-bit offset (`Offset64`) is relative to the end of the 64-bit region (conceptually, the tail of the whole buffer), so `Offset64` can address any object within the buffer.
This leads to an important concept for using 64-bit FlatBuffers:
All 64-bit offsets MUST be serialized to the binary first, before adding any 32-bit offsets.
Attempting otherwise will trigger an assertion failure.
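The two-region layout and the ordering rule can be modeled with a toy builder. This is a hypothetical sketch, not `FlatBufferBuilder64`: like FlatBuffers, it grows the buffer from the tail, records objects by their distance from the tail, returns 64-bit offsets relative to the tail and 32-bit offsets relative to the boundary between the two regions, and asserts if a 64-bit object is added after a 32-bit one. Byte counts stand in for real payloads, and alignment is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical toy model of the two-region layout; not FlatBufferBuilder64.
// Objects are recorded by their distance from the tail of the buffer, which
// grows toward the front as objects are added.
class ToyTwoRegionBuilder {
 public:
  // Adds an object reached through a 64-bit offset and returns that offset,
  // measured from the tail of the whole buffer.
  uint64_t Add64(uint64_t bytes) {
    // The ordering rule: no 64-bit object may follow a 32-bit one.
    assert(!region64_closed_ && "64-bit object added after a 32-bit one");
    size_ += bytes;
    region64_size_ = size_;
    return size_;
  }

  // Adds an object reached through a 32-bit offset and returns that offset,
  // measured from the end of the 32-bit region (the boundary with the 64-bit
  // region). The first call closes the 64-bit region.
  uint32_t Add32(uint64_t bytes) {
    region64_closed_ = true;
    size_ += bytes;
    const uint64_t relative = size_ - region64_size_;
    assert(relative <= std::numeric_limits<uint32_t>::max() &&
           "32-bit region too large");
    return static_cast<uint32_t>(relative);
  }

 private:
  uint64_t size_ = 0;           // bytes written so far, both regions
  uint64_t region64_size_ = 0;  // bytes in the 64-bit region at the tail
  bool region64_closed_ = false;
};
```

For example, adding a 3 GiB blob with `Add64` and then a 16-byte object with `Add32` yields a 32-bit offset of 16: the 32-bit offsets stay small no matter how large the 64-bit region grows, which only works if the 64-bit region is finished first.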
Schema
Two new attributes are added:
* `offset64`: generates methods that produce `Offset64` return types. It can be used on strings and vectors.
* `vector64`: implies `offset64`, but also widens a vector's length field from 32 bits to 64 bits. This allows you to store a single large vector that is > 2 GiB. It also works with `nested_flatbuffer` attributes.
Code Generation
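Since `vector64` mostly changes a type rather than adding logic, one way to picture the generated accessors is a container templated on its length type, with the 64-bit variant as an alias. The `ToyVector` names below are hypothetical; the only part taken from this PR is the shape of the alias, `Vector64<T>` being `Vector<T, uoffset64_t>`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a read-only FlatBuffers vector view. The API is
// written once; only the length type is a template parameter.
template <typename T, typename SizeT = uint32_t>
class ToyVector {
 public:
  using size_type = SizeT;

  ToyVector(const T* data, SizeT length) : data_(data), length_(length) {}

  SizeT size() const { return length_; }

  const T& Get(SizeT i) const {
    assert(i < length_ && "index out of range");
    return data_[i];
  }

 private:
  const T* data_;
  SizeT length_;
};

// Same shape as the PR's Vector64<T>, i.e. Vector<T, uoffset64_t>.
template <typename T>
using ToyVector64 = ToyVector<T, uint64_t>;
```

Both instantiations expose the same `size()` and `Get()`; only the width of the value returned by `size()` differs.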
Only C++ code generation is supported at the moment.
For the most part, this code is semantically similar to the 32-bit version; it just switches out some types. It also requires the new `FlatBufferBuilder64` to handle the larger buffer.
Builder
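Selecting `Offset` or `Offset64` through a template parameter can be sketched with a toy builder whose `CreateString` takes the offset type as a template template parameter that defaults to the 32-bit one. All `Toy*` names are hypothetical, and the size bookkeeping (payload plus terminator, no length prefix or padding) is a simplification, not FlatBuffers' actual encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical offset wrappers named after Offset/Offset64 in the PR text.
// They differ only in the width of the stored value.
template <typename T> struct ToyOffset { uint32_t o; };
template <typename T> struct ToyOffset64 { uint64_t o; };

struct ToyString {};  // tag type for string objects

class ToyBuilder {
 public:
  // The offset type is a template template parameter that defaults to the
  // 32-bit one, so existing call sites keep compiling unchanged.
  template <template <typename> class OffsetT = ToyOffset>
  OffsetT<ToyString> CreateString(const std::string& s) {
    using ValueT = decltype(OffsetT<ToyString>{}.o);
    size_ += s.size() + 1;  // payload + terminator; no length prefix/padding
    assert(size_ <= std::numeric_limits<ValueT>::max() && "offset overflow");
    return OffsetT<ToyString>{static_cast<ValueT>(size_)};
  }

 private:
  uint64_t size_ = 0;  // bytes written so far
};
```

Here `b.CreateString<ToyOffset64>("far")` returns an offset holding a 64-bit value, while `b.CreateString("near")` keeps the 32-bit default.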
There is a new `FlatBufferBuilder64` that is used to build these large buffers. It has various methods to create the supported 64-bit enabled types: vectors and strings (sorry, no tables yet).
Semantically, the builder is the same as the 32-bit one (it's just a templated version of it), so the API and flow of building are identical. The only difference is the inclusion of more template parameters that dictate the use of `Offset64` or `Offset`.
For example, there are now:
Accessing
Accessing a 64-bit FlatBuffer is almost identical to the current approach. Only the returned types differ a bit for vectors: there is now a `Vector64<T>` (which is just a `Vector<T, uoffset64_t>`), so it has an identical API and just operates on a different length type.
Compatibility
Adding either `offset64` or `vector64` to an existing field is an evolution error (it fails the compatibility check), so backwards compatibility is preserved: any 64-bit field has to be a new field.
Implementation Notes
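The main layout change for `vector64` is the width of the length prefix. The sketch below writes and reads a little-endian length prefix of either 4 bytes (`uint32_t`, regular vectors) or 8 bytes (`uint64_t`, `vector64`). The helper names are hypothetical, and the sketch covers only the prefix, not the element data or any padding around it in a real buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends a vector length prefix in little-endian order.
// LengthT is uint32_t for a regular vector and uint64_t for a vector64.
template <typename LengthT>
void AppendLengthPrefix(std::vector<uint8_t>& out, LengthT length) {
  for (std::size_t i = 0; i < sizeof(LengthT); ++i) {
    out.push_back(static_cast<uint8_t>(length >> (8 * i)));
  }
}

// Reads a little-endian length prefix from the start of `bytes`.
template <typename LengthT>
LengthT ReadLengthPrefix(const std::vector<uint8_t>& bytes) {
  assert(bytes.size() >= sizeof(LengthT) && "buffer too short");
  LengthT length = 0;
  for (std::size_t i = 0; i < sizeof(LengthT); ++i) {
    length |= static_cast<LengthT>(bytes[i]) << (8 * i);
  }
  return length;
}
```

A length such as `uint64_t{5} << 32` cannot be represented in the 4-byte prefix at all, which is what the wider `vector64` length field enables.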
Here is what an annotated binary looks like for an example schema that uses various 64-bit fields.
Primarily, the difference is the support for `UOffset64` in the table definition. The associated `vtable` doesn't need special treatment, since it natively supports various offsets.
The other interesting case is the `vector64`, which now supports a `uint64_t` length field:
Fixes: #7537