Skip to content

Document and improve abstract reader/writer interface #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

sjlongland
Copy link

tinycbor fork branch thiagomacieira/dev adds support for abstract readers and writers, but the implementation had some limitations in cases where multiple CborValue cursors were iterating over the same CBOR document, causing state contamination. The interface also was not documented.

This documents the new interface and further builds upon it by slightly re-arranging CborParser members and opening the token in CborValue to use by the reader interface for any required purpose.

TSonono and others added 7 commits November 20, 2019 12:33
This method allows for writing raw data directly to the encoding buffer. This can be useful if you have something stored as CBOR encoded data.

Fixes intel#162.

Signed-off-by: Tofik Sonono <[email protected]>
printf s.b. used for indentation because no newline at end is needed

Signed-off-by: phirsov <[email protected]>
bytestring got escaped hex string literal output format to not to confuse e.g. "\x11" bytestring with 11 integer

Signed-off-by: phirsov <[email protected]>
string got plain old C output format to avoid confusion e.g. "true" string with CBOR true value

Signed-off-by: phirsov <[email protected]>
@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 5d24326 to 64299b9 Compare July 7, 2021 08:35
mcr and others added 22 commits September 3, 2021 11:27
…mes were unclear. containerEncoder could be the containing Encoder
…nt data as single float

Motivation: half-precision floating point format is used to minimize storage
and traffic mostly.  Application level manipulates with single and double
precision usually. So, two routines added to public API to encode/decode given
single precision value in the half precision format

Signed-off-by: S.Phirsov
Signed-off-by: Thiago Macieira <[email protected]>
The listing was outdated.

Signed-off-by: Thiago Macieira <[email protected]>
The last version of Qt to support MSVC 2015 is no longer maintained, so
I'm skipping that. I'm therefore rededicating MSVC 2017 for 32-bit.

There's also no more no-tests build.
We don't need to compare the map lengths directly. The memcmp is
sufficient, since the source data is big endian.

Of course, verifying for sorting requires the map has known length.

Signed-off-by: Thiago Macieira <[email protected]>
Because of all the inline functions, #include'ing <cbor.h> without
linking in all .c files may result in undefined symbol linker errors.

Signed-off-by: Thiago Macieira <[email protected]>
QCOMPARE macro has a return in case of failure. If a test fails inside
the encodeOne function, we log that error, but were proceeding to
perform more tests (which could fail again and produce more errors).

Signed-off-by: Thiago Macieira <[email protected]>
As noted in the comment, we need to be sure we don't allow a length too
big from the stream to overflow and become smaller than the number of
bytes we're looking for.

This is also the first step in creating an API that reads from something
other than a linear buffer.

Signed-off-by: Thiago Macieira <[email protected]>
This commit does not change all the readers yet, this is just the first
step.

As an interesting side-effect, we ended up reading the half-float into
it->extra and need not re-read it. The cbor_value_get_half_float()
function can be inlined in a later commit.

Signed-off-by: Thiago Macieira <[email protected]>
We don't need to skimp on bits in the CborValue::flags, so save the fact
that the preparser found a 64-bit number in there. This saves us from
having to re-read the descriptor byte again in
_cbor_value_decode_int64_internal().

Signed-off-by: Thiago Macieira <[email protected]>
It was originally non-inline because I had thought of doing conversions
from half-float to float and onwards to double, but I never actually
made that in the API. Instead, even the get_float() and get_double(), we
only memcpy anyway and leave it up to the upper layer to convert, as
needed.

This change triggered an use-when-uninitialised false positive warning
that I needed to work around in the validator.

Signed-off-by: Thiago Macieira <[email protected]>
The extract_length() function is only used in string context, so rename
accordingly.

Signed-off-by: Thiago Macieira <[email protected]>
The extract_length() was the only case that called
extract_number_and_advance() without verifying we had a proper number. This
commit reworks the implementation so extract_number_and_advance() reuses
the number previously read by preparse_value(), and extract_length() gets
inlined to the only place that uses it.

Signed-off-by: Thiago Macieira <[email protected]>
thiagomacieira and others added 23 commits September 3, 2021 13:08
Instead of consuming it at the end of the last element of a map or array
of unknown length. This allows us to obtain the pointer to or offset of
the Break byte.

Signed-off-by: Thiago Macieira <[email protected]>
We need to re-parse if the input buffer was too short to read the
current element's information. When that happens, the current element
will be CborInvalidType, so we can't easily resume.

Signed-off-by: Thiago Macieira <[email protected]>
Instead of just one function (_cbor_value_get_string_chunk), we now have
_cbor_value_begin_string_iteration, _cbor_value_finish_string_iteration,
_cbor_value_get_string_chunk_size, and _cbor_value_get_string_chunk.

The "begin" function positions the pointer at the first chunk. That's
what makes "get_size" possible, since it doesn't need to check for any
state. The "finish" funcntion allows the caller to distinguish an error
parsing the string from an error parsing the next value.

Signed-off-by: Thiago Macieira <[email protected]>
Instead of just one function (_cbor_value_get_string_chunk), we now have
_cbor_value_begin_string_iteration, _cbor_value_finish_string_iteration,
_cbor_value_get_string_chunk_size, and _cbor_value_get_string_chunk.

The "begin" function positions the pointer at the first chunk. That's
what makes "get_size" possible, since it doesn't need to check for any
state. The "finish" funcntion allows the caller to distinguish an error
parsing the string from an error parsing the next value.

Signed-off-by: Thiago Macieira <[email protected]>
`cbor_parser_init` and `cbor_parser_init_reader` are substantially
similar, however the latter misses clearing out `it->flags`, leaving it
uninitialised so possibly unsafe.

Rather than copying & pasting that from `cbor_parser_init`, lets just
use one routine that does the "common" part, then each routine can focus
on the specifics needed.
Describe the input parameters for the function and how they are used as
best we understand from on-paper analysis of the C code.
The `token` parameter is not sufficient since it is effectively shared
by all `CborValue` instances.  Since `tinycbor` often uses a temporary
`CborValue` context to perform some operation, we need to store our
context inside that `CborValue` so that we don't pollute the global
state of the reader.
In its place, put an arbitrary `void *` pointer for reader context.  The
reader needs to store some context information which is specific to the
`CborParser` instance it is serving.  Right now, `CborValue::source::token`
serves this purpose, but the problem is that we also need a
per-`CborValue` context and have nowhere to put it.

Better to spend an extra pointer (4 bytes on 32-bit platforms) in the
`CborParser` (which there'll be just one of), then to do it in the
`CborValue` (which there may be several of) or to use a `CborReader`
object that itself carries two pointers (`ops` and the context, thus
we'd need an extra 3 pointers).
We simplify this reader in two ways:
1. we remove the `consumed` member of `struct Input`, and instead use
   the `CborValue`'s `source.token` member, which we treat as an
   unsigned integer offset into our `QByteArray`.
2. we replace the reader-specific `struct Input` with the `QByteArray`
   it was wrapping, since that's the only thing now contained in our
   `struct Input`.

If a `CborValue` gets cloned, the pointer referred to by `source.token`
similarly gets cloned, thus when we advance the pointer on the clone, it
leaves the original alone, so computing the length of unknown-length
entities in the CBOR document can be done safely.
What is not known, is what the significance is of
`CborEncoderAppendType`.  It basically tells the writer the nature of
the data being written, but the default implementation ignores this and
just blindly appends it no matter what.

That raises the question of why it's important enough that the writer
function needs to know about it.
This reads a CBOR file piece-wise, seeking backward and forward through
the file if needed.  Some seeking can be avoided by tuning the block
size used in reads so that the read window shifts by smaller amounts.
@sjlongland
Copy link
Author

This pull request is superseded by intel#208

@sjlongland sjlongland closed this Sep 4, 2021
thiagomacieira pushed a commit that referenced this pull request Mar 14, 2025
When function text_string_to_escaped successfully parses a string and
fails to parse the next value (cbor_value_finish_string_iteration
returns an error), it correctly propagates the error but the string is
never freed.

This can be reproduced with:

    make CC='clang -g -fsanitize=address'
    printf '\x82\x60\xff' | ./bin/cbordump -j

clang's Address Sanitizer reports:

    =================================================================
    ==20317==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 1 byte(s) in 1 object(s) allocated from:
        #0 0x560b654b9916 in __interceptor_realloc (/tinycbor/bin/cbordump+0xa4916) (BuildId: f9933666b5d987b21f68c2887de4aebe93bc2bef)
        #1 0x560b654f5c18 in escape_text_string /tinycbor/src/cbortojson.c:331:15
        #2 0x560b654f3e29 in text_string_to_escaped /tinycbor/src/cbortojson.c:377:19
        intel#3 0x560b654f267d in value_to_json /tinycbor/src/cbortojson.c:674:19
        intel#4 0x560b654f34c2 in array_to_json /tinycbor/src/cbortojson.c:545:25
        intel#5 0x560b654f2085 in value_to_json /tinycbor/src/cbortojson.c:627:19
        intel#6 0x560b654f1baf in cbor_value_to_json_advance /tinycbor/src/cbortojson.c:816:12
        intel#7 0x560b654ea928 in dumpFile /tinycbor/tools/cbordump/cbordump.c:76:19
        intel#8 0x560b654ead2b in main /tinycbor/tools/cbordump/cbordump.c:149:9
        intel#9 0x7fa9d7629d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

    SUMMARY: AddressSanitizer: 1 byte(s) leaked in 1 allocation(s).

Fix this by freeing the string when cbor_value_finish_string_iteration
fails.

Fixes: e072bc1 ("CBOR-to-JSON: do properly escape JSON strings")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants