Document and improve abstract reader/writer interface #2

sjlongland · 2021-07-05T04:03:13Z

The tinycbor fork branch thiagomacieira/dev adds support for abstract readers and writers, but the implementation had limitations when multiple CborValue cursors iterated over the same CBOR document: they contaminated each other's state. The interface was also undocumented.

This documents the new interface and builds upon it by slightly rearranging the CborParser members and opening up the token in CborValue for use by the reader interface for any required purpose.

…nt data as single float. Motivation: the half-precision floating-point format is mostly used to minimize storage and traffic, while applications usually work with single and double precision. So, two routines were added to the public API to encode/decode a given single-precision value in the half-precision format. Signed-off-by: S.Phirsov Signed-off-by: Thiago Macieira <[email protected]>
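For illustration, widening a half-precision bit pattern to a single-precision value can be sketched as follows. This is a minimal sketch in the spirit of the decode_half() sample in RFC 8949 Appendix D, not the routine this commit added; `half_bits_to_float` is a hypothetical name, and only the decoding direction is shown.

```c
#include <assert.h>
#include <math.h>    /* INFINITY, NAN */
#include <stdint.h>

/* Hypothetical helper: widen an IEEE 754 binary16 bit pattern to float. */
float half_bits_to_float(uint16_t half)
{
    unsigned exp = (half >> 10) & 0x1f;  /* 5-bit exponent */
    unsigned mant = half & 0x3ff;        /* 10-bit mantissa */
    float val;

    if (exp == 0) {
        /* zero or subnormal: mant * 2^-24 */
        val = (float)mant / 16777216.0f;
    } else if (exp != 31) {
        /* normal: (1024 + mant) * 2^(exp - 25), scaled by exact powers of two */
        int e = (int)exp - 25;
        val = (float)(mant + 1024);
        for (; e > 0; --e)
            val *= 2.0f;
        for (; e < 0; ++e)
            val *= 0.5f;
    } else {
        /* all-ones exponent: infinity or NaN */
        val = mant == 0 ? INFINITY : NAN;
    }
    return (half & 0x8000) ? -val : val;
}
```

Every half-precision value is exactly representable as a float, so this direction needs no rounding; the encoding direction does, which is why it is left out of the sketch.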

We don't need to compare the map lengths directly. The memcmp is sufficient, since the source data is big endian. Of course, verifying the sorting requires that the map has a known length. Signed-off-by: Thiago Macieira <[email protected]>
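The underlying property is that, for unsigned integers stored big endian at the same width, byte-wise comparison gives the same order as numeric comparison. A small sketch of that property follows; `store_be32` and `compare_be32` are hypothetical names used only for this illustration and are not part of tinycbor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v in big-endian (network) byte order. */
void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Compare two numbers through their big-endian encodings only. */
int compare_be32(uint32_t a, uint32_t b)
{
    uint8_t x[4], y[4];
    store_be32(x, a);
    store_be32(y, b);
    /* the most significant byte comes first, so memcmp orders numerically */
    return memcmp(x, y, sizeof x);
}
```

With little-endian storage the same memcmp would compare the least significant byte first and could order 256 before 1.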

The QCOMPARE macro has a return in case of failure. If a test failed inside the encodeOne function, we logged that error but proceeded to perform more tests (which could fail again and produce more errors). Signed-off-by: Thiago Macieira <[email protected]>

As noted in the comment, we need to be sure we don't allow a length too big from the stream to overflow and become smaller than the number of bytes we're looking for. This is also the first step in creating an API that reads from something other than a linear buffer. Signed-off-by: Thiago Macieira <[email protected]>
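A minimal sketch of the kind of check described above, assuming a linear buffer delimited by `ptr` and `end`. The name `can_read_bytes` and its signature are assumptions for this illustration, not the function in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if `len` bytes can be read from [ptr, end). */
bool can_read_bytes(const uint8_t *ptr, const uint8_t *end, uint64_t len)
{
    assert(ptr <= end);
    /* Compare against the bytes remaining instead of computing ptr + len,
     * which could wrap around for a huge length taken from the stream. */
    size_t remaining = (size_t)(end - ptr);
    return len <= remaining;
}
```

The length stays a 64-bit value for the comparison, so a large stream value is never truncated to a smaller `size_t` before it is checked.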

This commit does not change all the readers yet; this is just the first step. As an interesting side-effect, we ended up reading the half-float into it->extra and need not re-read it. The cbor_value_get_half_float() function can be inlined in a later commit. Signed-off-by: Thiago Macieira <[email protected]>

We don't need to skimp on bits in the CborValue::flags, so save the fact that the preparser found a 64-bit number in there. This saves us from having to re-read the descriptor byte again in _cbor_value_decode_int64_internal(). Signed-off-by: Thiago Macieira <[email protected]>

It was originally non-inline because I had thought of doing conversions from half-float to float and onwards to double, but I never actually made that in the API. Instead, even in get_float() and get_double(), we only memcpy anyway and leave it up to the upper layer to convert, as needed. This change triggered a use-when-uninitialised false positive warning that I needed to work around in the validator. Signed-off-by: Thiago Macieira <[email protected]>

The extract_length() was the only case that called extract_number_and_advance() without verifying we had a proper number. This commit reworks the implementation so extract_number_and_advance() reuses the number previously read by preparse_value(), and extract_length() gets inlined to the only place that uses it. Signed-off-by: Thiago Macieira <[email protected]>

We need to re-parse if the input buffer was too short to read the current element's information. When that happens, the current element will be CborInvalidType, so we can't easily resume. Signed-off-by: Thiago Macieira <[email protected]>

Instead of just one function (_cbor_value_get_string_chunk), we now have _cbor_value_begin_string_iteration, _cbor_value_finish_string_iteration, _cbor_value_get_string_chunk_size, and _cbor_value_get_string_chunk. The "begin" function positions the pointer at the first chunk. That's what makes "get_size" possible, since it doesn't need to check for any state. The "finish" function allows the caller to distinguish an error parsing the string from an error parsing the next value. Signed-off-by: Thiago Macieira <[email protected]>
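The begin / get chunk / finish split can be illustrated with a standalone sketch over a byte string held in memory. All names here (`StrIter`, `str_iter_begin`, `str_iter_next`, `str_iter_finish`) are hypothetical and deliberately simpler than the tinycbor functions named above; the sketch only handles byte strings whose chunk lengths fit in the head byte (0 to 23).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *ptr;  /* next unread byte */
    const uint8_t *end;  /* one past the end of the buffer */
    int chunked;         /* nonzero for an indefinite-length string */
    int done;            /* nonzero once a definite string was returned */
} StrIter;

/* "begin": validate the head byte and position at the first chunk. */
int str_iter_begin(StrIter *it, const uint8_t *buf, size_t len)
{
    if (len == 0 || (buf[0] >> 5) != 2)   /* major type 2 = byte string */
        return -1;
    it->ptr = buf;
    it->end = buf + len;
    it->chunked = (buf[0] & 0x1f) == 31;  /* 0x5f opens a chunked string */
    it->done = 0;
    if (it->chunked)
        it->ptr++;                        /* each chunk carries its own head */
    return 0;
}

/* "get chunk": 1 and *data / *n set, 0 when no chunk is left, -1 on error. */
int str_iter_next(StrIter *it, const uint8_t **data, size_t *n)
{
    if (it->done)
        return 0;
    if (it->ptr >= it->end)
        return -1;
    uint8_t head = *it->ptr;
    if (it->chunked && head == 0xff)      /* Break: left for "finish" */
        return 0;
    if ((head >> 5) != 2 || (head & 0x1f) > 23)
        return -1;                        /* sketch: short lengths only */
    size_t len = head & 0x1f;
    if (len > (size_t)(it->end - it->ptr - 1))
        return -1;
    *data = it->ptr + 1;
    *n = len;
    it->ptr += 1 + len;
    if (!it->chunked)
        it->done = 1;
    return 1;
}

/* "finish": step past the Break, if any, and report where the next value starts. */
int str_iter_finish(StrIter *it, const uint8_t **next_value)
{
    if (it->chunked) {
        if (it->ptr >= it->end || *it->ptr != 0xff)
            return -1;
        it->ptr++;
    } else if (!it->done) {
        return -1;
    }
    *next_value = it->ptr;
    return 0;
}
```

Because "finish" is a separate call, a failure while walking the chunks is reported by `str_iter_next`, while anything that goes wrong with the value that follows is only seen by whoever parses `next_value`.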

`cbor_parser_init` and `cbor_parser_init_reader` are substantially similar; however, the latter misses clearing out `it->flags`, leaving it uninitialised and so possibly unsafe. Rather than copying and pasting that from `cbor_parser_init`, let's just use one routine that does the "common" part, so that each routine can focus on the specifics it needs.
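The shape of that refactoring can be sketched as follows. The `Parser` and `Value` structs and all three function names are simplified, hypothetical stand-ins and do not reproduce the real `CborParser` / `CborValue` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the parser and cursor structs. */
typedef struct {
    const uint8_t *end;  /* used in linear-buffer mode */
    const void *ops;     /* used in reader mode */
    void *ctx;           /* reader context */
} Parser;

typedef struct {
    const Parser *parser;
    const uint8_t *ptr;
    uint32_t remaining;
    uint8_t flags;
} Value;

/* The "common" part: every field both entry points rely on is set here,
 * so neither can forget to clear flags. */
static void init_common(Parser *parser, Value *it)
{
    memset(parser, 0, sizeof *parser);
    it->parser = parser;
    it->ptr = NULL;
    it->remaining = 1;
    it->flags = 0;
}

void init_from_buffer(Parser *parser, Value *it, const uint8_t *buf, size_t len)
{
    init_common(parser, it);
    parser->end = buf + len;
    it->ptr = buf;
}

void init_from_reader(Parser *parser, Value *it, const void *ops, void *ctx)
{
    init_common(parser, it);
    parser->ops = ops;
    parser->ctx = ctx;
}
```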

The `token` parameter is not sufficient since it is effectively shared by all `CborValue` instances. Since `tinycbor` often uses a temporary `CborValue` context to perform some operation, we need to store our context inside that `CborValue` so that we don't pollute the global state of the reader.

In its place, put an arbitrary `void *` pointer for reader context. The reader needs to store some context information which is specific to the `CborParser` instance it is serving. Right now, `CborValue::source::token` serves this purpose, but the problem is that we also need a per-`CborValue` context and have nowhere to put it. Better to spend an extra pointer (4 bytes on 32-bit platforms) in the `CborParser` (of which there'll be just one) than to do it in the `CborValue` (of which there may be several) or to use a `CborReader` object that itself carries two pointers (`ops` and the context, thus we'd need an extra 3 pointers).

We simplify this reader in two ways:

1. we remove the `consumed` member of `struct Input`, and instead use the `CborValue`'s `source.token` member, which we treat as an unsigned integer offset into our `QByteArray`.
2. we replace the reader-specific `struct Input` with the `QByteArray` it was wrapping, since that's the only thing now contained in our `struct Input`.

If a `CborValue` gets cloned, the pointer referred to by `source.token` similarly gets cloned, thus when we advance the pointer on the clone, it leaves the original alone, so computing the length of unknown-length entities in the CBOR document can be done safely.
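The idea of keeping the shared data in the parser and the position in each cursor can be sketched in plain C. Everything here is an assumption made for illustration: `ByteSource`, `Parser`, `Value` and the `reader_*` functions are hypothetical names, and the callback signatures do not claim to match the tinycbor reader interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const uint8_t *data; size_t size; } ByteSource; /* shared, per parser */
typedef struct { ByteSource *ctx; } Parser;
typedef struct { const Parser *parser; void *token; } Value;     /* token = byte offset */

/* The per-cursor token is treated as an unsigned offset, not dereferenced. */
size_t value_offset(const Value *it)
{
    return (size_t)(uintptr_t)it->token;
}

bool reader_can_read(const Value *it, size_t len)
{
    const ByteSource *src = it->parser->ctx;
    return len <= src->size - value_offset(it);
}

/* Copy len bytes found `skip` bytes past the cursor, without moving it. */
bool reader_read(const Value *it, void *dst, size_t skip, size_t len)
{
    const ByteSource *src = it->parser->ctx;
    size_t avail = src->size - value_offset(it);
    if (skip > avail || len > avail - skip)
        return false;
    memcpy(dst, src->data + value_offset(it) + skip, len);
    return true;
}

/* Moving one cursor only rewrites that cursor's own token. */
bool reader_advance(Value *it, size_t len)
{
    if (!reader_can_read(it, len))
        return false;
    it->token = (void *)(uintptr_t)(value_offset(it) + len);
    return true;
}
```

Copying a `Value` by plain struct assignment copies the offset too, so a temporary cursor can run ahead (for example to measure an unknown-length item) while the original stays where it was.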

What is not known is the significance of `CborEncoderAppendType`. It basically tells the writer the nature of the data being written, but the default implementation ignores this and just blindly appends it no matter what. That raises the question of why it's important enough that the writer function needs to know about it.
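To make the question concrete, here is a sketch of a writer callback that receives such a type tag. The enum, struct and function names are hypothetical stand-ins rather than the tinycbor declarations. Like the default behaviour described above, it appends every write regardless of the tag; the `string_bytes` counter is pure speculation about one thing a writer could do with the extra information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the append-type tag passed to the writer. */
typedef enum { APPEND_CBOR_DATA, APPEND_STRING_DATA } AppendType;

typedef struct {
    uint8_t *buf;
    size_t cap;           /* capacity of buf */
    size_t len;           /* bytes written so far */
    size_t string_bytes;  /* speculative: payload bytes tagged as string data */
} BufWriter;

/* Writer callback: appends no matter what the tag says. */
int buf_write(void *token, const void *data, size_t len, AppendType type)
{
    BufWriter *w = token;
    assert(w->len <= w->cap);
    if (len > w->cap - w->len)
        return -1;                 /* out of space */
    memcpy(w->buf + w->len, data, len);
    w->len += len;
    if (type == APPEND_STRING_DATA)
        w->string_bytes += len;    /* the only place the tag is used */
    return 0;
}
```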

sjlongland · 2021-09-04T00:11:46Z

This pull request is superseded by intel#208

When function text_string_to_escaped successfully parses a string and fails to parse the next value (cbor_value_finish_string_iteration returns an error), it correctly propagates the error but the string is never freed.

This can be reproduced with:

    make CC='clang -g -fsanitize=address'
    printf '\x82\x60\xff' | ./bin/cbordump -j

clang's Address Sanitizer reports:

    =================================================================
    ==20317==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 1 byte(s) in 1 object(s) allocated from:
        #0 0x560b654b9916 in __interceptor_realloc (/tinycbor/bin/cbordump+0xa4916) (BuildId: f9933666b5d987b21f68c2887de4aebe93bc2bef)
        #1 0x560b654f5c18 in escape_text_string /tinycbor/src/cbortojson.c:331:15
        #2 0x560b654f3e29 in text_string_to_escaped /tinycbor/src/cbortojson.c:377:19
        #3 0x560b654f267d in value_to_json /tinycbor/src/cbortojson.c:674:19
        #4 0x560b654f34c2 in array_to_json /tinycbor/src/cbortojson.c:545:25
        #5 0x560b654f2085 in value_to_json /tinycbor/src/cbortojson.c:627:19
        #6 0x560b654f1baf in cbor_value_to_json_advance /tinycbor/src/cbortojson.c:816:12
        #7 0x560b654ea928 in dumpFile /tinycbor/tools/cbordump/cbordump.c:76:19
        #8 0x560b654ead2b in main /tinycbor/tools/cbordump/cbordump.c:149:9
        #9 0x7fa9d7629d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

    SUMMARY: AddressSanitizer: 1 byte(s) leaked in 1 allocation(s).

Fix this by freeing the string when cbor_value_finish_string_iteration fails.

Fixes: e072bc1 ("CBOR-to-JSON: do properly escape JSON strings")
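The pattern behind the fix, releasing an already-built string when a later step fails, can be sketched in isolation. `finish_step` and `copy_then_finish` are hypothetical stand-ins and are not the code in cbortojson.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a final step that can fail after the string exists. */
int finish_step(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Build a heap copy of src, then run the final step.
 * On any failure *result is NULL and nothing is leaked. */
int copy_then_finish(const char *src, int fail_finish, char **result)
{
    size_t n = strlen(src) + 1;
    char *str = malloc(n);

    *result = NULL;
    if (str == NULL)
        return -2;                /* allocation failure */
    memcpy(str, src, n);

    int err = finish_step(fail_finish);
    if (err) {
        free(str);                /* the string was built, so release it here */
        return err;
    }
    *result = str;                /* ownership passes to the caller */
    return 0;
}
```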

TSonono and others added 7 commits November 20, 2019 12:33

Added the method cbor_encode_raw to the API

0db7e22

This method allows for writing raw data directly to the encoding buffer. This can be useful if you have something stored as CBOR encoded data. Fixes intel#162. Signed-off-by: Tofik Sonono <[email protected]>
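As a rough illustration of what writing raw data into an encoding buffer means, the sketch below splices an already-encoded CBOR item into an output buffer between other writes. `RawEncoder` and `encode_raw` are hypothetical names for a standalone example; this is not the implementation of cbor_encode_raw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal encoder state: a write pointer and the buffer end. */
typedef struct {
    uint8_t *ptr;
    const uint8_t *end;
} RawEncoder;

/* Append bytes that are assumed to be valid CBOR already, without re-encoding. */
int encode_raw(RawEncoder *enc, const uint8_t *raw, size_t len)
{
    assert(enc->ptr <= enc->end);
    if (len > (size_t)(enc->end - enc->ptr))
        return -1;                /* would overflow the output buffer */
    memcpy(enc->ptr, raw, len);
    enc->ptr += len;
    return 0;
}
```

For example, appending the array head `0x82`, then the stored bytes `0x18 0x2a` (the integer 42), then `0x01` yields the four-byte encoding of the array `[42, 1]`. The sketch does not validate the raw bytes, so the caller is responsible for passing well-formed CBOR.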

check app arguments in more strict manner

467b0eb

Signed-off-by: phirsov <[email protected]>

indentation typo fixed

722f649

printf should be used for indentation because no newline at the end is needed. Signed-off-by: phirsov <[email protected]>

bytestring output pretty-print improved

8acb09f

Bytestrings are now printed as escaped hex string literals, so as not to confuse e.g. the "\x11" bytestring with the integer 11. Signed-off-by: phirsov <[email protected]>

string pretty-print improved

1a43b45

Strings are now printed in plain old C format, to avoid confusing e.g. the string "true" with the CBOR true value. Signed-off-by: phirsov <[email protected]>

only build docs on master branch

e35f736

Add checks for memory allocation failures

7c349db

Signed-off-by: Mahavir Jain <[email protected]>

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 5d24326 to 64299b9 Compare July 7, 2021 08:35

mcr and others added 22 commits September 3, 2021 11:27

change the argument name from "encoder" to "parentEncoder", as the na…

d106276

…mes were unclear. containerEncoder could be the containing Encoder

Merge commit 'refs/pull/177/head' of github.com:intel/tinycbor

5115a87

Merge commit 'refs/pull/197/head' of github.com:intel/tinycbor

3d8da2c

clarify that CborIndefiniteLength creates an indefinite map, which is…

4c7b15c

… not always supported

Docs: update to match the last commit for create_array() too

11590e4

Signed-off-by: Thiago Macieira <[email protected]>

Update references of 'master' to 'main'

dbf8f13

Update version number for TinyCBOR 0.6

cb37252

Signed-off-by: Thiago Macieira <[email protected]>

Update the qmake buildsystem source files

9ed9d03

The listing was outdated. Signed-off-by: Thiago Macieira <[email protected]>

Merge remote-tracking branch 'origin/main' into HEAD

5910b7d

Build system: Add a few -Werror for sane C development

e2a4ed1

AppVeyor: update to use more recent Qt and MSVC

bf919a2

The last version of Qt to support MSVC 2015 is no longer maintained, so I'm skipping that. I'm therefore rededicating MSVC 2017 for 32-bit. There's also no more no-tests build.

Add a way to disable the declaration of some API

6f001e6

Because of all the inline functions, #include'ing <cbor.h> without linking in all .c files may result in undefined symbol linker errors. Signed-off-by: Thiago Macieira <[email protected]>

Parser: use read_bytes() in the extract_number function

a8515c4

The extract_length() function is only used in string context, so rename accordingly. Signed-off-by: Thiago Macieira <[email protected]>

Validator & Pretty: Remove the last uses of _cbor_value_extract_number

5d871b2

Signed-off-by: Thiago Macieira <[email protected]>

thiagomacieira and others added 23 commits September 3, 2021 13:08

Pretty & Validation: remove the last direct accesses to CborValue::ptr

d503c11

Signed-off-by: Thiago Macieira <[email protected]>

Parser: let cbor_value_leave_container() consume the Break

95129b8

Instead of consuming it at the end of the last element of a map or array of unknown length. This allows us to obtain the pointer to or offset of the Break byte. Signed-off-by: Thiago Macieira <[email protected]>

WIP Initial API for delegated streaming in

34c8452

Signed-off-by: Thiago Macieira <[email protected]>

WIP Initial API for delegated streaming out

e26ff9a

Signed-off-by: Thiago Macieira <[email protected]>

Move the testdata out to a separate .cpp so they can be reused

87a7a93

Signed-off-by: Thiago Macieira <[email protected]>

Parser: fix reading it->extra on big endian when bytesNeeded == 1

d393c16

Signed-off-by: Dmitry Shachnev <[email protected]>

.gitignore: ignore the c90 test too

10f7399

Merge commit 'refs/pull/169/head' of github.com:intel/tinycbor into dev

c9ebbaa

Signed-off-by: Thiago Macieira <[email protected]>

Encoder: add unit test for cbor_encode_raw

cc7774e

Signed-off-by: Thiago Macieira <[email protected]>

WSHUB-458: cborparser: Document cbor_parser_init_reader.

069240c

Describe the input parameters for the function and how they are used as best we understand from on-paper analysis of the C code.

WSHUB-458: cbor: Document the reader interface.

6679e85

WSHUB-458: cborparser: Move the reader context to CborParser.

8d699d3

WSHUB-458: cborparser: Update documentation

d352a9a

WSHUB-458: examples: Add buffered writer example.

767108a

WSHUB-458: examples: Add buffered reader example

027704f

This reads a CBOR file piece-wise, seeking backward and forward through the file if needed. Some seeking can be avoided by tuning the block size used in reads so that the read window shifts by smaller amounts.
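A sliding read window over a file can be sketched as below. This is not the example added by the commit: `WindowReader`, `window_init`, `window_read` and the tiny `WINDOW_SIZE` are assumptions chosen for illustration, and on a miss this sketch simply reloads the window at the requested offset rather than shifting it by a tuned amount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WINDOW_SIZE 8  /* deliberately tiny so that window reloads are easy to see */

typedef struct {
    FILE *fp;
    uint8_t buf[WINDOW_SIZE];
    long start;        /* file offset of buf[0] */
    size_t filled;     /* number of valid bytes in buf */
    unsigned reloads;  /* how often the window had to be refilled */
} WindowReader;

void window_init(WindowReader *r, FILE *fp)
{
    r->fp = fp;
    r->start = 0;
    r->filled = 0;
    r->reloads = 0;
}

/* Copy len bytes at absolute file offset pos into dst; 0 on success, -1 on error. */
int window_read(WindowReader *r, long pos, void *dst, size_t len)
{
    if (pos < 0 || len > WINDOW_SIZE)
        return -1;
    if (pos < r->start || (size_t)(pos - r->start) + len > r->filled) {
        /* miss: seek backward or forward and refill the window at pos */
        if (fseek(r->fp, pos, SEEK_SET) != 0)
            return -1;
        r->filled = fread(r->buf, 1, WINDOW_SIZE, r->fp);
        r->start = pos;
        r->reloads++;
        if (len > r->filled)
            return -1;  /* request runs past the end of the file */
    }
    memcpy(dst, r->buf + (pos - r->start), len);
    return 0;
}
```

Reads that fall inside the current window cost no I/O, while a read that straddles the window edge or goes backwards triggers a seek and a refill, which is the cost that tuning the block size is meant to reduce.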

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 64299b9 to 027704f Compare September 4, 2021 00:08

sjlongland mentioned this pull request Sep 4, 2021

Document and improve abstract reader/writer interface intel/tinycbor#208

Open

sjlongland closed this Sep 4, 2021
