Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use big int for integer type without max value #65

Merged
merged 5 commits into from
Oct 24, 2024

Conversation

SiebrenW
Copy link
Contributor

This addresses the issue #54
The exi_unsigned_t type is being reused (and its array size is increased to 20) to make an exi_signed_t that's effectively a big int in this case.
For diagnostics we may want to make a tostring-like function, but I haven't really thought about how to do that yet (probably just printing each byte in hex)
This does expose the exi_unsigned_t and exi_signed_t in the other encode/decode source/header files.

I generated din, iso2 and iso2 and all of it seems to build. I just haven't tested the encoded stream or tried to decode a known stream yet. I will test it if the code is fine-ish. If you want more fundamental changes, then I don't think that makes sense to create a test suite for that.

@SiebrenW SiebrenW force-pushed the feature/use_big_int_for_integer branch from ca8ab5f to 286638b Compare January 17, 2024 12:40
@SiebrenW
Copy link
Contributor Author

I tested it with the following code: https://gist.github.com/SirTates/e840a6cbc3831b3b8134d7d4098d2682 and it worked with my last commit for the decode (I already figured this was missing the non-strict SE, but I copy pasted the template from another jinja that was supposed to be trivial too)

I must comment on the usability of the library compared to OpenV2G: I sorely miss the flush function (as seen with my hacky if bit_count != 0. I know there are bugs in there), which is not ideal. I'm not too fond of the exi_bitstream_get_length, and I only figured out its existence after the tests.

Regarding the performance I'm thinking of rewriting the ..write_bits function to write entire bytes if the nbits exceeds 8 instead of bit for bit. The compiler does not optimise this, so the code needs to be optimised to be less wasteful.

@TGruett
Copy link

TGruett commented Jan 22, 2024

Hi,
Have you also tested it with larger SerialNumbers? I have just tried it with a 39-digit number (16 octets) and it comes out differently than I expected.
To be more precise: I converted a CertificateInstallationReq with a 39-digit SerialNumber to EXI with another encoder and decoded it with cbexigen. The value that comes out is different from the one I put in. I've encoded 16 octets but get 19 octets after decoding.

However, the implementation worked for me with smaller numbers

@SiebrenW
Copy link
Contributor Author

Hi, Have you also tested it with larger SerialNumbers? I have just tried it with a 39-digit number (16 octets) and it comes out differently than I expected. To be more precise: I converted a CertificateInstallationReq with a 39-digit SerialNumber to EXI with another encoder and decoded it with cbexigen. The value that comes out is different from the one I put in. I've encoded 16 octets but get 19 octets after decoding.

However, the implementation worked for me with smaller numbers

It's whacking the original number in 7 bit chunks with 1 bit padding every byte. This can be written to/read directly from the EXI stream. Up to an int64 you can use the exi_basetypes_convert_64_to_signed and exi_basetypes_convert_64_from_signed, but that's as far as C native types go. Anything under 7 bits will fit just fine in the first octet.

That does remind me that 20 bytes may not be sufficient, because the padding is basically using 2.5 bytes, we should have 23 bytes to fill 20 bytes' worth of big int.

@TGruett
Copy link

TGruett commented Jan 24, 2024

Ah OK, I see. Would it be possible for you to create a function to build a 'real' 8 byte (uint8) array from the exi_signed / exi_unsigned?
I want to use the decoder to display the X509SerialNumber and this would help me to print the real-world example with 16 octets at least as a hex string.

@TGruett
Copy link

TGruett commented Jan 25, 2024

I came across a post on Stack Overflow that appears to provide a solution to our problem: https://stackoverflow.com/questions/32670626/remove-nth-bit-from-buffer-and-shift-the-rest.

I tried to create the unpadded uint8 array using the function mentioned in the post, but it did not work correctly. It is possible that I only made a mistake with the LSB and MSB or something. I am not very familiar with the EXI encoding.

Maybe the mentioned post could help you with this issue. Otherwise I'll try to get back to it next week, if I have the time

@SiebrenW
Copy link
Contributor Author

SiebrenW commented Jan 25, 2024

I tested it for a bit and I came to this solution:

int exi_basetypes_convert_bytes_from_unsigned(const exi_unsigned_t* exi_unsigned, uint8_t* data, size_t* data_len, size_t data_size)
{    
    const uint8_t* current_octet = exi_unsigned->octets;
    uint16_t temp = 0;
    *data_len = 0;
    size_t total_offset = 0;

    for (size_t n = 0; n < exi_unsigned->octets_count; n++) {
        temp = temp + ((uint16_t)(*current_octet & EXI_BASETYPES_OCTET_SEQ_VALUE_MASK) << (total_offset));
        total_offset += 7;
        if (total_offset >= 8) {
            if (data_size == *data_len) {
                return -1;
            }
            total_offset -= 8;
            data[(*data_len)++] = temp & 0xFF;
            temp >>= 8;
        }
        current_octet++;
    }
    if (total_offset != 0) {
        if (data_size == *data_len) {
            return -1;
        }
        data[(*data_len)++] = temp & 0xFF;
    }
    return 0;
}

You can try to use that for now. It does happen to reverse the order of the bytes if you don't mind, but I only had a 30 minute break.
I may add this to the PR if this is somehow essential, but currently I'm wondering if it is. This is just for stringifying it, but the serialnumber is for identification and the 7-bit octets do that just fine.
Edit: I remembered that I didn't take the data_size into account. Added the check.

@TGruett
Copy link

TGruett commented Jan 30, 2024

Thank you very much, with this snippet I got everything to work as expected!

In my opinion it makes sense to add this code to PR. After all, this is an EXI de-/encoder and every user will expect to be able to de-/encode the EXI datatypes. Since the function already exists anyway, it is not much effort to add it.

@SiebrenW
Copy link
Contributor Author

Thank you very much, with this snippet I got everything to work as expected!

In my opinion it makes sense to add this code to PR. After all, this is an EXI de-/encoder and every user will expect to be able to de-/encode the EXI datatypes. Since the function already exists anyway, it is not much effort to add it.

I would first like to write a to_unsigned before I include that, if I can find the time to make it, so users can also import larger numbers like that. Otherwise we can only test hardcoded examples and can't even terrify the EVs that most likely won't support anything over 64 bits (if that) anyways.

I thought about actually making a decimal string (basically so it's closer to XML), but briefly thinking about the logic needed I figured that would take too long. Maybe I will at some point, seems like a fun challenge.

@SiebrenW SiebrenW force-pushed the feature/use_big_int_for_integer branch from a507b6d to d5ddbc4 Compare January 31, 2024 09:15
@SiebrenW
Copy link
Contributor Author

I added the bytes to and from unsigned procedures now. do notice that I get an extra byte when converting back to bytes, but for printing that's probably fine and I have spent too much of my break on this already 😐 .

@SiebrenW SiebrenW force-pushed the feature/use_big_int_for_integer branch 2 times, most recently from 03d046a to 272dee6 Compare January 31, 2024 13:34
@barsnick
Copy link
Contributor

barsnick commented Feb 2, 2024

We will take a look at this solution, and merge it, if it works for "everyone".

I haven't looked at the final details of the recent force-push, but obviously prefer an API representation as a (host-endian?) byte array representation of the large integer.

Regarding extra bytes, I will write a note in your other pull request. I think there may be misunderstandings regarding the counting of stream bits and bytes in cbV2G/cbExiGen.

@TGruett
Copy link

TGruett commented Feb 13, 2024

I've been indisposed for the last few days and have finally got the chance to test it again. Everything works fine for me. I only get an uninitialised warning for the uint16_t dummy in the function exi_basetypes_convert_bytes_to_unsigned()

@barsnick
Copy link
Contributor

finally got the chance to test it again. Everything works fine for me.

Thanks for looking at it. We will review it and eventually merge it.

I only get an uninitialised warning for the uint16_t dummy in the function exi_basetypes_convert_bytes_to_unsigned()

That needs to be fixed.

@SiebrenW SiebrenW force-pushed the feature/use_big_int_for_integer branch from 272dee6 to 29ad531 Compare February 13, 2024 14:11
@SiebrenW
Copy link
Contributor Author

I default initialised the dummy. I should add more warnings+werror in my test build script.

@TGruett
Copy link

TGruett commented Apr 8, 2024

Good Morning, what is the status of this PR? I'm just wondering if it's still planned to be merged. I am already using this big int implementation and it is working perfectly so far.

@SebaLukas
Copy link
Collaborator

This PR also helped us a lot today at the Testival in France.
But I still had to set EXI_BASETYPES_MAX_OCTETS_SUPPORTED to 30. The SerialNumbers of Hubject certificates are simply gigantic.

That's why I think the PR is great :) and would merge it directly if I don't find anything next week.

@SebaLukas SebaLukas assigned SebaLukas and unassigned barsnick and chausGit Oct 10, 2024
Copy link
Collaborator

@SebaLukas SebaLukas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much again for your PR. I only found a few small things.

@SiebrenW SiebrenW force-pushed the feature/use_big_int_for_integer branch from 29ad531 to c0f4721 Compare October 11, 2024 09:00
Siebren Weertman added 3 commits October 11, 2024 11:01
Signed-off-by: Siebren Weertman <[email protected]>
Signed-off-by: Siebren Weertman <[email protected]>
Signed-off-by: Siebren Weertman <[email protected]>
Signed-off-by: Siebren Weertman <[email protected]>
Signed-off-by: Siebren Weertman <[email protected]>

const correctness and remove log

Signed-off-by: Siebren Weertman <[email protected]>

init dummy

Signed-off-by: Siebren Weertman <[email protected]>
@SiebrenW SiebrenW force-pushed the feature/use_big_int_for_integer branch 2 times, most recently from be8d1f7 to 254153e Compare October 11, 2024 09:07
@SebaLukas
Copy link
Collaborator

Your last commit has no signed-off-by. Thats the reason why DCO fails.

@SiebrenW SiebrenW force-pushed the feature/use_big_int_for_integer branch from 254153e to c779a3e Compare October 11, 2024 09:38
Copy link
Contributor

@barsnick barsnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way the code was expanded to generic integers. Thank you for the effort.

I inserted a few style remarks.

Regression testing was fine. I didn't try any EXI streams with large X509Serials yet.

But one thing I am wondering: exi_basetypes_convert_bytes_{from,to}_unsigned() seems to insert/remove the EXI 7-bit integer encoding, which I was hoping you would add.

Yet this operation is not automatically applied to the result placed into the exi_unsigned_t. IMHO, an API which requires you to shuffle a large integer "word" memory block into 7-bit slots (or vice-versa) seems unacceptable to me, it does not meet the rule of least astonishment. 😉

Can we put these calls into src/input/code_templates/c/decoder/{En,De}codeTypeSigned.jinja?

@SebaLukas
Copy link
Collaborator

@SiebrenW Would you still work on @barsnick points? I would like to merge this PR soon :)

@SiebrenW
Copy link
Contributor Author

@SiebrenW Would you still work on @barsnick points? I would like to merge this PR soon :)

I have been busy this week, sorry, but I can have a go this evening.

@SiebrenW SiebrenW force-pushed the feature/use_big_int_for_integer branch from c779a3e to de69116 Compare October 18, 2024 19:28
@SiebrenW
Copy link
Contributor Author

I have applied most feedback. Would you check again?

regarding the copy or const pointer to the struct, I naively want to save some stack memory but I didn't check the instructions or anything. Just your typical "how much bigger than a pointer?" fingerspitzengefühl.

@barsnick
Copy link
Contributor

The integration of my review comments looks fine so far.

regarding the copy or const pointer to the struct, I naively want to save some stack memory but I didn't check the instructions or anything

That's probably fine as it is.

Would you accept a proposal where I intergrate the 7/8-8/7 stuffing and destuffing into the API, to make it a clear "integer in little-endian octet order" array without any stuffing? I have something prepared (for Monday).

@barsnick
Copy link
Contributor

barsnick commented Oct 21, 2024

You can try to use that for now. It does happen to reverse the order of the bytes if you don't mind, but I only had a 30 minute break.

That's fine. The "reversal" just means it's little-endian, which is okay if it's documented.

I may add this to the PR if this is somehow essential, but currently I'm wondering if it is. This is just for stringifying it, but the serialnumber is for identification and the 7-bit octets do that just fine.

I don't agree. If you need to encode an existing value, you want to put in the right one. On the other hand, perhaps the value isn't really used for anything. I still would prefer to have a straight-forward type in the API, not some obscure reflection of EXI internals.

By the way, by design, the 7-to-8 decoder sometimes adds an extra 0x00 byte at the end, which is insignificant since it's MSB. We can leave it right now, but a nice optimization would be to drop it (unless it's the only value in the byte array).

Let me try to comment my proposed changes in the PR. You can also look at https://github.com/EVerest/cbexigen/commits/feature/use_big_int_for_integer_fixup/, where I modeled them.

Copy link
Contributor

@barsnick barsnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put the remaining changes into a separate pull request.

@SebaLukas SebaLukas merged commit d0b5310 into EVerest:main Oct 24, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants