Skip to content

Add separate method for encoding to Vec<u8>. #84

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

partim
Copy link
Member

@partim partim commented Apr 14, 2025

This PR adds a new methods Values::append_encoded and PrimitiveContent::append_encoded that append the content to a Vec<u8> without returning a Result<(), io::Error>. This results in a bit of code duplication but avoids the use of unwraps in case where a value is assembled in memory which is the most common use case.

This PR simply copies how Tag and Length are being encoded. They will be re-written in a follow-up PR.

@partim partim marked this pull request as draft April 14, 2025 12:27
@partim partim marked this pull request as ready for review April 14, 2025 12:32
@partim partim requested a review from bal-e April 14, 2025 12:32
@@ -117,6 +124,13 @@ impl PrimitiveContent for u8 {
target.write_all(&[*self])?;
Ok(())
}

fn append_encoded(&self, _: Mode, target: &mut Vec<u8>) {
if *self > 0x7F {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In many places, you're using an explicit value & 0x80 == 0 bit test. Is this condition doing the same thing, and if so, should it use the same bit-test style?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does indeed check whether the most significant bit is set.

(Background: in BER, integers are always encoded as variable length signed integers which means that for unsigned integer that have the most significant bit set you need to add a zero.)

target.push(0);
}
else {
let mut val = self.swap_bytes();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to convert into a big-endian representation? In that case, I'd strongly recommend using .to_be() -- swap_bytes() would convert into little-endian on big-endian architectures. In fact, since you're working on each byte separately, .to_be_bytes() would be even better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually flips the bytes on purpose. Because of the variable length-yness, I have to work my way down from the most significant byte and skip them while they are zero. By flipping, I can keep looking at the least significant byte and right-shift the next byte into position.

Comment on lines +200 to +202
if val & 0x80 != 0 {
target.push(0);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the encoding format, but I just want to check if I understand this bit of code correctly. In this encoding, the first byte of any encoded value has its most significant bit unset, right? Otherwise, I think this encoding would be ambiguous (e.g. if a decoder is expected an encoded integer and sees a zero byte, how does it know whether to stop?).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it looks like signed integers can be encoded as a single 0xFF, so I have ono idea how this encoding works. It looks ambiguous to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integers are encoded as a variable length two’s complement with the length known up front. They are always signed which means the very first bit is the sign and needs to be cleared for values greater than zero. The length has to be full bytes, so if the leftmost bit with a value of one is at a multiple of eight, a zero bytes needs to be added to its left.

Comment on lines +257 to +284
if len < 0x80 {
target.push(len as u8);
}
else if len < 0x1_00 {
target.push(0x81);
target.push(len as u8);
}
else if len < 0x1_0000 {
target.push(0x82);
target.push((len >> 8) as u8);
target.push(len as u8);
}
else if len < 0x100_0000 {
target.push(0x83);
target.push((len >> 16) as u8);
target.push((len >> 8) as u8);
target.push(len as u8);
}
else if len < 0x1_0000_0000 {
target.push(0x84);
target.push((len >> 24) as u8);
target.push((len >> 16) as u8);
target.push((len >> 8) as u8);
target.push(len as u8);
}
else {
panic!("excessive length")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's an easier way to encode this using .to_be_bytes(). You can even do something like

match len.to_be_bytes() {
    [0, 0, 0, a @ 0..=0x7F] => target.push(a),
    [0, 0, 0, a] => target.extend_from_slice(&[0x81, a]),
    [0, 0, a, b] => target.extend_from_slice(&[0x82, a, b]),
    [0, a, b, c] => target.extend_from_slice(&[0x83, a, b, c]),
    [a, b, c, d] => target.extend_from_slice(&[0x84, a, b, c, d]),
}

or perhaps just skip leading zeros and check the length of the sequence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to rewrite this whole thing in such a way that I can re-use the logic for generating those bytes for both the write_encoded and append_encoded methods. So, follow up PR.

}

/// Writes the encoded value to a target.
#[cfg(not(target_pointer_width = "64"))]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This encoding acts differently depending upon the native machine architecture?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It caps the allowed length at 2^32-1 for non-64 bit architectures. I am going to rewrite this to use a usize and generate the correct encoded array on the fly.

@partim
Copy link
Member Author

partim commented May 22, 2025

This PR has been superseded by #86.

@partim partim closed this May 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants