Add separate method for encoding to `Vec<u8>`. #84

partim · 2025-04-14T12:26:22Z

This PR adds new methods `Values::append_encoded` and `PrimitiveContent::append_encoded` that append the content to a `Vec<u8>` without returning a `Result<(), io::Error>`. This results in a bit of code duplication but avoids unwraps in cases where a value is assembled in memory, which is the most common use case.

This PR simply copies how Tag and Length are being encoded. They will be re-written in a follow-up PR.

bal-e · 2025-04-14T12:48:13Z

src/encode/primitive.rs

@@ -117,6 +124,13 @@ impl PrimitiveContent for u8 {
        target.write_all(&[*self])?;
        Ok(())
    }
+
+    fn append_encoded(&self, _: Mode, target: &mut Vec<u8>) {
+        if *self > 0x7F {


In many places, you're using an explicit value & 0x80 == 0 bit test. Is this condition doing the same thing, and if so, should it use the same bit-test style?

This does indeed check whether the most significant bit is set.

(Background: in BER, integers are always encoded as variable-length signed integers, which means that for an unsigned integer that has the most significant bit set you need to prepend a zero byte.)
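To make the rule concrete, here is a minimal sketch of the content octets for a `u8`. The free function `append_u8_content` is a hypothetical name used for illustration only, not the method from this PR. It uses the `& 0x80` bit test, which is equivalent to `> 0x7F` for a `u8`.

```rust
/// Appends the BER INTEGER content octets for an unsigned 8-bit value.
/// Hypothetical helper for illustration; not part of the crate's API.
fn append_u8_content(value: u8, target: &mut Vec<u8>) {
    // BER integers are signed, so a set top bit would read as negative.
    // A leading zero byte keeps the value positive.
    if value & 0x80 != 0 {
        target.push(0);
    }
    target.push(value);
}
```

With this sketch, `0x7F` encodes as the single byte `7F`, while `0x80` encodes as `00 80`.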

bal-e · 2025-04-14T12:50:10Z

src/encode/primitive.rs

+                    target.push(0);
+                }
+                else {
+                    let mut val = self.swap_bytes();


Is this meant to convert into a big-endian representation? In that case, I'd strongly recommend using .to_be() -- swap_bytes() would convert into little-endian on big-endian architectures. In fact, since you're working on each byte separately, .to_be_bytes() would be even better.

This actually flips the bytes on purpose. Because of the variable length-yness, I have to work my way down from the most significant byte and skip them while they are zero. By flipping, I can keep looking at the least significant byte and right-shift the next byte into position.
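The approach described above might look roughly like the following for a `u32`. This is a sketch under assumptions: `append_u32_content` and the `remaining` counter are illustrative names, and the actual code in the PR may differ. Because it only uses shifts and masks on the integer value and never inspects its memory layout, the output does not depend on the host's endianness.

```rust
/// Appends the BER INTEGER content octets for an unsigned 32-bit value.
/// Hypothetical helper for illustration; not part of the crate's API.
fn append_u32_content(value: u32, target: &mut Vec<u8>) {
    if value == 0 {
        target.push(0);
        return;
    }
    // After the swap, the most significant byte of `value` sits in the
    // least significant byte of `val`.
    let mut val = value.swap_bytes();
    let mut remaining = 4;
    // Skip leading zero bytes of the original value.
    while val & 0xFF == 0 {
        val >>= 8;
        remaining -= 1;
    }
    // If the first significant byte has its top bit set, prepend a zero
    // so the value is not read as negative.
    if val & 0x80 != 0 {
        target.push(0);
    }
    // Emit the remaining bytes, most significant first.
    while remaining > 0 {
        target.push(val as u8);
        val >>= 8;
        remaining -= 1;
    }
}
```

Counting the remaining bytes, rather than stopping when `val` reaches zero, keeps interior and trailing zero bytes such as those in `0x0100_0002`.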

bal-e · 2025-04-14T12:57:24Z

src/encode/primitive.rs

+                    if val & 0x80 != 0 {
+                        target.push(0);
+                    }


I'm not familiar with the encoding format, but I just want to check if I understand this bit of code correctly. In this encoding, the first byte of any encoded value has its most significant bit unset, right? Otherwise, I think this encoding would be ambiguous (e.g. if a decoder is expecting an encoded integer and sees a zero byte, how does it know whether to stop?).

No, it looks like signed integers can be encoded as a single 0xFF, so I have no idea how this encoding works. It looks ambiguous to me.

Integers are encoded as a variable-length two’s complement with the length known up front. They are always signed, which means the very first bit is the sign and needs to be cleared for values greater than zero. The length has to be in full bytes, so if the leftmost bit with a value of one is the top bit of a byte, a zero byte needs to be added to its left.
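A small sketch of the signed case may help show why there is no ambiguity: the decoder learns the number of content bytes from the length octets that come before them, so a lone `FF` is simply -1 in one byte. The function `append_i32_content` below is a hypothetical illustration, not code from this PR.

```rust
/// Appends the minimal two's complement content octets for an `i32`.
/// Hypothetical helper for illustration; not part of the crate's API.
fn append_i32_content(value: i32, target: &mut Vec<u8>) {
    let bytes = value.to_be_bytes();
    let mut start = 0;
    // A leading byte is redundant if it only repeats the sign bit of the
    // byte after it: 0x00 before a byte with the top bit clear, or 0xFF
    // before a byte with the top bit set. At least one byte always stays.
    while start < bytes.len() - 1 {
        let redundant = match bytes[start] {
            0x00 => bytes[start + 1] & 0x80 == 0,
            0xFF => bytes[start + 1] & 0x80 != 0,
            _ => false,
        };
        if !redundant {
            break;
        }
        start += 1;
    }
    target.extend_from_slice(&bytes[start..]);
}
```

Under this sketch, 127 becomes `7F`, 128 becomes `00 80`, -1 becomes `FF`, and -129 becomes `FF 7F`.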

bal-e · 2025-04-14T13:06:12Z

src/length.rs

+                if len < 0x80 {
+                    target.push(len as u8);
+                }
+                else if len < 0x1_00 {
+                    target.push(0x81);
+                    target.push(len as u8);
+                }
+                else if len < 0x1_0000 {
+                    target.push(0x82);
+                    target.push((len >> 8) as u8);
+                    target.push(len as u8);
+                }
+                else if len < 0x100_0000 {
+                    target.push(0x83);
+                    target.push((len >> 16) as u8);
+                    target.push((len >> 8) as u8);
+                    target.push(len as u8);
+                }
+                else if len < 0x1_0000_0000 {
+                    target.push(0x84);
+                    target.push((len >> 24) as u8);
+                    target.push((len >> 16) as u8);
+                    target.push((len >> 8) as u8);
+                    target.push(len as u8);
+                }
+                else {
+                    panic!("excessive length")
+                }


I'm sure there's an easier way to encode this using .to_be_bytes(). You can even do something like

    match len.to_be_bytes() {
        [0, 0, 0, a @ 0..=0x7F] => target.push(a),
        [0, 0, 0, a] => target.extend_from_slice(&[0x81, a]),
        [0, 0, a, b] => target.extend_from_slice(&[0x82, a, b]),
        [0, a, b, c] => target.extend_from_slice(&[0x83, a, b, c]),
        [a, b, c, d] => target.extend_from_slice(&[0x84, a, b, c, d]),
    }

or perhaps just skip leading zeros and check the length of the sequence.
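The second suggestion, skipping leading zeros and checking how many bytes remain, could be sketched as follows. This assumes the length is a `u32`; `append_length_u32` is a hypothetical name rather than the crate's method.

```rust
/// Appends a BER length in short or long form.
/// Hypothetical helper for illustration; not part of the crate's API.
fn append_length_u32(len: u32, target: &mut Vec<u8>) {
    if len < 0x80 {
        // Short form: the length fits in seven bits.
        target.push(len as u8);
        return;
    }
    // Long form: drop leading zero bytes, then prefix the remaining
    // bytes with 0x80 | (number of length bytes).
    let bytes = len.to_be_bytes();
    let skip = bytes.iter().take_while(|&&b| b == 0).count();
    let significant = &bytes[skip..];
    target.push(0x80 | significant.len() as u8);
    target.extend_from_slice(significant);
}
```

Since a `u32` can never need more than four length bytes, this version has no panicking branch.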

I want to rewrite this whole thing in such a way that I can re-use the logic for generating those bytes for both the write_encoded and append_encoded methods. So, follow up PR.

bal-e · 2025-04-14T13:06:54Z

src/length.rs

+    }
+
+    /// Writes the encoded value to a target.
+    #[cfg(not(target_pointer_width = "64"))]


This encoding acts differently depending upon the native machine architecture?

It caps the allowed length at 2^32-1 for non-64-bit architectures. I am going to rewrite this to use a usize and generate the correct encoded array on the fly.
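One possible shape for such a rewrite is sketched below; it is not taken from the follow-up PR. A single function builds the encoded bytes in a fixed-size array sized from `usize`, and both a writer-based and a `Vec`-based method reuse it, so no `cfg` on the pointer width is needed. All names here (`encode_length`, `write_length`, `append_length`, `MAX_LEN_OCTETS`) are assumptions for illustration.

```rust
use std::io::{self, Write};

/// One prefix byte plus at most `size_of::<usize>()` length bytes.
const MAX_LEN_OCTETS: usize = 1 + std::mem::size_of::<usize>();

/// Returns a buffer holding the encoded length and the number of bytes used.
/// Hypothetical helper for illustration; not part of the crate's API.
fn encode_length(len: usize) -> ([u8; MAX_LEN_OCTETS], usize) {
    let mut buf = [0u8; MAX_LEN_OCTETS];
    if len < 0x80 {
        // Short form.
        buf[0] = len as u8;
        return (buf, 1);
    }
    // Long form: skip leading zero bytes of the big-endian representation.
    let bytes = len.to_be_bytes();
    let skip = bytes.iter().take_while(|&&b| b == 0).count();
    let count = bytes.len() - skip;
    buf[0] = 0x80 | count as u8;
    buf[1..=count].copy_from_slice(&bytes[skip..]);
    (buf, 1 + count)
}

/// Writer-based variant: can fail with an I/O error.
fn write_length<W: Write>(len: usize, target: &mut W) -> io::Result<()> {
    let (buf, used) = encode_length(len);
    target.write_all(&buf[..used])
}

/// In-memory variant: infallible, so no `Result` is returned.
fn append_length(len: usize, target: &mut Vec<u8>) {
    let (buf, used) = encode_length(len);
    target.extend_from_slice(&buf[..used]);
}
```

Both variants produce identical bytes by construction, which removes the duplication between the `write_encoded` and `append_encoded` paths for lengths.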

partim · 2025-05-22T09:43:56Z

This PR has been superseded by #86.

Add append_encoded to encoder traits.

ce55cac

partim marked this pull request as draft April 14, 2025 12:27

partim marked this pull request as ready for review April 14, 2025 12:32

partim requested a review from bal-e April 14, 2025 12:32

bal-e reviewed Apr 14, 2025

partim closed this May 22, 2025
