-
-
Notifications
You must be signed in to change notification settings - Fork 12
Add separate method for encoding to Vec<u8>
.
#84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -117,6 +124,13 @@ impl PrimitiveContent for u8 { | |||
target.write_all(&[*self])?; | |||
Ok(()) | |||
} | |||
|
|||
fn append_encoded(&self, _: Mode, target: &mut Vec<u8>) { | |||
if *self > 0x7F { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In many places, you're using an explicit value & 0x80 == 0
bit test. Is this condition doing the same thing, and if so, should it use the same bit-test style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does indeed check whether the most significant bit is set.
(Background: in BER, integers are always encoded as variable length signed integers which means that for unsigned integer that have the most significant bit set you need to add a zero.)
target.push(0); | ||
} | ||
else { | ||
let mut val = self.swap_bytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to convert into a big-endian representation? In that case, I'd strongly recommend using .to_be()
-- swap_bytes()
would convert into little-endian on big-endian architectures. In fact, since you're working on each byte separately, .to_be_bytes()
would be even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually flips the bytes on purpose. Because of the variable length-yness, I have to work my way down from the most significant byte and skip them while they are zero. By flipping, I can keep looking at the least significant byte and right-shift the next byte into position.
if val & 0x80 != 0 { | ||
target.push(0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the encoding format, but I just want to check if I understand this bit of code correctly. In this encoding, the first byte of any encoded value has its most significant bit unset, right? Otherwise, I think this encoding would be ambiguous (e.g. if a decoder is expected an encoded integer and sees a zero byte, how does it know whether to stop?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it looks like signed integers can be encoded as a single 0xFF
, so I have ono idea how this encoding works. It looks ambiguous to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integers are encoded as a variable length two’s complement with the length known up front. They are always signed which means the very first bit is the sign and needs to be cleared for values greater than zero. The length has to be full bytes, so if the leftmost bit with a value of one is at a multiple of eight, a zero bytes needs to be added to its left.
if len < 0x80 { | ||
target.push(len as u8); | ||
} | ||
else if len < 0x1_00 { | ||
target.push(0x81); | ||
target.push(len as u8); | ||
} | ||
else if len < 0x1_0000 { | ||
target.push(0x82); | ||
target.push((len >> 8) as u8); | ||
target.push(len as u8); | ||
} | ||
else if len < 0x100_0000 { | ||
target.push(0x83); | ||
target.push((len >> 16) as u8); | ||
target.push((len >> 8) as u8); | ||
target.push(len as u8); | ||
} | ||
else if len < 0x1_0000_0000 { | ||
target.push(0x84); | ||
target.push((len >> 24) as u8); | ||
target.push((len >> 16) as u8); | ||
target.push((len >> 8) as u8); | ||
target.push(len as u8); | ||
} | ||
else { | ||
panic!("excessive length") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure there's an easier way to encode this using .to_be_bytes()
. You can even do something like
match len.to_be_bytes() {
[0, 0, 0, a @ 0..=0x7F] => target.push(a),
[0, 0, 0, a] => target.extend_from_slice(&[0x81, a]),
[0, 0, a, b] => target.extend_from_slice(&[0x82, a, b]),
[0, a, b, c] => target.extend_from_slice(&[0x83, a, b, c]),
[a, b, c, d] => target.extend_from_slice(&[0x84, a, b, c, d]),
}
or perhaps just skip leading zeros and check the length of the sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to rewrite this whole thing in such a way that I can re-use the logic for generating those bytes for both the write_encoded
and append_encoded
methods. So, follow up PR.
} | ||
|
||
/// Writes the encoded value to a target. | ||
#[cfg(not(target_pointer_width = "64"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This encoding acts differently depending upon the native machine architecture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It caps the allowed length at 2^32-1 for non-64 bit architectures. I am going to rewrite this to use a usize
and generate the correct encoded array on the fly.
This PR has been superseded by #86. |
This PR adds a new methods
Values::append_encoded
andPrimitiveContent::append_encoded
that append the content to aVec<u8>
without returning aResult<(), io::Error>
. This results in a bit of code duplication but avoids the use ofunwrap
s in case where a value is assembled in memory which is the most common use case.This PR simply copies how
Tag
andLength
are being encoded. They will be re-written in a follow-up PR.