feat(codecs): add varint length delimited framing for protobuf #23352


Open: wants to merge 3 commits into base: master

@modev2301 modev2301 commented Jul 10, 2025

This commit adds support for varint length-delimited framing for protobuf sources and sinks in Vector. This addresses the use case where tools like ClickHouse expect protobuf messages with varint length prefixes instead of the standard 32-bit length prefixes.

Changes

  • Add VarintLengthDelimitedEncoder for encoding varint length prefixes
  • Add VarintLengthDelimited option to FramingConfig enums
  • Update default protobuf framing to use varint instead of 32-bit length
  • Add comprehensive tests for varint framing (7 tests, all passing)
  • Update validation resources to handle new framing option
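
As a rough illustration of the framing these changes describe, the sketch below writes a protobuf-style base-128 varint length prefix ahead of a payload and, for comparison, a fixed 4-byte prefix. The function names (`encode_varint_len`, `frame_varint`, `frame_u32_be`) are hypothetical and are not taken from this PR, and the big-endian byte order of the 32-bit prefix is an assumption made for the comparison only.

```rust
/// Appends `len` to `out` as a base-128 varint: seven payload bits per byte,
/// least-significant group first, with the high bit set on every byte
/// except the last one.
fn encode_varint_len(mut len: u64, out: &mut Vec<u8>) {
    while len >= 0x80 {
        out.push((len as u8 & 0x7F) | 0x80);
        len >>= 7;
    }
    out.push(len as u8);
}

/// Frames `payload` as `<varint length><payload bytes>`.
/// Hypothetical helper, not the PR's `VarintLengthDelimitedEncoder`.
fn frame_varint(payload: &[u8]) -> Vec<u8> {
    // A u64 varint never needs more than 10 bytes.
    let mut framed = Vec::with_capacity(payload.len() + 10);
    encode_varint_len(payload.len() as u64, &mut framed);
    framed.extend_from_slice(payload);
    framed
}

/// Frames `payload` with a fixed 4-byte length prefix for comparison.
/// Assumption: big-endian byte order; payloads over u32::MAX are not handled.
fn frame_u32_be(payload: &[u8]) -> Vec<u8> {
    let mut framed = Vec::with_capacity(payload.len() + 4);
    framed.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    framed.extend_from_slice(payload);
    framed
}
```

With these helpers, a 3-byte payload costs one prefix byte under varint framing and four under the fixed-width prefix, and an empty payload is framed as the single byte `0x00`.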

Benefits

  • Better compatibility with tools like ClickHouse
  • Eliminates risk of protobuf messages being truncated or skipped
  • Properly handles zero-length messages
  • Backward compatible with existing configurations

Usage

# Sources
sources:
  protobuf_source:
    type: socket
    decoding:
      codec: protobuf
      protobuf:
        desc_file: "path/to/protobuf.desc"
        message_type: "package.MessageType"
    framing:
      method: varint_length_delimited

# Sinks
sinks:
  protobuf_sink:
    type: socket
    encoding:
      codec: protobuf
      protobuf:
        desc_file: "path/to/protobuf.desc"
        message_type: "package.MessageType"
    framing:
      method: varint_length_delimited

Testing

  • All varint framing tests pass (7/7)
  • Vector compiles successfully
  • Configuration validation works
  • Default behavior updated correctly

Closes: [20156]

@modev2301 modev2301 requested review from a team as code owners July 10, 2025 01:38

bits-bot commented Jul 10, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ modev2301
❌ malwahei

@github-actions github-actions bot added the domain: codecs Anything related to Vector's codecs (encoding/decoding) label Jul 10, 2025
@pront pront self-assigned this Jul 10, 2025
@@ -0,0 +1,92 @@
# Example configuration demonstrating varint framing for protobuf

This is great but we need to find a better place for it in the docs.

return Ok(None);
}

let mut value: usize = 0;

Suggested change
- let mut value: usize = 0;
+ let mut value: u64 = 0;

}

let mut value: usize = 0;
let mut shift: u32 = 0;

Suggested change
- let mut shift: u32 = 0;
+ let mut shift: u8 = 0;


for byte in buf.iter() {
bytes_read += 1;
let byte_value = (*byte & 0x7F) as usize;

Suggested change
- let byte_value = (*byte & 0x7F) as usize;
+ let byte_value = (*byte & 0x7F) as u64;

let mut input = BytesMut::from(&[0xAC, 0x02, b'f', b'o', b'o'][..]);
let mut decoder = VarintLengthDelimitedDecoder::default();

assert_eq!(decoder.decode(&mut input).unwrap().unwrap(), "foo");

Suggested change
- assert_eq!(decoder.decode(&mut input).unwrap().unwrap(), "foo");
+ assert_eq!(decoder.decode(&mut input).unwrap().unwrap(), Bytes::from("foo"));
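
For reference, here is a minimal sketch of a varint length-prefix decode loop using the `u64` value and `u8` shift types suggested in the comments above. It is not the PR's `VarintLengthDelimitedDecoder`: the `Prefix` enum and the `decode_varint_len` and `next_frame` functions are hypothetical names, errors are plain strings, and it works on a byte slice rather than a `BytesMut`. Under this scheme `0xAC 0x02` is the varint for 300, so the sketch treats the five-byte input from the test above as an incomplete frame, while a 3-byte payload such as `foo` would carry the prefix `0x03`.

```rust
/// Outcome of trying to read one varint length prefix from the front of a buffer.
#[derive(Debug, PartialEq)]
enum Prefix {
    /// The prefix is complete: the decoded length and how many bytes it used.
    Complete { length: u64, prefix_len: usize },
    /// The buffer ended before the final prefix byte arrived.
    Incomplete,
    /// The prefix does not fit in a u64.
    Overflow,
}

fn decode_varint_len(buf: &[u8]) -> Prefix {
    let mut value: u64 = 0;
    let mut shift: u8 = 0;
    for (i, byte) in buf.iter().enumerate() {
        // A u64 varint has at most 10 bytes, and the 10th may only carry bit 63.
        if shift >= 64 || (shift == 63 && (*byte & 0x7F) > 1) {
            return Prefix::Overflow;
        }
        value |= u64::from(*byte & 0x7F) << shift;
        if *byte & 0x80 == 0 {
            return Prefix::Complete { length: value, prefix_len: i + 1 };
        }
        shift += 7;
    }
    Prefix::Incomplete
}

/// Returns the first complete frame's payload and the total bytes it occupies,
/// `Ok(None)` if more data is needed, or an error for an invalid prefix.
fn next_frame(
    buf: &[u8],
    max_frame_length: usize,
) -> Result<Option<(&[u8], usize)>, &'static str> {
    match decode_varint_len(buf) {
        Prefix::Incomplete => Ok(None),
        Prefix::Overflow => Err("varint too large"),
        Prefix::Complete { length, prefix_len } => {
            if length > max_frame_length as u64 {
                return Err("frame too large");
            }
            let end = prefix_len + length as usize;
            if buf.len() < end {
                return Ok(None); // payload not fully buffered yet
            }
            Ok(Some((&buf[prefix_len..end], end)))
        }
    }
}
```

Accumulating into a `u64` keeps the decoded length independent of the target's pointer width; the cast to `usize` happens only after the length has been checked against the frame limit.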

// Check if the length is reasonable
if length > self.max_frame_length {
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidData,

We can consider introducing a:

#[derive(Debug, Snafu)]
pub enum VarintFramingError {
    #[snafu(display("Varint too large"))]
    VarintOverflow,

    #[snafu(display("Frame too large: {length} bytes (max: {max})"))]
    FrameTooLarge { length: usize, max: usize },

    #[snafu(display("Trailing data at EOF"))]
    TrailingData,

    #[snafu(display("I/O error: {}", source))]
    Io { source: io::Error },
}

Also, do we need a custom can_continue?

impl StreamDecodingError for VarintFramingError {
    fn can_continue(&self) -> bool {
        // ?
    }
}
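
One possible way to fill in the open `can_continue` question is sketched below, written against the standard library only so that it compiles on its own. The `StreamDecodingError` trait here is a local stand-in with the single method shown in the comment, not Vector's actual trait, and the `Display` and `Error` impls are hand-written in place of the `Snafu` derive. The choice of which variants are recoverable is an assumption: it only holds if the decoder can discard an oversized frame and resume at the next length prefix.

```rust
use std::fmt;
use std::io;

/// Local stand-in for the trait named in the review comment (assumption:
/// a single `can_continue` method, as shown there).
trait StreamDecodingError {
    fn can_continue(&self) -> bool;
}

#[derive(Debug)]
enum VarintFramingError {
    VarintOverflow,
    FrameTooLarge { length: usize, max: usize },
    TrailingData,
    Io { source: io::Error },
}

impl fmt::Display for VarintFramingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::VarintOverflow => write!(f, "Varint too large"),
            Self::FrameTooLarge { length, max } => {
                write!(f, "Frame too large: {length} bytes (max: {max})")
            }
            Self::TrailingData => write!(f, "Trailing data at EOF"),
            Self::Io { source } => write!(f, "I/O error: {source}"),
        }
    }
}

impl std::error::Error for VarintFramingError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Io { source } => Some(source),
            _ => None,
        }
    }
}

impl StreamDecodingError for VarintFramingError {
    fn can_continue(&self) -> bool {
        match self {
            // Assumption: the decoder skips the oversized frame's bytes, so the
            // next read starts at a valid length prefix.
            Self::FrameTooLarge { .. } => true,
            // After a malformed prefix, trailing bytes at EOF, or an I/O
            // failure there is no known frame boundary to resume from.
            Self::VarintOverflow | Self::TrailingData | Self::Io { .. } => false,
        }
    }
}
```

If the decoder does not skip oversized frames, returning `false` for every variant would be the more conservative choice.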


pront commented Jul 10, 2025

CLA assistant check: Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Thank you for this great PR! Signing the CLA is mandatory before we can merge.

@pront pront force-pushed the master branch 4 times, most recently from 1720078 to ffe54be Compare July 10, 2025 15:43
@pront pront added the meta: awaiting author Pull requests that are awaiting their author. label Jul 10, 2025
@github-actions github-actions bot removed the meta: awaiting author Pull requests that are awaiting their author. label Jul 11, 2025
Labels
domain: codecs Anything related to Vector's codecs (encoding/decoding)
5 participants