Go: add support for stream handling of attachments #737
Conversation
The point raised in the commit message about ContentLength relates to this issue: #736. Before merging this patch we should pick what name we want to use for that field, if people are on board with the idea of splitting Data into ContentLength and Data. I don't feel too strongly about whether that spec change needs to go in before or after this merges, but before is probably slightly preferable. Also, whatever approach we go with for stream handling (a callback mechanism in this patch) should probably land in the other SDKs at some point, so we should make sure we are all comfortable with the approach.
go/mcap/parse.go
logTime, offset, err := getUint64(buf, 0)
func parseAttachmentReader(r io.Reader) (*AttachmentReader, error) {
	buf := make([]byte, 8)
	crcReader := newCRCReader(r)
Can we make CRC validation optional somehow?
Yeah, I can add the option to skip the computation of the CRC. Actual validation of the CRC is up to the callback function the user supplies.
@@ -451,19 +473,24 @@ type LexerOptions struct {
	// MaxRecordSize defines the maximum size record the lexer will read.
	// Records larger than this will result in an error.
	MaxRecordSize int

	// AttachmentCallback
	AttachmentCallback func(*AttachmentReader) error
expand comment
})

for !errors.Is(err, io.EOF) {
	_, _, err = lexer.Next(nil)
Make an assertion here for clarity; err should be nil or io.EOF.
go/mcap/mcap.go
Name          string
MediaType     string
ContentLength uint64
data          *io.LimitedReader
group this with the other private fields
go/mcap/testutils.go
8 + // create time
4 + // attachment name length
4 + // media type length
4 + // data size
Seems like this should be 8. Why doesn't this cause a parse failure? I guess because the test isn't doing anything with the CRC and is interpreting the CRC bytes as part of the data size. Since it's all zeros due to the empty content, that would erroneously pass.
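If the data-size prefix is widened to 8 bytes as suggested, the record-length arithmetic for an empty attachment works out as follows. This is a sketch; the field widths assumed here follow the MCAP attachment record layout (uint64 timestamps, uint32 string-length prefixes, uint64 data size, uint32 CRC):

```go
package main

import "fmt"

func main() {
	// Serialized length of an attachment record body with an empty
	// name, media type, and data payload.
	name, mediaType, data := "", "", []byte{}
	length := 8 + // log time
		8 + // create time
		4 + len(name) + // name length prefix + name
		4 + len(mediaType) + // media type length prefix + media type
		8 + len(data) + // data size prefix (uint64, not 4 bytes) + data
		4 // crc
	fmt.Println(length)
}
```

With a 4-byte data-size prefix the total would come out 4 bytes short, and a test that ignores the CRC would silently absorb the difference.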
This looks like it's going in a good direction - once it's possible to skip CRC calculation when reading attachments, I'm happy for this to merge.
Adds support for streaming reads and writes of attachment data. With this change, ParseAttachment is dropped as a public function and replaced with a private function, parseAttachmentReader.
A new type, AttachmentReader, is exposed, which presents the Data portion of an attachment as a reader that can be streamed. The Attachment type is modified to include a new ContentLength field, to allow the writer to write the attachment length before stream processing the body.
A new option, AttachmentCallback, is exposed in the Lexer. Rather than emitting attachments, the Lexer now parses an attachment reader and calls the supplied callback (if supplied), and consumes any unconsumed bytes from the attachment record (whether or not a callback is supplied).
This is a breaking change in the Go SDK, and introduces some awkwardness due to the presence of the new ContentLength field in the Attachment record. In the specification, that value is folded into the Data field; however, that is not a tenable approach for stream handling. The actual binary format is not affected by this, only the way in which the binary format is described.
This change is required for readers and writers to be able to handle attachments bigger than available RAM. Without it, attaching a sufficiently large attachment will crash the reader/writer with an OOM.