Conversation

@phadej phadej commented Aug 13, 2025

No description provided.

@phadej phadej requested a review from dcoutts August 13, 2025 18:40
phadej commented Aug 13, 2025

I couldn't run the macro benchmarks (#362); there are no differences in the micro benchmarks.

@phadej phadej force-pushed the issue-355-grab-contents branch from c5e3fc5 to 60e1a8c Compare August 14, 2025 13:42
@dcoutts dcoutts left a comment


Thoughts so far.

Generally this looks very promising: providing the feature at low cost to non-users.


    SlowConsumeTokenString bs' k len -> do
-     (bstr, bs'')         <- getTokenVarLen len bs' offset'
+     (bstr, bs'', marks') <- getTokenVarLen len bs' (offset' + intToInt64 (BS.length bs')) marks
dcoutts (Member):

I don't understand yet why we need to change the offset calculation here.

If I've followed correctly then:

offset' = offset + intToInt64 (BS.length bs - BS.length bs')
offset'' = offset' + intToInt64 (BS.length bs')
         = offset + intToInt64 (BS.length bs)

(which might therefore be a simpler definition)

But I don't see why it's the right offset.

phadej (Collaborator, Author):

  • offset' is at the beginning of bs''.
  • getTokenVarLen expects the offset to be after it, thus the + BS.length bs'.

The getTokenVarLen precondition is in its comments.

I can change it to offset'' = offset + intToInt64 (BS.length bs); just say so.


Why do we need to change the offset in the first place? Because getTokenVarLen didn't use it for anything other than error reporting, so it was "wrong".
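The offset bookkeeping agreed on in this thread can be checked with a small standalone sketch (the names offsetAfter and offsetAfter' are illustrative, not from the patch; only the arithmetic is taken from the discussion above):

```haskell
import qualified Data.ByteString as BS
import           Data.ByteString (ByteString)
import           Data.Int        (Int64)

intToInt64 :: Int -> Int64
intToInt64 = fromIntegral

-- offset is the absolute offset at the start of the full chunk bs;
-- bs' is the remaining suffix of bs, so the offset after bs' is:
offsetAfter :: Int64 -> ByteString -> ByteString -> Int64
offsetAfter offset bs bs' = offset' + intToInt64 (BS.length bs')
  where
    offset' = offset + intToInt64 (BS.length bs - BS.length bs')

-- which simplifies to the whole-chunk form suggested in the review:
offsetAfter' :: Int64 -> ByteString -> Int64
offsetAfter' offset bs = offset + intToInt64 (BS.length bs)

main :: IO ()
main = do
  let bs  = BS.pack [1..10]
      bs' = BS.drop 4 bs
  print (offsetAfter  100 bs bs')  -- prints 110
  print (offsetAfter' 100 bs)      -- prints 110
```

Both forms agree because the (BS.length bs - BS.length bs') and (BS.length bs') terms sum back to BS.length bs.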

dcoutts (Member):

Got it. Thanks. And thanks for pushing the slight simplification.

  case mbs of
    Nothing   -> decodeFail bs' offset' "end of input"
-   Just bs'' -> go_slow da' bs'' offset'
+   Just bs'' -> go_slow da' bs'' offset' (slowMarkChunk bs'' offset' marks)
dcoutts (Member):

Ok, so here we accumulate the input chunk, but only if we have any marks. So this is where everyone pays a cost even if they're not using the feature, but it's very minimal and only on chunk boundaries.

phadej (Collaborator, Author):

Well, I doubt there's any noticeable cost, because

slowMarkChunk :: ByteString -> ByteOffset -> Marks -> Marks
slowMarkChunk _  _    NoMarks            = NoMarks

My gut feeling is that there's more accumulated cost from threading an additional NoMarks argument through the interpreter.
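For context, a minimal standalone sketch of what such a Marks type could look like; only the NoMarks clause is quoted from the patch, while the Marks constructor and its accumulating clause are hypothetical illustrations:

```haskell
import qualified Data.ByteString as BS
import           Data.ByteString (ByteString)
import           Data.Int        (Int64)

type ByteOffset = Int64

-- Hypothetical: the real Marks type in the patch may differ.
data Marks
  = NoMarks                        -- feature unused: nothing is retained
  | Marks ByteOffset [ByteString]  -- mark offset plus chunks seen since the mark

slowMarkChunk :: ByteString -> ByteOffset -> Marks -> Marks
slowMarkChunk _  _ NoMarks            = NoMarks  -- non-users pay only this pattern match
slowMarkChunk bs _ (Marks off chunks) = Marks off (bs : chunks)
```

In the NoMarks case the function is a constant-time pattern match per chunk boundary, which is why no cost shows up in benchmarks.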

dcoutts (Member):

Indeed. And as you noted, the micro-benchmarks show nothing measurable. And I fixed the macro-benchmarks and they also show nothing measurable.

getTokenVarLenSlow :: [ByteString] -> Int -> ByteOffset
                   -> IncrementalDecoder s (ByteString, ByteString)
getTokenVarLenSlow bss n offset = do
(offset + intToInt64 (BS.length bs'))
dcoutts (Member):

I've not followed why we're changing the offset calculations here, or is it just shuffling around where we do it, from caller to callee?

phadej (Collaborator, Author) commented Aug 14, 2025:

> I've not followed why we're changing the offset calculations here

Because getTokenVarLen/Slow didn't use offset for anything "functional", only to report errors at. Previously (as you can see from the diff) it wasn't updated at all, even after new chunks were read.

A side-effect of this change is that the "end of input" error will now report the actual end-of-input offset; previously it reported the offset at which getTokenVarLen was called.

res = either throw snd (deserialiseFromBytes emptyDecoder (LBS.pack [0]))

empty_deserialise_fail :: Bool
empty_deserialise_fail = isLeft (deserialiseFromBytes emptyDecoder LBS.empty)
phadej (Collaborator, Author):

I added these tests for both peekByteOffset and getInputSpan.

I find it a bit surprising that peekByteOffset, as well as unmarkInput and getInputSpan, succeed at the end of input, yet they fail at the very beginning of an empty input.

This is quite an obscure corner case, but nevertheless I think it's good to be aware of it (i.e. to have a test for it).
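The behaviour being described can be sketched against cborg's public API. deserialiseFromBytes (Codec.CBOR.Read) and peekByteOffset (Codec.CBOR.Decoding) are real cborg functions; the claim that the empty-input case fails is taken from this thread, not independently verified here:

```haskell
import           Codec.CBOR.Decoding  (Decoder, ByteOffset, peekByteOffset)
import           Codec.CBOR.Read      (deserialiseFromBytes)
import qualified Data.ByteString.Lazy as LBS
import           Data.Either          (isLeft, isRight)

-- A decoder that consumes no input at all, it only peeks the current offset.
emptyDecoder :: Decoder s ByteOffset
emptyDecoder = peekByteOffset

-- With at least one input chunk present, peeking at offset 0 succeeds...
nonEmptyOk :: Bool
nonEmptyOk = isRight (deserialiseFromBytes emptyDecoder (LBS.pack [0]))

-- ...but on a completely empty input it currently fails.
emptyFails :: Bool
emptyFails = isLeft (deserialiseFromBytes emptyDecoder LBS.empty)
```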

dcoutts (Member):

Thanks for these tests. Yes, it is obscure, and I doubt it's intentional. Perhaps we should note more clearly here that although this is the current behaviour, it's not necessarily ideal and could be reviewed and changed to something more regular.

@phadej phadej force-pushed the issue-355-grab-contents branch from 01cf105 to 00f95af Compare August 15, 2025 12:12
@dcoutts dcoutts left a comment


This is great. It adds the feature nicely, and doesn't add any cost for users not using it. And indeed using the feature itself is pretty cheap (as measured by a macro-benchmark).

I've taken the liberty of doing some renaming. And I'll add more API docs. I'll merge once I've finished adding docs.

@dcoutts dcoutts force-pushed the issue-355-grab-contents branch from cc0094d to c0d86d7 Compare October 2, 2025 14:45
   -- > openByteSpan
   -- > x <- decode
-  -- > !after <- peekByteSpan
+  -- > bytes <- peekByteSpan
phadej (Collaborator, Author):

You removed the bang here, but not in the implementation.

dcoutts (Member):

Thanks. I think the bang there is unnecessary; it's better to force the result of peekMarkedByteSpan as it is passed to the continuation. I'll change that.

Change:
markInput    --> openByteSpan
unmarkInput  --> closeByteSpan
getInputSpan --> peekByteSpan
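After the rename, usage along the lines of the doc-comment example above might look something like the following sketch. The exact signatures (openByteSpan, closeByteSpan :: Decoder s (), peekByteSpan :: Decoder s ByteString) are assumptions based on this thread, not the merged API:

```haskell
-- Hypothetical wrapper: decode a value and also return the exact bytes
-- it consumed. decodeWithRawBytes is an illustrative name, not from the PR.
decodeWithRawBytes :: Decoder s a -> Decoder s (a, ByteString)
decodeWithRawBytes decode = do
  openByteSpan             -- start recording input from this point
  x     <- decode          -- run the wrapped decoder
  bytes <- peekByteSpan    -- the bytes consumed since openByteSpan
  closeByteSpan            -- stop retaining input chunks
  return (x, bytes)
```

Per the discussion above, the chunk retention only happens between openByteSpan and closeByteSpan, so decoders that never open a span pay essentially nothing.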
@dcoutts dcoutts force-pushed the issue-355-grab-contents branch from c0d86d7 to 4d47bd5 Compare October 2, 2025 21:38
@dcoutts dcoutts merged commit 72a0e73 into master Oct 2, 2025
15 checks passed