Resolves #355: Support grabbing the bytes of intervals #364
Conversation
I couldn't run macro benchmarks (#362); there are no differences in micro benchmarks.
Force-pushed from c5e3fc5 to 60e1a8c
dcoutts left a comment
Thoughts so far.
Generally this looks very promising: providing the feature at low cost to non-users.
      SlowConsumeTokenString bs' k len -> do
-       (bstr, bs'') <- getTokenVarLen len bs' offset'
+       (bstr, bs'', marks') <- getTokenVarLen len bs' (offset' + intToInt64 (BS.length bs')) marks
I don't understand yet why we need to change the offset calculation here.
If I've followed correctly then:
    offset'  = offset  + intToInt64 (BS.length bs - BS.length bs')
    offset'' = offset' + intToInt64 (BS.length bs')
             = offset  + intToInt64 (BS.length bs)
(which might therefore be a simpler definition)
But I don't see why it's the right offset.
offset' is at the beginning of bs'; getTokenVarLen expects the offset to be after it, hence the + BS.length bs'.
The getTokenVarLen precondition is in its comments.
I can change it to offset'' = offset + intToInt64 (BS.length bs) if you prefer, just say so.
Why do we need to change the offset in the first place? Because getTokenVarLen didn't use it for anything other than error reporting, so previously it was "wrong".
Got it. Thanks. And thanks for pushing the slight simplification.
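For readers following along, here is a minimal, self-contained sketch of the offset bookkeeping discussed above. The helper names (offsetOfRemaining, offsetAfterChunk) are made up for illustration and are not part of cborg; only the arithmetic comes from the thread.

    import qualified Data.ByteString as BS
    import Data.Int (Int64)

    type ByteOffset = Int64

    intToInt64 :: Int -> Int64
    intToInt64 = fromIntegral

    -- 'offset' is the stream offset of the start of the current chunk 'bs'.
    -- After the fast path has consumed a prefix of 'bs', leaving 'bs'', the
    -- offset of the start of the remaining bytes is:
    offsetOfRemaining :: ByteOffset -> BS.ByteString -> BS.ByteString -> ByteOffset
    offsetOfRemaining offset bs bs' =
      offset + intToInt64 (BS.length bs - BS.length bs')

    -- getTokenVarLen wants the offset *after* the whole current chunk, which
    -- algebraically simplifies to the form suggested above:
    offsetAfterChunk :: ByteOffset -> BS.ByteString -> ByteOffset
    offsetAfterChunk offset bs = offset + intToInt64 (BS.length bs)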
cborg/src/Codec/CBOR/Read.hs (Outdated)
      case mbs of
        Nothing   -> decodeFail bs' offset' "end of input"
-       Just bs'' -> go_slow da' bs'' offset'
+       Just bs'' -> go_slow da' bs'' offset' (slowMarkChunk bs'' offset' marks)
Ok so here we accumulate the input chunk, but only if we have any marks. So here is where everyone pays a cost even if they're not using the feature, but it's very minimal and only on chunk boundaries.
Well, I doubt there's any noticeable cost, because

    slowMarkChunk :: ByteString -> ByteOffset -> Marks -> Marks
    slowMarkChunk _ _ NoMarks = NoMarks

My gut feeling is that there's more accumulated cost from threading an additional NoMarks argument through the interpreter.
Indeed. And as you noted, the micro-benchmarks show nothing measurable. And I fixed the macro-benchmarks and they also show nothing measurable.
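To make the cost argument concrete, here is a hedged sketch of the shape of slowMarkChunk. Only the type signature and the NoMarks equation quoted above come from the PR; the Marks payload and the accumulating case below are assumptions for illustration.

    import qualified Data.ByteString as BS
    import Data.Int (Int64)

    type ByteOffset = Int64

    -- Assumed representation: either no open byte span (NoMarks), or the
    -- chunks seen since the span was opened (this payload is a guess).
    data Marks
      = NoMarks
      | Marks [(ByteOffset, BS.ByteString)]

    slowMarkChunk :: BS.ByteString -> ByteOffset -> Marks -> Marks
    slowMarkChunk _  _      NoMarks        = NoMarks          -- non-users: one pattern match, nothing retained
    slowMarkChunk bs offset (Marks chunks) = Marks ((offset, bs) : chunks)  -- users: retain the chunk

The point being that decoders which never open a byte span only ever hit the first equation, and only at chunk boundaries.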
    getTokenVarLenSlow :: [ByteString] -> Int -> ByteOffset
                       -> IncrementalDecoder s (ByteString, ByteString)
    getTokenVarLenSlow bss n offset = do
      ...
      (offset + intToInt64 (BS.length bs'))
I've not followed why we're changing the offset calculations here, or is it just shuffling around where we do it, from caller to callee?
I've not followed why we're changing the offset calculations here
Because getTokenVarLen/Slow didn't use offset for anything "functional", only to report errors at. Previously (as you can see from the diff) it wasn't updated at all, even after new chunks were read.
A side-effect of this change is that the "end of input" error now reports the actual end-of-input offset; previously it reported the offset at which getTokenVarLen was called.
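As a pure illustration of that threading change (not the actual IncrementalDecoder code), advancing the carried offset by each newly read chunk is what lets the eventual decodeFail report the true end-of-input position. The helper names below are hypothetical.

    import qualified Data.ByteString as BS
    import Data.Int (Int64)

    type ByteOffset = Int64

    -- Advance the carried offset past one newly read chunk.
    advanceOffset :: ByteOffset -> BS.ByteString -> ByteOffset
    advanceOffset offset chunk = offset + fromIntegral (BS.length chunk)

    -- Folding over all chunks read so far yields the end-of-input offset,
    -- which is what the "end of input" error can now report.
    endOfInputOffset :: ByteOffset -> [BS.ByteString] -> ByteOffset
    endOfInputOffset = foldl advanceOffset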
    res = either throw snd (deserialiseFromBytes emptyDecoder (LBS.pack [0]))

    empty_deserialise_fail :: Bool
    empty_deserialise_fail = isLeft (deserialiseFromBytes emptyDecoder LBS.empty)
I added these tests for both peekByteOffset and getInputSpan.
I find it a bit surprising, as peekByteOffset as well as unmarkInput and getInputSpan do succeed at the end of input, but they seem to fail at the very beginning of an empty input.
This is quite an obscure corner case, but nevertheless I think it's good to be aware of it (i.e. to have tests for it).
Thanks for these tests. Yes it is obscure and I doubt it's intentional. Perhaps we should note here more clearly that although this is the current behaviour, it's not necessarily ideal and could be reviewed and changed to something more regular.
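For reference, here is a sketch of what such a test can look like, with the released peekByteOffset primitive standing in for the PR's emptyDecoder. That it fails on completely empty input is the behaviour reported above, not something guaranteed by the API docs.

    import Codec.CBOR.Decoding (Decoder, ByteOffset, peekByteOffset)
    import Codec.CBOR.Read (deserialiseFromBytes)
    import qualified Data.ByteString.Lazy as LBS
    import Data.Either (isLeft)

    -- A decoder that consumes nothing; it only peeks the current offset.
    peekOnly :: Decoder s ByteOffset
    peekOnly = peekByteOffset

    -- Reportedly succeeds when at least one byte of input follows...
    nonempty_deserialise_ok :: Bool
    nonempty_deserialise_ok =
      not (isLeft (deserialiseFromBytes peekOnly (LBS.pack [0])))

    -- ...but fails on completely empty input, the corner case pinned down here.
    empty_deserialise_fails :: Bool
    empty_deserialise_fails = isLeft (deserialiseFromBytes peekOnly LBS.empty)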
Force-pushed from 01cf105 to 00f95af
dcoutts left a comment
This is great. It adds the feature nicely, and doesn't add any cost for users not using it. And indeed using the feature itself is pretty cheap (as measured by a macro-benchmark).
I've taken the liberty of doing some renaming. And I'll add more API docs. I'll merge once I've finished adding docs.
Force-pushed from cc0094d to c0d86d7
      -- > openByteSpan
      -- > x <- decode
-     -- > !after <- peekByteSpan
+     -- > bytes <- peekByteSpan
You removed the bang here, but not in the implementation.
Thanks. I think the bang there is unnecessary and it's better to force the call of peekMarkedByteSpan as it is passed to the continuation. I'll change that.
Change: markInput --> openByteSpan, unmarkInput --> closeByteSpan, getInputSpan --> peekByteSpan
Better to do it in the reader.
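Putting the renamed primitives together, here is a hedged usage sketch based only on the doc example in the diff above. The signatures are stand-ins (hence the undefined stubs), not the final cborg API.

    import Codec.CBOR.Decoding (Decoder, decodeInt)
    import qualified Data.ByteString.Lazy as LBS

    -- Assumed stand-in signatures for the PR's new primitives:
    openByteSpan  :: Decoder s ()
    openByteSpan  = undefined
    peekByteSpan  :: Decoder s LBS.ByteString
    peekByteSpan  = undefined
    closeByteSpan :: Decoder s ()
    closeByteSpan = undefined

    -- Decode an Int and also return the exact bytes it was decoded from.
    decodeIntWithBytes :: Decoder s (Int, LBS.ByteString)
    decodeIntWithBytes = do
      openByteSpan            -- start recording at the current byte offset
      x     <- decodeInt      -- decode the term of interest
      bytes <- peekByteSpan   -- the raw input bytes covering that term
      closeByteSpan           -- stop recording (formerly unmarkInput)
      return (x, bytes)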
Force-pushed from c0d86d7 to 4d47bd5