Conversation

@smheidrich (Contributor)

Proof of concept to illustrate how to use `RustTokenizer`'s proposed `park_cursor` method¹ to avoid the "overconsumption" issue from #30 / smheidrich/py-json-stream-rs-tokenizer#47. Feel free to edit or open a new PR altogether with better ideas, this is really just to show how to use it.
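
For anyone skimming, a minimal sketch of what the overconsumption issue looks like from the caller's side (illustrative only; the exact read-ahead depends on the tokenizer's internal buffering):

```python
import io

import json_stream

# One JSON document followed by unrelated data on the same stream.
fp = io.StringIO('[1, 2, 3]\nrest of the stream')

# Fully consume the top-level document.
for item in json_stream.load(fp):
    print(item)

# A buffering tokenizer (like the Rust one) may have read fp well past the
# closing "]", so this read can start somewhere inside or after
# "rest of the stream" instead of right after the document. The proposed
# park_cursor() would seek fp back to just after the last consumed token.
print(fp.read())
```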

There is probably a more elegant solution that doesn't involve the introduction of the `level` param everywhere just to know when the top-level document ends, but I don't grok the code well enough to come up with one. E.g. I initially thought it would be possible to just alter `load` like

```python
def load(fp, ..., tokenizer=...):
    token_stream = tokenizer(fp)
    ...
    base = StreamingJSONBase.factory(token, token_stream, persistent)
    for thing in base:
        yield thing
    if getattr(token_stream, "park_cursor", None):
        token_stream.park_cursor()
```

which is similar to how you did it in the minimal example in smheidrich/py-json-stream-rs-tokenizer#47, but I guess that wouldn't work because people could no longer call e.g. `persistent` on the immediate result of `StreamingJSONBase.factory`...
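
(For illustration, the kind of call the generator version would break, taking `persistent()` as an example of a `StreamingJSONBase` method that callers currently invoke on the return value of `load`:)

```python
# Fine while load() returns the StreamingJSONBase instance directly:
data = json_stream.load(fp).persistent()

# With the generator-based load() sketched above, load(fp) is just a
# generator object, so .persistent() (or any other StreamingJSONBase
# attribute) would raise AttributeError.
```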

In any case, this requires smheidrich/py-json-stream-rs-tokenizer#50 to be merged to actually work, but I think it makes sense to only merge that once everything on this end has been decided.


¹ Still open to better names or turning it into a context manager `__exit__` if that makes sense.
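
For concreteness, one possible shape of the context-manager variant (purely hypothetical names, not an agreed API; the point is just that parking the cursor would happen on exit):

```python
from contextlib import contextmanager

@contextmanager
def parked_cursor(token_stream):
    """On exit, put the underlying stream's position back to just after
    the last consumed token, i.e. do what park_cursor() does today."""
    try:
        yield token_stream
    finally:
        park = getattr(token_stream, "park_cursor", None)
        if callable(park):
            park()

# load() could then wrap its parsing loop:
#
#     with parked_cursor(token_stream):
#         yield from StreamingJSONBase.factory(token, token_stream, persistent)
```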

Proof of concept only. Maybe there is a more elegant solution that
doesn't require putting `level` everywhere to find out when the
top-level document ends.

Requires
smheidrich/py-json-stream-rs-tokenizer#50
to be merged to actually work.