Description
The current implementation only allows buffered reading when the stream is seekable. As a result, use cases such as streaming a large JSON response from an HTTP request are extremely slow: the tokenizer reads one byte at a time, and every read pays the overhead of a Python call. In this case, even the native Python tokenizer is faster than the Rust implementation.
The correct_cursor parameter defaults to true and cannot be set through the Python API. Consequently, when the stream isn't seekable, the tokenizer always uses unbuffered input, even if the hidden buffering argument is set explicitly. The only workaround is a custom RawIOBase implementation that provides a minimal amount of seeking support, so that a buffered reader can be created. In one test case, adding the seeking functions gave a ~8x speedup. Stepping through with a debugger showed that neither seek() nor tell() was actually called.
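For reference, the workaround can be sketched roughly as follows. This is a minimal illustration, not the code used in the test case: the class name ChunkedRawStream and the idea of feeding it an iterator of byte chunks (for example, the chunks of an HTTP response body) are assumptions made for the example. It reports itself as seekable and tracks a position, but only accepts seeks that do not move the cursor, which is enough if seek() and tell() are never actually called, as the debugger run suggested.

```python
import io


class ChunkedRawStream(io.RawIOBase):
    """Hypothetical wrapper: exposes an iterator of byte chunks as a raw stream
    that claims to be seekable, so it can be wrapped in a buffered reader."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._pending = b""  # bytes received but not yet handed out
        self._pos = 0        # total number of bytes handed out so far

    def readable(self):
        return True

    def readinto(self, buf):
        # Refill from the iterator, skipping any empty chunks.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # end of stream
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        self._pos += n
        return n

    # Minimal "seeking" support: report a position, reject real movement.
    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        else:
            raise io.UnsupportedOperation("seeking from the end is not supported")
        if target != self._pos:
            raise io.UnsupportedOperation("only no-op seeks are supported")
        return self._pos


# Wrap the raw stream so that reads are served from a buffer.
reader = io.BufferedReader(ChunkedRawStream([b'{"a": ', b"[1, 2", b", 3]}"]))
```

The resulting reader can then be passed wherever a file-like object is expected. If the consumer ever does need to move the cursor, this sketch raises io.UnsupportedOperation rather than silently returning wrong data.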