ByteBuffer and File partitioners need to be reliable for certain streaming modes #8

The byte buffer chunking logic presumes JSONL-structured input, and it should not.
Essentially, the file partitioner needs to know whether it is reading a JSON stream or a JSONL stream, and it should only use the optimal partitioning logic when the input is designated as JSONL. Otherwise, a separate thread should scan and parse the object stream from the beginning.
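To illustrate why the fast path is only safe for JSONL, here is a minimal sketch of newline-aligned chunking over a `ByteBuffer`. The class and method names (`JsonlChunker`, `lineAlignedOffsets`) are hypothetical and are not taken from this project's code. It assumes that every record sits on exactly one line, so a raw newline byte can never appear inside a record. That assumption does not hold for pretty-printed or concatenated JSON.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class JsonlChunker {

    /**
     * Hypothetical helper: returns chunk start offsets that each fall on the
     * first byte of a line. Only valid when the input is known to be JSONL.
     */
    public static List<Integer> lineAlignedOffsets(ByteBuffer buf, int chunks) {
        int limit = buf.limit();
        List<Integer> offsets = new ArrayList<>();
        offsets.add(0);
        // Nominal chunk size; at least 1 so that tiny buffers still make progress.
        int target = Math.max(1, limit / Math.max(1, chunks));
        for (int i = 1; i < chunks; i++) {
            int nominal = i * target;
            int last = offsets.get(offsets.size() - 1);
            if (nominal <= last) {
                continue; // a previous boundary already moved past this point
            }
            // Start one byte early so a nominal offset that is already a line
            // start is kept as-is. Absolute get() does not move the position.
            int pos = nominal - 1;
            while (pos < limit && buf.get(pos) != '\n') {
                pos++;
            }
            int boundary = pos + 1; // first byte after the newline
            if (boundary >= limit) {
                break; // no further complete line starts in the buffer
            }
            offsets.add(boundary);
        }
        return offsets;
    }
}
```

Each returned offset can be handed to a separate reader, which parses records until it reaches the next offset. If the input is not designated as JSONL, these offsets may fall inside an object, which is the failure this issue describes.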

Alternatively, JsonNode stream processing can be used to derive offsets on or before chunking gaps. For very large files, it would still be useful to provide valid chunking offsets for streams of independent objects. It would also be useful to walk one level into a large enclosing object that contains logical object streams internally, whether as arrays or as values.
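The offset derivation described above can be sketched without a full parser. The following is a simplified stand-in for the JsonNode-based approach, not the project's implementation: it tracks nesting depth and string state byte by byte, and records where each object or array opens at a chosen depth. The names `JsonStreamOffsets`, `valueOffsets`, and `level` are hypothetical. It assumes well-formed UTF-8 JSON, and it only reports objects and arrays, not bare scalar values.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class JsonStreamOffsets {

    /**
     * Hypothetical helper: scans from the beginning of the buffer and returns
     * the start offset of every object or array that opens at nesting depth
     * {@code level}. Level 0 yields the top-level values of a concatenated
     * JSON stream; level 1 yields the containers directly inside one large
     * enclosing object or array.
     */
    public static List<Integer> valueOffsets(ByteBuffer buf, int level) {
        List<Integer> offsets = new ArrayList<>();
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < buf.limit(); i++) {
            byte b = buf.get(i); // absolute read, position is left untouched
            if (inString) {
                // Braces and brackets inside strings are data, not structure.
                if (escaped) {
                    escaped = false;
                } else if (b == '\\') {
                    escaped = true;
                } else if (b == '"') {
                    inString = false;
                }
                continue;
            }
            if (b == '"') {
                inString = true;
            } else if (b == '{' || b == '[') {
                if (depth == level) {
                    offsets.add(i);
                }
                depth++;
            } else if (b == '}' || b == ']') {
                depth--;
            }
            // Multi-byte UTF-8 sequences use only bytes >= 0x80, so they can
            // never be mistaken for the ASCII structural characters above.
        }
        return offsets;
    }
}
```

Because the scan has to start at the beginning of the stream, it fits the "separate thread" fallback: one thread derives valid offsets while workers consume chunks whose boundaries are already known. Unlike newline alignment, these offsets stay valid when objects span multiple lines or contain braces inside string values.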

