ByteBuffer and File partitioners need to be reliable for certain streaming modes #8

Open
@jshook

Description

The byte buffer chunking logic presumes JSONL-structured input, and it should not.
Essentially, the file partitioner needs to know whether it is reading a JSON stream or a JSONL stream, and it should use the optimal partitioning logic only when the input is designated as JSONL. Otherwise, a separate thread should scan and parse the object stream from the beginning.
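
To illustrate why the fast path is only safe for JSONL, here is a minimal sketch in Python (chosen for brevity, not the project's own code). It assumes the whole input is available as `bytes` and that every record ends with a newline; the function name `jsonl_chunk_ranges` is hypothetical. Because a raw newline byte cannot appear inside a JSON string, each tentative boundary can be moved forward to the next newline without parsing anything. The same trick is not valid for pretty-printed or concatenated JSON streams.

```python
import json


def jsonl_chunk_ranges(data: bytes, n_chunks: int) -> list[tuple[int, int]]:
    """Split JSONL bytes into up to n_chunks (start, end) ranges on line boundaries.

    Hypothetical helper: assumes one JSON value per line, so every raw
    newline byte is a record separator.
    """
    size = len(data)
    starts = [0]
    for i in range(1, n_chunks):
        guess = i * size // n_chunks  # evenly spaced tentative boundary
        if guess <= starts[-1]:
            continue
        # Search from guess - 1 so a guess already at a line start is kept.
        newline = data.find(b"\n", guess - 1)
        if newline == -1:
            break  # no further record separators
        start = newline + 1
        if starts[-1] < start < size:
            starts.append(start)
    ends = starts[1:] + [size]
    return list(zip(starts, ends))


if __name__ == "__main__":
    sample = b'{"a":1}\n{"b":2}\n{"c":3}\n{"d":4}\n'
    for begin, end in jsonl_chunk_ranges(sample, 3):
        records = [json.loads(line) for line in sample[begin:end].splitlines()]
        print(begin, end, records)
```

Each range can then be handed to an independent worker, since every range starts at the first byte of a record and ends just after a record's newline.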

Alternatively, JsonNode stream processing can be used to derive offsets on or before chunking gaps. For very large files, it would still be useful to have valid chunking offsets provided for streams of independent objects. It would also be useful to walk one level into the structure of a large enclosing object that contains logical object streams internally, whether as arrays or as values.
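
As a rough illustration of deriving offsets by parsing rather than by newline alignment, the sketch below scans a stream of independent JSON values from the beginning and records where each one starts and ends. It is written in Python with the standard `json.JSONDecoder.raw_decode` call as a stand-in for JsonNode stream processing; the function name `object_stream_offsets` is hypothetical, and the offsets are character positions in a decoded string, not byte positions in a file.

```python
import json

JSON_WHITESPACE = " \t\r\n"


def object_stream_offsets(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of each top-level JSON value.

    Hypothetical helper: values may span multiple lines and may be
    separated by any amount of whitespace, or by none at all.
    """
    decoder = json.JSONDecoder()
    offsets = []
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos >= length:
            break
        _, end = decoder.raw_decode(text, pos)  # parses exactly one value
        offsets.append((pos, end))
        pos = end
    return offsets


if __name__ == "__main__":
    stream = '{"a": 1}\n{\n  "b": [1,\n 2]\n}{"c": "}{"}'
    for start, end in object_stream_offsets(stream):
        print(start, end, json.loads(stream[start:end]))
```

Any `start` value from this list is a valid chunking offset, so a partitioner could pick the recorded offset on or before each desired chunk boundary. The scan itself is sequential, which is the cost of not being able to assume JSONL.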
