-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add hybrid chunker #68
Conversation
8791265
to
816c779
Compare
Converting to draft until all points planned for this PR are addressed (see description).. |
No direct impact after all, as the method seeks to split |
c6c54d6
to
a9f08e9
Compare
Signed-off-by: Panos Vagenas <[email protected]>
Co-authored-by: Bill Murdock <[email protected]> Signed-off-by: Panos Vagenas <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Co-authored-by: Ben Rood <[email protected]> Signed-off-by: Panos Vagenas <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
e5f5a08
to
f3064e8
Compare
Signed-off-by: Panos Vagenas <[email protected]>
Main changes with this PR:
👉 Any better naming ideas, particularly for the new chunker, are welcome. Some options in no particular order:
|
Signed-off-by: Panos Vagenas <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Finally went for |
Known points to fix & improve
Planned for this PR
_merge_chunks_with_matching_metadata()
_split_by_doc_items()
currently assumesDocItem
has a.text
i.e. assumesTextItem
TBD
serialize()
; former should reuse latter