chore(deps): bump text-splitter from 0.4.4 to 0.6.0 #88

Closed
wants to merge 1 commit into from

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Jan 15, 2024

Bumps text-splitter from 0.4.4 to 0.6.0.

Release notes

Sourced from text-splitter's releases.

v0.6.0

Breaking Changes

  • Chunk behavior should now be the same as prior to v0.5.0. Once binary search finds the optimal chunk, we now check the next few sections as long as the chunk size doesn't change. This should result in the same behavior as before, but with the performance improvements of binary search. @​benbrandt in benbrandt/text-splitter#81
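The look-ahead described above can be sketched roughly as follows. This is not the crate's implementation: `word_count`, `char_ends`, `search_candidate`, `extend_while_same_size`, and `next_chunk` are hypothetical names, the sizer is a plain whitespace word count standing in for a tokenizer, and candidates are simply every character boundary.

```rust
/// Hypothetical sizer: counts whitespace-separated words (stand-in for a tokenizer).
fn word_count(chunk: &str) -> usize {
    chunk.split_whitespace().count()
}

/// Hypothetical candidate list: the end offset of every character in `text`.
fn char_ends(text: &str) -> Vec<usize> {
    text.char_indices().map(|(i, c)| i + c.len_utf8()).collect()
}

/// Binary search for a prefix whose size fits `capacity`.
/// On an exact hit this may return ANY candidate of that size, not
/// necessarily the longest one, which is the v0.5.0-style edge difference.
fn search_candidate(text: &str, ends: &[usize], capacity: usize) -> Option<usize> {
    match ends.binary_search_by(|&end| word_count(&text[..end]).cmp(&capacity)) {
        Ok(idx) => Some(idx),
        Err(0) => None,               // even the shortest candidate is too large
        Err(idx) => Some(idx - 1),    // largest candidate below capacity
    }
}

/// Look-ahead step: keep taking the next candidate while the size is unchanged.
fn extend_while_same_size(text: &str, ends: &[usize], found: usize) -> usize {
    let size = word_count(&text[..ends[found]]);
    let mut idx = found;
    while idx + 1 < ends.len() && word_count(&text[..ends[idx + 1]]) == size {
        idx += 1;
    }
    idx
}

/// Binary search followed by the look-ahead, returning the chosen prefix.
fn next_chunk<'a>(text: &'a str, ends: &[usize], capacity: usize) -> Option<&'a str> {
    let found = search_candidate(text, ends, capacity)?;
    let last = extend_while_same_size(text, ends, found);
    Some(&text[..ends[last]])
}
```

With the text `"one two   three"`, per-character candidates from `char_ends`, and a capacity of 2, `next_chunk` returns `Some("one two   ")`: the trailing whitespace is kept because it does not change the size, whichever size-2 candidate the binary search happened to land on.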

Full Changelog: benbrandt/text-splitter@v0.5.1...v0.6.0

v0.5.1

What's New

  • Python bindings and Rust crate now have the same version number.

Rust

  • Constructors for ChunkSize are now public, so you can more easily create your own ChunkSize structs for your own custom ChunkSizer implementation.

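Python

  • New CustomTextSplitter that accepts a custom callback with the signature of (str) -> int. Allows for custom chunk sizing on the Python side.

Full Changelog: benbrandt/text-splitter@v0.5.0...v0.5.1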

v0.5.0

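What's New

  • Significant performance improvements for generating chunks with the tokenizers or tiktoken-rs crates by applying binary search when attempting to find the next matching chunk size.

Breaking Changes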

  • Minimum required version of tokenizers is now 0.15.0
  • Minimum required version of tiktoken-rs is now 0.5.6
  • Due to using binary search, there are some slight differences at the edges of chunks where the algorithm was a little greedier before. If two candidates would tokenize to the same number of tokens that fit within the capacity, it will now choose the shorter text. Due to the nature of tokenizers, this happens more often with whitespace at the end of a chunk, and rarely affects users who have set with_trim_chunks(true). It is a tradeoff, but keeping the exact same behavior would have made the binary search code much more complicated.
  • The chunk_size method on ChunkSizer now needs to accept a ChunkCapacity argument, and return a ChunkSize struct instead of a usize. This was to help support the new binary search method in chunking, and should only affect users who implemented custom ChunkSizers and weren't using one of the provided ones.
    • New signature: fn chunk_size(&self, chunk: &str, capacity: &impl ChunkCapacity) -> ChunkSize;
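For anyone with a custom sizer, the shape of the change might look something like the sketch below. The `ChunkCapacity`, `ChunkSize`, and `ChunkSizer` definitions here are local stand-ins written for illustration only; just the `chunk_size` signature is taken from the release notes, and the real crate's types will differ in their fields and methods. `WordCount` is a hypothetical sizer.

```rust
use std::cmp::Ordering;

/// Local stand-in for the crate's capacity concept (hypothetical shape).
trait ChunkCapacity {
    /// Whether `size` is below, at, or above this capacity.
    fn fits(&self, size: usize) -> Ordering;
}

impl ChunkCapacity for usize {
    fn fits(&self, size: usize) -> Ordering {
        size.cmp(self)
    }
}

/// Local stand-in for the struct returned by `chunk_size` (hypothetical fields).
#[derive(Debug, PartialEq)]
struct ChunkSize {
    size: usize,
    fits: Ordering,
}

/// Trait using the signature quoted in the release notes.
trait ChunkSizer {
    fn chunk_size(&self, chunk: &str, capacity: &impl ChunkCapacity) -> ChunkSize;
}

/// Hypothetical custom sizer that measures chunks in whitespace-separated words.
struct WordCount;

impl ChunkSizer for WordCount {
    fn chunk_size(&self, chunk: &str, capacity: &impl ChunkCapacity) -> ChunkSize {
        let size = chunk.split_whitespace().count();
        // Previously a sizer would have returned just `size` as a usize.
        ChunkSize { size, fits: capacity.fits(size) }
    }
}
```

Returning how the size compares to the capacity, rather than a bare number, is what lets a caller drive a binary search directly from the sizer's result.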

Full Changelog: benbrandt/text-splitter@v0.4.5...v0.5.0

v0.4.5

What's Changed

  • Support tokenizers crate v0.15.0
  • Minimum Supported Rust Version is now 1.65.0

New Contributors

Full Changelog: benbrandt/text-splitter@v0.4.4...v0.4.5

Changelog

Sourced from text-splitter's changelog.

v0.6.0

Breaking Changes

  • Chunk behavior should now be the same as prior to v0.5.0. Once binary search finds the optimal chunk, we now check the next few sections as long as the chunk size doesn't change. This should result in the same behavior as before, but with the performance improvements of binary search.

v0.5.1

What's New

  • Python bindings and Rust crate now have the same version number.

Rust

  • Constructors for ChunkSize are now public, so you can more easily create your own ChunkSize structs for your own custom ChunkSizer implementation.

Python

  • New CustomTextSplitter that accepts a custom callback with the signature of (str) -> int. Allows for custom chunk sizing on the Python side.

v0.5.0

What's New

  • Significant performance improvements for generating chunks with the tokenizers or tiktoken-rs crates by applying binary search when attempting to find the next matching chunk size.

Breaking Changes

  • Minimum required version of tokenizers is now 0.15.0
  • Minimum required version of tiktoken-rs is now 0.5.6
  • Due to using binary search, there are some slight differences at the edges of chunks where the algorithm was a little greedier before. If two candidates would tokenize to the same number of tokens that fit within the capacity, it will now choose the shorter text. Due to the nature of tokenizers, this happens more often with whitespace at the end of a chunk, and rarely affects users who have set with_trim_chunks(true). It is a tradeoff, but keeping the exact same behavior would have made the binary search code much more complicated.
  • The chunk_size method on ChunkSizer now needs to accept a ChunkCapacity argument, and return a ChunkSize struct instead of a usize. This was to help support the new binary search method in chunking, and should only affect users who implemented custom ChunkSizers and weren't using one of the provided ones.
    • New signature: fn chunk_size(&self, chunk: &str, capacity: &impl ChunkCapacity) -> ChunkSize;

v0.4.5

What's New

  • Support tokenizers crate v0.15.0
  • Minimum Supported Rust Version is now 1.65.0

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the dependencies label (Pull requests that update a dependency file) Jan 15, 2024
Bumps [text-splitter](https://github.com/benbrandt/text-splitter) from 0.4.4 to 0.6.0.
- [Release notes](https://github.com/benbrandt/text-splitter/releases)
- [Changelog](https://github.com/benbrandt/text-splitter/blob/main/CHANGELOG.md)
- [Commits](benbrandt/text-splitter@v0.4.4...v0.6.0)

---
updated-dependencies:
- dependency-name: text-splitter
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot force-pushed the dependabot/cargo/text-splitter-0.6.0 branch from 8d77804 to 1ece01e Compare January 17, 2024 20:50
Contributor Author

dependabot bot commented on behalf of github Jan 22, 2024

Superseded by #96.

@dependabot dependabot bot closed this Jan 22, 2024
@dependabot dependabot bot deleted the dependabot/cargo/text-splitter-0.6.0 branch January 22, 2024 22:06
Labels
dependencies Pull requests that update a dependency file