[ENH]: Plumb prefix path all the way to the bf writer #4743
base: 05-30-_enh_return_database_id_in_get_collections_call_from_sysdb
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite.
Add Explicit Blockfile Prefix Path Plumbed from Tenant/Database to Segment Writers

This PR refactors the blockfile writer creation path throughout the Chroma codebase so that an explicit, consistent prefix path is constructed and passed to every blockfile writer. The prefix is composed of the tenant, database ID, collection ID, and segment ID, which gives a clearer S3/object store layout by partitioning files by tenant/db. The change updates constructors, signatures, and call chains for segment writers, compactor orchestrators, blockfile providers, and all associated tests, making the prefix required context.

Potential Impact
• Functionality: No change to the logical behavior of segment writers; improved ability to partition and locate blockfiles by tenant/db/collection/segment; potential for cleaner deletes and better S3/object store organization.
• Performance: No direct impact; minor overhead in constructing and passing prefixes; possible improvement in cleanup and batch operations.
• Security: More explicit separation of tenant and database storage, reducing accidental sharing or key collisions.
• Scalability: Enables better scaling for multi-tenant systems by partitioning files and isolating data blobs; prepares a base for sharding and cleaner data management.

Testing Needed
• All segment writers, orchestrators, and test suites should pass with the new required

Code Quality Assessment
• rust/blockstore/src/types/writer_options.rs: Enforces explicit, non-default construction, so every call site must opt in and acknowledge the use of prefix_path.
• rust/segment/src/blockfile_{record,metadata}.rs: Methods and struct signatures now require tenant/database_id; they only forward context, with no logic change.
• rust/types/src/collection.rs: Introduces DatabaseUuid as a dedicated typed ID, following best practice for identifier separation.
• tests/benchmarks: Fixture and synthetic data providers now build test segments with the new prefix and database ID.

Potential Issues
• Callers not updated to provide prefix_path will fail at compile time (intentional, expected enforcement).

This summary was automatically generated by @propel-code-bot
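To make the prefix composition described in the summary concrete, here is a minimal sketch of how the four identifiers could be joined into one object-store prefix. The function name `build_prefix_path`, the plain `&str` parameters, and the exact path layout are assumptions for illustration; the PR text does not state the actual format or helper used in the codebase.

```rust
/// Hypothetical helper (not taken from the PR): joins the tenant, database ID,
/// collection ID, and segment ID into a single object-store prefix.
/// The labels and their ordering are assumed for illustration only.
pub fn build_prefix_path(
    tenant: &str,
    database_id: &str,
    collection_id: &str,
    segment_id: &str,
) -> String {
    // Putting the tenant and database first means all blockfiles for one
    // tenant/db share a common key prefix, which is what makes listing or
    // deleting by tenant/db straightforward in an object store.
    format!(
        "tenant/{}/database/{}/collection/{}/segment/{}",
        tenant, database_id, collection_id, segment_id
    )
}
```

With a layout like this, every blockfile written for a segment would live under a key that starts with its tenant and database, so a prefix listing scoped to one tenant never touches another tenant's keys.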
 }

 impl BlockfileWriterOptions {
-    pub fn new() -> Self {
-        Self::default()
+    pub fn new(prefix_path: String) -> Self {
This is intentional: it forces every caller to think about and set a prefix path. There is no default constructor now.
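The design choice in this comment can be sketched as follows. This is a simplified stand-in, not the actual struct from rust/blockstore/src/types/writer_options.rs: the real type presumably carries more fields, and the `prefix_path` accessor shown here is a hypothetical addition for illustration. The point is that without a `Default` impl or a zero-argument constructor, `new(prefix_path)` is the only way to build the options.

```rust
/// Simplified stand-in for the options struct (assumed shape; the real
/// struct in the PR likely has additional fields).
#[derive(Debug, Clone)]
pub struct BlockfileWriterOptions {
    // Private field: callers cannot construct the struct literally,
    // so they must go through `new`.
    prefix_path: String,
}

impl BlockfileWriterOptions {
    /// The only constructor. There is deliberately no `Default` impl,
    /// so a call site that omits the prefix does not compile.
    pub fn new(prefix_path: String) -> Self {
        Self { prefix_path }
    }

    /// Hypothetical accessor, added here only so the sketch can be inspected.
    pub fn prefix_path(&self) -> &str {
        &self.prefix_path
    }
}
```

A caller written against the old API, such as `BlockfileWriterOptions::new()` or `BlockfileWriterOptions::default()`, would be rejected by the compiler under this shape, which matches the "fail at compile time" enforcement the summary describes.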
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Documentation Changes
None