Skip to content

[ENH]: Return database id in get collections call from sysdb #4686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sanketkedia
Copy link
Contributor

@sanketkedia sanketkedia commented May 30, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Organizing data of collections in s3 by s3 prefixes requires knowledge about database id in the write path. This PR populates and returns database id from sysdb in get_collections rpc
    • We skip serializing and returning this from FE to client
    • Even though the collection model is the same for both local and distributed, we don't set it in local currently. However, it can be done if needed
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

None

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Contributor Author

sanketkedia commented May 30, 2025

@sanketkedia sanketkedia requested a review from HammadB May 30, 2025 19:54
@sanketkedia sanketkedia marked this pull request as ready for review May 30, 2025 19:54
Copy link
Contributor

propel-code-bot bot commented May 30, 2025

Expose Database ID in Collection Metadata from sysdb (GetCollections RPC)

This PR extends the sysdb (system database) service and associated codepaths to include the database ID (UUID) in each collection's metadata returned by the get_collections RPC. Changes are made across the protocol buffers, Rust collection types, sysdb implementations, and Go sysdb gRPC translation layers so that the database UUID is available for backend services (not forwarded to the end-user frontend). This will facilitate use cases where operations on collections need to be aware of the associated database UUID (e.g., S3 prefixing for organization in storage).

The work introduces the new database_id field in the proto, ensures correct population and round-trip in the Rust model, updates the sysdb/sqlite/go implementations to handle the new field, and adjusts related tests and conversion logic for correctness. In local/chroma single-node mode, the field is set to a new UUID but can be updated later as needed.

Key Changes:
• IDL: Added string database_id = 16; to Collection message in chroma.proto; documentation and comments updated.
• Rust: Added and populated DatabaseUuid field in Collection struct, updated conversion logic (TryFrom/TryInto) and test coverage for this field.
• Sysdb: Updated get_collections and get_collection_with_segments codepaths to handle the new database_id field (in SQLite and Rust sysdb logic), maintaining backwards compatibility.
• Go: sysdb gRPC proto/model conversion code updated to pass through the database id.
• Worker: Test stubs and proto collection construction logic updated to populate database_id.
• Removed dead code from helper.rs and made minor import/struct cleanups.

Affected Areas:
• Protocol buffers (Collection schema)
• Rust core types (Collection, DatabaseUuid)
• Rust sysdb (sqlite, sysdb.rs)
• Go sysdb gRPC translation
• Test coverage for conversion and presence of database_id
• Worker RPC handler test logic

This summary was automatically generated by @propel-code-bot

@@ -163,73 +162,6 @@ impl ChromaGrpcClients {
}
}

#[allow(dead_code)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was dead code

let collection_metadata: Option<Metadata> = match proto_collection.metadata {
Some(proto_metadata) => match proto_metadata.try_into() {
Ok(metadata) => Some(metadata),
Err(e) => return Err(CollectionConversionError::MetadataValueConversionError(e)),
},
None => None,
};
let database_id = DatabaseUuid::from_str(&proto_collection.database_id)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle upgrade scenario such as

  1. sysdb is on an older version and does not populate this field
  2. Query/compactor is on a newer version and tries to parse uuid and crashes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant