-
Notifications
You must be signed in to change notification settings - Fork 2k
[ENH]: Spanner collection and segments Schemas #6123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[ENH]: Spanner collection and segments Schemas #6123
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
The migrations also establish region-scoped compaction cursor and metadata tables, register the new DDL files in the migration manifest, and enforce global uniqueness for segment identifiers while keeping select collection attributes denormalized rather than cascading. Affected Areas• rust/spanner-migrations This summary was automatically generated by @propel-code-bot |
rust/spanner-migrations/migrations/0004-create_collections.spanner.sql
Outdated
Show resolved
Hide resolved
rust/spanner-migrations/migrations/0006-create_collection_compaction_cursors.spanner.sql
Outdated
Show resolved
Hide resolved
rust/spanner-migrations/migrations/0005-create_collections_unique_index.spanner.sql
Outdated
Show resolved
Hide resolved
rust/spanner-migrations/migrations/0005-create_collections_unique_index.spanner.sql
Outdated
Show resolved
Hide resolved
rust/spanner-migrations/migrations/0005-create_collection_compaction_cursors.spanner.sql
Show resolved
Hide resolved
tanujnay112
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index name seems fishy
| ) PRIMARY KEY (collection_id, region), | ||
| INTERLEAVE IN PARENT collections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Logic] The INTERLEAVE IN PARENT clause is missing ON DELETE CASCADE. This prevents deleting a collections row if it has associated compaction cursors. If the intention is to allow hard deletes of collections (e.g., via garbage collection), you must manually delete all children first without this clause.
This same pattern applies to migrations/0007-create_segments.spanner.sql (segments in cursors) and migrations/0009-create_collection_metadata.spanner.sql (metadata in collections).
Consider adding ON DELETE CASCADE to all interleaved tables to simplify the deletion lifecycle.
Context for Agents
The `INTERLEAVE IN PARENT` clause is missing `ON DELETE CASCADE`. This prevents deleting a `collections` row if it has associated compaction cursors. If the intention is to allow hard deletes of collections (e.g., via garbage collection), you must manually delete all children first without this clause.
This same pattern applies to `migrations/0007-create_segments.spanner.sql` (segments in cursors) and `migrations/0009-create_collection_metadata.spanner.sql` (metadata in collections).
Consider adding `ON DELETE CASCADE` to all interleaved tables to simplify the deletion lifecycle.
File: rust/spanner-migrations/migrations/0006-create_collection_compaction_cursors.spanner.sql
Line: 20| updated_at TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true), | ||
| file_paths JSON | ||
| ) PRIMARY KEY (collection_id, region, id), | ||
| INTERLEAVE IN PARENT collection_compaction_cursors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Performance] Interleaving collection_segments in collection_compaction_cursors creates a significant risk of lock contention.
Since collection_compaction_cursors tracks mutable state (like last_compacted_offset), its rows will likely be updated frequently by compaction jobs. Inserting rows into collection_segments requires a lock on the parent row to verify existence. This means ingestion (adding segments) and compaction (updating cursor) will contend for locks on the same cursor row, potentially serializing operations per region.
Recommendation: Interleave collection_segments directly in collections.
With PRIMARY KEY (collection_id, region, id), the segments will still be physically colocated by region (due to lexicographical key ordering), achieving the same query efficiency without coupling ingestion to compaction state stability.
Context for Agents
Interleaving `collection_segments` in `collection_compaction_cursors` creates a significant risk of lock contention.
Since `collection_compaction_cursors` tracks mutable state (like `last_compacted_offset`), its rows will likely be updated frequently by compaction jobs. Inserting rows into `collection_segments` requires a lock on the parent row to verify existence. This means ingestion (adding segments) and compaction (updating cursor) will contend for locks on the same cursor row, potentially serializing operations per region.
**Recommendation:** Interleave `collection_segments` directly in `collections`.
With `PRIMARY KEY (collection_id, region, id)`, the segments will still be physically colocated by region (due to lexicographical key ordering), achieving the same query efficiency without coupling ingestion to compaction state stability.
File: rust/spanner-migrations/migrations/0007-create_segments.spanner.sql
Line: 168943cfb to
118bdf5
Compare
| is_deleted BOOL DEFAULT (false), | ||
| created_at TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true), | ||
| updated_at TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true) | ||
| ) PRIMARY KEY (collection_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Reliability] Consider defining Foreign Key constraints for database_id and tenant_id to enforce referential integrity at the database level. Without them, orphaned records could exist if parent entities are deleted (unless strict soft-delete logic is guaranteed at the application layer).
Context for Agents
Consider defining Foreign Key constraints for `database_id` and `tenant_id` to enforce referential integrity at the database level. Without them, orphaned records could exist if parent entities are deleted (unless strict soft-delete logic is guaranteed at the application layer).
File: rust/spanner-migrations/migrations/0004-create_collections.spanner.sql
Line: 13| name STRING(MAX) NOT NULL, | ||
| dimension INT64, | ||
| database_id STRING(MAX) NOT NULL, | ||
| database_name STRING(MAX) NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Logic] Storing database_name here creates a denormalized dependency. If a database is renamed in the source-of-truth databases table, this column will become stale unless explicitly synchronized.
Unless this is a specific read-optimization trade-off, consider removing this column and joining with the databases table on database_id when the name is needed.
Context for Agents
Storing `database_name` here creates a denormalized dependency. If a database is renamed in the source-of-truth `databases` table, this column will become stale unless explicitly synchronized.
Unless this is a specific read-optimization trade-off, consider removing this column and joining with the `databases` table on `database_id` when the name is needed.
File: rust/spanner-migrations/migrations/0004-create_collections.spanner.sql
Line: 8118bdf5 to
432297c
Compare
| let mut name_check_stmt = Statement::new( | ||
| "SELECT id FROM databases WHERE name = @name AND tenant_id = @tenant_id AND is_deleted = FALSE", | ||
| ); | ||
| name_check_stmt.add_param("name", &db_name); | ||
| name_check_stmt.add_param("tenant_id", &tenant_id); | ||
|
|
||
| let mut name_iter = tx.query(name_check_stmt).await?; | ||
| if name_iter.next().await?.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Logic] Similar to create_tenant, this logic fails if a database with the same name was previously soft-deleted.
The uniqueness check filters out is_deleted = TRUE rows, but the INSERT will violate the unique index on (tenant_id, name) (and the PK on id if reused) because the index includes soft-deleted rows.
Handle the conflict by checking for existing soft-deleted rows and deciding whether to revive them or return a specific error.
Context for Agents
Similar to `create_tenant`, this logic fails if a database with the same name was previously soft-deleted.
The uniqueness check filters out `is_deleted = TRUE` rows, but the `INSERT` will violate the unique index on `(tenant_id, name)` (and the PK on `id` if reused) because the index includes soft-deleted rows.
Handle the conflict by checking for existing soft-deleted rows and deciding whether to revive them or return a specific error.
File: rust/rust-sysdb/src/spanner.rs
Line: 207|
|
||
| let mut iter = tx.query(check_stmt).await?; | ||
|
|
||
| // If tenant doesn't exist, insert it using DML | ||
| // If tenant doesn't exist, insert it otherwise ignore for idempotency | ||
| // Set last_compaction_time to Unix epoch (0) by default | ||
| if iter.next().await?.is_none() { | ||
| let mut insert_stmt = Statement::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Logic] This creates a potential bug where creating a tenant that was previously soft-deleted will fail with a database constraint violation (PK violation) instead of handling it gracefully.
The check is_deleted = FALSE ignores soft-deleted rows, but the subsequent INSERT will collide with the existing soft-deleted row's primary key id.
Consider checking for the row existence regardless of is_deleted status, and then either reviving the tenant (UPDATE is_deleted = false) or returning a specific error.
Context for Agents
This creates a potential bug where creating a tenant that was previously soft-deleted will fail with a database constraint violation (PK violation) instead of handling it gracefully.
The check `is_deleted = FALSE` ignores soft-deleted rows, but the subsequent `INSERT` will collide with the existing soft-deleted row's primary key `id`.
Consider checking for the row existence regardless of `is_deleted` status, and then either reviving the tenant (UPDATE `is_deleted` = false) or returning a specific error.
File: rust/rust-sysdb/src/spanner.rs
Line: 92| .map_err(SysDbError::FailedToReadColumn)?; | ||
|
|
||
| // resource_name can be NULL, so we handle the error as None | ||
| let resource_name: Option<String> = row.column_by_name("resource_name").ok(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Logic] Using .ok() here suppresses all errors, including schema mismatches (e.g., if the column is missing or has the wrong type), treating them as None.
The Spanner client's column_by_name handles NULL values correctly when the target type is Option<T>. You should propagate errors to detect schema drift or implementation bugs.
| let resource_name: Option<String> = row.column_by_name("resource_name").ok(); | |
| let resource_name: Option<String> = row | |
| .column_by_name("resource_name") | |
| .map_err(SysDbError::FailedToReadColumn)?; |
Context for Agents
Using `.ok()` here suppresses all errors, including schema mismatches (e.g., if the column is missing or has the wrong type), treating them as `None`.
The Spanner client's `column_by_name` handles `NULL` values correctly when the target type is `Option<T>`. You should propagate errors to detect schema drift or implementation bugs.
```suggestion
let resource_name: Option<String> = row
.column_by_name("resource_name")
.map_err(SysDbError::FailedToReadColumn)?;
```
File: rust/types/src/tenant.rs
Line: 34432297c to
50a6033
Compare

Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
None
Observability plan
None
Documentation Changes
None