-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[CHORE]: Remove nonce-related code outside s3heap #5866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: refactor_compactor
Are you sure you want to change the base?
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
|
|
Purge Nonce Plumbing from Core Services, Limiting It to s3heap Internals This PR completely removes the Key Changes• Dropped Affected Areas• Database schema & migrations This summary was automatically generated by @propel-code-bot |
c8e5cb7 to
3417b01
Compare
| suite.NotNil(taskResp) | ||
| originalTaskID := taskResp.Id | ||
| suite.T().Logf("Created fully initialized task: %s", originalTaskID) | ||
|
|
||
| // STEP 2: Directly UPDATE database to make task partial (simulate Phase 3 failure) | ||
| // Set lowest_live_nonce = NULL to simulate the task being stuck | ||
| _, err = db.Exec(`UPDATE public.tasks SET lowest_live_nonce = NULL WHERE task_id = $1`, originalTaskID) | ||
| suite.NoError(err, "Should be able to corrupt task in database") | ||
| suite.T().Logf("Made task partial by setting lowest_live_nonce = NULL") | ||
| // TODO: Uncomment after proto regeneration | ||
| // originalTaskID := taskResp.Id | ||
| suite.T().Skip("Test requires proto regeneration for AttachFunctionResponse.Id field") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Test skip without proper validation: The test is skipped unconditionally with suite.T().Skip() immediately after creating the task response, making the entire test setup meaningless. Either remove the test or implement proper proto field validation before skipping.
// Bad - unconditional skip after setup
suite.NotNil(taskResp)
// TODO: Uncomment after proto regeneration
// originalTaskID := taskResp.Id
suite.T().Skip("Test requires proto regeneration for AttachFunctionResponse.Id field")
// Better - skip early or add validation
if taskResp.Id == "" {
suite.T().Skip("Test requires proto regeneration")
return
}
originalTaskID := taskResp.Id
// ... continue testContext for Agents
[**BestPractice**]
**Test skip without proper validation**: The test is skipped unconditionally with `suite.T().Skip()` immediately after creating the task response, making the entire test setup meaningless. Either remove the test or implement proper proto field validation before skipping.
```go
// Bad - unconditional skip after setup
suite.NotNil(taskResp)
// TODO: Uncomment after proto regeneration
// originalTaskID := taskResp.Id
suite.T().Skip("Test requires proto regeneration for AttachFunctionResponse.Id field")
// Better - skip early or add validation
if taskResp.Id == "" {
suite.T().Skip("Test requires proto regeneration")
return
}
originalTaskID := taskResp.Id
// ... continue test
```
File: go/pkg/sysdb/coordinator/heap_client_integration_test.go
Line: 268| /// Note: This service is currently not fully functional due to nonce removal | ||
| pub async fn entrypoint() { | ||
| eprintln!("Heap tender service is not currently implemented"); | ||
| eprintln!("The heap scheduling functionality was removed"); | ||
| std::process::exit(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Dead code after unconditional process exit: The entrypoint() function always exits with std::process::exit(1), making it impossible to return normally. This prevents proper cleanup and makes the function signature (async fn) misleading.
// Current - unreachable return
pub async fn entrypoint() {
eprintln!("Heap tender service is not currently implemented");
eprintln!("The heap scheduling functionality was removed");
std::process::exit(1);
} // implicit return is unreachable
// Better - explicit never return or remove async
pub fn entrypoint() -> ! {
eprintln!("Heap tender service is not currently implemented");
eprintln!("The heap scheduling functionality was removed");
std::process::exit(1)
}Context for Agents
[**BestPractice**]
**Dead code after unconditional process exit**: The `entrypoint()` function always exits with `std::process::exit(1)`, making it impossible to return normally. This prevents proper cleanup and makes the function signature (`async fn`) misleading.
```rust
// Current - unreachable return
pub async fn entrypoint() {
eprintln!("Heap tender service is not currently implemented");
eprintln!("The heap scheduling functionality was removed");
std::process::exit(1);
} // implicit return is unreachable
// Better - explicit never return or remove async
pub fn entrypoint() -> ! {
eprintln!("Heap tender service is not currently implemented");
eprintln!("The heap scheduling functionality was removed");
std::process::exit(1)
}
```
File: rust/s3heap-service/src/lib.rs
Line: 516| pub async fn tend_to_heap(&self) -> Result<(), Error> { | ||
| let (witness, cursor, tended) = self.read_and_coalesce_dirty_log().await?; | ||
| if !tended.is_empty() { | ||
| let collection_ids = tended.iter().map(|t| t.0).collect::<Vec<_>>(); | ||
| let scheduled = self | ||
| .sysdb | ||
| .clone() | ||
| .peek_schedule_by_collection_id(&collection_ids) | ||
| .await?; | ||
| let triggerables: Vec<Option<Schedule>> = scheduled | ||
| .into_iter() | ||
| .map(|s: ScheduleEntry| -> Result<_, Error> { | ||
| let triggerable = Triggerable { | ||
| partitioning: s3heap::UnitOfPartitioningUuid::new(s.collection_id.0), | ||
| scheduling: s3heap::UnitOfSchedulingUuid::new(s.attached_function_id), | ||
| }; | ||
| if let Some(next_scheduled) = s.when_to_run { | ||
| let schedule = Schedule { | ||
| triggerable, | ||
| next_scheduled, | ||
| nonce: s.attached_function_run_nonce.0, | ||
| }; | ||
| Ok(Some(schedule)) | ||
| } else { | ||
| Ok(None) | ||
| } | ||
| }) | ||
| .collect::<Result<Vec<_>, _>>()?; | ||
| let triggerables: Vec<Schedule> = triggerables.into_iter().flatten().collect(); | ||
| if !triggerables.is_empty() { | ||
| self.writer.push(&triggerables).await?; | ||
| } | ||
| } | ||
| if let Some(witness) = witness.as_ref() { | ||
| self.cursor | ||
| .save(&HEAP_TENDER_CURSOR_NAME, &cursor, witness) | ||
| .await?; | ||
| } else { | ||
| self.cursor | ||
| .init(&HEAP_TENDER_CURSOR_NAME, cursor.clone()) | ||
| .await?; | ||
| } | ||
| Ok(()) | ||
| Err(Error::Internal("Not implemented".to_string())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Unimplemented function returns error instead of using proper Rust patterns: The tend_to_heap() function immediately returns an error string instead of using unimplemented!() or todo!() macros, which provide better stack traces and clearer intent.
// Current - generic error message
pub async fn tend_to_heap(&self) -> Result<(), Error> {
Err(Error::Internal("Not implemented".to_string()))
}
// Better - explicit unimplemented with context
pub async fn tend_to_heap(&self) -> Result<(), Error> {
unimplemented!("tend_to_heap removed after nonce removal - see TODO(tanujnay112)")
}Context for Agents
[**BestPractice**]
**Unimplemented function returns error instead of using proper Rust patterns**: The `tend_to_heap()` function immediately returns an error string instead of using `unimplemented!()` or `todo!()` macros, which provide better stack traces and clearer intent.
```rust
// Current - generic error message
pub async fn tend_to_heap(&self) -> Result<(), Error> {
Err(Error::Internal("Not implemented".to_string()))
}
// Better - explicit unimplemented with context
pub async fn tend_to_heap(&self) -> Result<(), Error> {
unimplemented!("tend_to_heap removed after nonce removal - see TODO(tanujnay112)")
}
```
File: rust/s3heap-service/src/lib.rs
Line: 249| // Finish marks work as complete | ||
| func (s *attachedFunctionDb) Finish(id uuid.UUID) error { | ||
| now := time.Now() | ||
| result := s.db.Model(&dbmodel.AttachedFunction{}). | ||
| Where("id = ?", id). | ||
| Where("is_deleted = false"). | ||
| Where("lowest_live_nonce IS NULL"). // Only update if still NULL (2PC marker) | ||
| UpdateColumns(map[string]interface{}{ | ||
| "lowest_live_nonce": lowestLiveNonce, | ||
| "updated_at": now, | ||
| "updated_at": now, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Finish method always returns success without validation: The Finish() method updates the database but doesn't verify that the attached function exists or is in a valid state before updating. If RowsAffected == 0, it returns ErrAttachedFunctionNotFound, but there's no validation that the attached function is actually ready to be finished (e.g., not already deleted, not in an invalid state).
// Current - minimal validation
result := s.db.Model(&dbmodel.AttachedFunction{}).
Where("id = ?", id).
Where("is_deleted = false").
UpdateColumns(map[string]interface{}{
"updated_at": now,
})
// Consider adding state validation
// 1. Verify the function exists and is in expected state
// 2. Check if it's already finished
// 3. Validate any preconditions before marking completeContext for Agents
[**BestPractice**]
**Finish method always returns success without validation**: The `Finish()` method updates the database but doesn't verify that the attached function exists or is in a valid state before updating. If `RowsAffected == 0`, it returns `ErrAttachedFunctionNotFound`, but there's no validation that the attached function is actually ready to be finished (e.g., not already deleted, not in an invalid state).
```go
// Current - minimal validation
result := s.db.Model(&dbmodel.AttachedFunction{}).
Where("id = ?", id).
Where("is_deleted = false").
UpdateColumns(map[string]interface{}{
"updated_at": now,
})
// Consider adding state validation
// 1. Verify the function exists and is in expected state
// 2. Check if it's already finished
// 3. Validate any preconditions before marking complete
```
File: go/pkg/sysdb/metastore/db/dao/task.go
Line: 1743417b01 to
74a9e62
Compare
|
Found 1 test failure on Blacksmith runners: Failure
|
| schedule.Nonce == testMinimalUUIDv7.String() && // Should use minimal UUID | ||
| schedule.NextScheduled != nil | ||
| })).Return(nil).Once() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[CriticalError]
While you've correctly removed the mock for UpdateLowestLiveNonce, a related mock for heapClient.Push in this same test (TestAttachFunction_SuccessfulCreation_WithHeapService) appears to be incorrect after the refactoring. It expects a specific nonce value (testMinimalUUIDv7.String()), but the refactored AttachFunction no longer sets the Nonce field on the Schedule protobuf, causing it to default to "". This will likely cause the test to fail.
A similar issue exists in TestAttachFunction_RecoveryFlow_HeapFailureThenSuccess.
The mocks should be updated to expect an empty string for the nonce to align with the removal of the nonce logic.
Context for Agents
[**CriticalError**]
While you've correctly removed the mock for `UpdateLowestLiveNonce`, a related mock for `heapClient.Push` in this same test (`TestAttachFunction_SuccessfulCreation_WithHeapService`) appears to be incorrect after the refactoring. It expects a specific nonce value (`testMinimalUUIDv7.String()`), but the refactored `AttachFunction` no longer sets the `Nonce` field on the `Schedule` protobuf, causing it to default to `""`. This will likely cause the test to fail.
A similar issue exists in `TestAttachFunction_RecoveryFlow_HeapFailureThenSuccess`.
The mocks should be updated to expect an empty string for the nonce to align with the removal of the nonce logic.
File: go/pkg/sysdb/coordinator/create_task_test.go
Line: 264| // let cutoff_time = chrono::DateTime::<chrono::Utc>::from( | ||
| // attached_function_soft_delete_absolute_cutoff_time, | ||
| // ); | ||
|
|
||
| // self.prune_heap_across_shards(cutoff_time).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Since the prune_heap_across_shards function has been removed, this commented-out code block is now dead code and can be removed for better clarity.
Context for Agents
[**BestPractice**]
Since the `prune_heap_across_shards` function has been removed, this commented-out code block is now dead code and can be removed for better clarity.
File: rust/garbage_collector/src/garbage_collector_component.rs
Line: 492| // TODO: Uncomment after proto regeneration | ||
| suite.T().Skip("Test requires proto regeneration for AttachFunctionResponse.Id field") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[TestCoverage]
Critical: Test will always be skipped due to early return
Same issue as the previous test - the skip is placed after task creation, guaranteeing the actual cleanup/recreation logic never runs.
// Move skip to beginning or remove it
if needsProtoRegeneration {
suite.T().Skip("Test requires proto regeneration")
return
}Context for Agents
[**TestCoverage**]
**Critical: Test will always be skipped due to early return**
Same issue as the previous test - the skip is placed after task creation, guaranteeing the actual cleanup/recreation logic never runs.
```go
// Move skip to beginning or remove it
if needsProtoRegeneration {
suite.T().Skip("Test requires proto regeneration")
return
}
```
File: go/pkg/sysdb/coordinator/heap_client_integration_test.go
Line: 362
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?