
Conversation


@tanujnay112 tanujnay112 commented Nov 7, 2025

Description of changes

Summarize the changes made by this PR.

This change removes all function-related code from the compaction path, including the scheduler and the compaction orchestrator, to make way for a refactor of the preexisting compaction orchestrator.

This refactor entails breaking the CompactOrchestrator into three chained orchestrators:

  1. The DataFetchOrchestrator, which runs GetCollectionAndSegments -> FetchLog/SourceRecordSegments -> Partition -> Materialized Logs. Its main task is to source data.

  2. The ApplyDataOrchestrator, which takes the materialized log records from the previous orchestrator and applies them to segments via the Apply, Commit, and Flush operators.

  3. The RegisterOrchestrator, which takes the flushed segment paths from the previous step and invokes the Register operator.

Any code common to these three orchestrators remains in compact.rs.
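
For illustration, here is a minimal sketch of the chained flow. The orchestrator names come from the description above, but the `run` signatures and the MaterializedLogs/FlushedSegmentPaths types are hypothetical stand-ins, not the actual worker APIs:

```rust
// Hypothetical sketch of the three-stage chain described in this PR.
// Types and signatures are illustrative only.
struct MaterializedLogs;
struct FlushedSegmentPaths;

struct DataFetchOrchestrator;
struct ApplyDataOrchestrator;
struct RegisterOrchestrator;

impl DataFetchOrchestrator {
    // GetCollectionAndSegments -> FetchLog/SourceRecordSegments -> Partition -> materialized logs
    async fn run(self) -> MaterializedLogs {
        MaterializedLogs
    }
}

impl ApplyDataOrchestrator {
    // Apply, Commit, and Flush operators against each segment writer
    async fn run(self, _logs: MaterializedLogs) -> FlushedSegmentPaths {
        FlushedSegmentPaths
    }
}

impl RegisterOrchestrator {
    // Register operator: flush compaction results and advance the log offset
    async fn run(self, _paths: FlushedSegmentPaths) {}
}

async fn compact() {
    let logs = DataFetchOrchestrator.run().await;
    let paths = ApplyDataOrchestrator.run(logs).await;
    RegisterOrchestrator.run(paths).await;
}
```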

  • Improvements & Bug fixes
    • ...
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for Python, yarn test for JS, and cargo test for Rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the _docs_ section?


github-actions bot commented Nov 7, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (readability, modularity, intuitiveness)?


tanujnay112 commented Nov 7, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@tanujnay112 tanujnay112 changed the title from "[ENH]: Refactor compactor" to "[ENH]: Refactor compactor into three chained orchestrators" on Nov 7, 2025
@tanujnay112 tanujnay112 marked this pull request as ready for review November 7, 2025 00:50

propel-code-bot bot commented Nov 7, 2025

Split Monolithic Compactor into Three Dedicated Orchestrators

This PR performs a deep internal refactor of the compaction subsystem. The former single-class CompactOrchestrator and its ad-hoc scheduler have been decomposed into a linear chain of three specialised orchestrators: DataFetchOrchestrator, ApplyDataOrchestrator, and RegisterOrchestrator. Each stage now owns a narrowly scoped responsibility: fetching input data, applying/committing it to segment writers, and registering the resulting segments. The compactor task lifecycle, supporting operators, and wiring in compactor::* and execution/orchestration::* have been updated accordingly.

The change is entirely behind internal service boundaries, but any code that interacted directly with the old orchestrator or scheduler now needs to target the stage-specific interfaces. The refactor increases maintainability, unlocks easier extension of compaction strategies, and prepares the worker service for upcoming background-work architecture changes.

Key Changes

• Introduced DataFetchOrchestrator, ApplyDataOrchestrator, and RegisterOrchestrator
• Removed legacy CompactOrchestrator, scheduler, and task plumbing
• Refactored operator modules to align with new stage boundaries
• Updated compactor task wiring in compactor/{compaction_manager,scheduler,tasks}.rs
• Adjusted Cargo deps and mod exports to surface new orchestrators
• Updated unit tests across log, sysdb, and worker crates

Affected Areas

• execution/orchestration/*
• compactor/* (manager, scheduler, tasks, mod)
• operator implementations (get_collection_and_segments, apply_log_to_segment_writer, register, prepare_task, finish_task)
• Cargo manifest & module exports
• unit-test suites touching compaction

This summary was automatically generated by @propel-code-bot

Comment on lines +188 to +196
num_materialized_logs: 0,
segment_spans: HashMap::new(),
materialized_log_data,
[**BestPractice**]

Resource leak risk: HNSW index cleanup only happens in success case:

```rust
async fn try_purge_hnsw(path: &Path, hnsw_index_uuid: Option<IndexUuid>) {
    if let Some(hnsw_index_uuid) = hnsw_index_uuid {
        let _ = HnswIndexProvider::purge_one_id(path, hnsw_index_uuid).await;
    }
}
```

This cleanup method ignores all errors (`let _ =`). If purging fails due to file system errors or permissions, temporary HNSW indexes will accumulate on disk. Add error logging and potentially retry logic for cleanup failures.

File: rust/worker/src/execution/orchestration/apply_data_orchestrator.rs
Line: 190
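
For reference, a minimal sketch of logged, retried cleanup (reusing the `purge_one_id` call quoted above; the retry count is illustrative, and `IndexUuid` is assumed to be `Copy`, as the snippet above already implies):

```rust
// Illustrative only: retry the purge a few times and surface failures via
// tracing instead of discarding them with `let _ =`.
async fn try_purge_hnsw(path: &Path, hnsw_index_uuid: Option<IndexUuid>) {
    let Some(hnsw_index_uuid) = hnsw_index_uuid else {
        return;
    };
    for attempt in 1..=3 {
        match HnswIndexProvider::purge_one_id(path, hnsw_index_uuid).await {
            Ok(_) => return,
            Err(e) => tracing::warn!(
                "Failed to purge HNSW index {} (attempt {}/3): {}",
                hnsw_index_uuid,
                attempt,
                e
            ),
        }
    }
}
```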

self.context
.orchestrator_context
.task_cancellation_token
.clone(),
[**BestPractice**]

Resource leak: The `try_purge_hnsw` function is called in cleanup methods but errors are silently ignored:

```rust
let _ = HnswIndexProvider::purge_one_id(path, hnsw_index_uuid).await;
```

If the HNSW index cleanup fails, it could leave dangling resources on disk. Consider logging errors:

```rust
if let Err(e) = HnswIndexProvider::purge_one_id(path, hnsw_index_uuid).await {
    tracing::warn!("Failed to purge HNSW index {}: {}", hnsw_index_uuid, e);
}
```

File: rust/worker/src/execution/orchestration/data_fetch_orchestrator.rs
Line: 207

self.terminate_with_result(Err(e), ctx).await;
return;
#[allow(clippy::too_many_arguments)]
pub async fn compact(
nit: consider making this an Orchestrator with different stages for better code organization

or maybe consider CollectionCompactionContext::data_fetch,apply_data,...


Comment on lines +203 to +275
pub fn get_segment_writer_by_id(
&self,
[**BestPractice**]

Schema field mutation without proper validation: `collection_info.schema = apply_data_response.schema;` directly assigns the schema without checking if the assignment conflicts with existing collection constraints. If `apply_data_response.schema` is `None` when the collection requires a schema, this could create an invalid state.

```rust
// Add validation:
let updated_schema = apply_data_response.schema;
if collection_info.collection.dimension.is_some() && updated_schema.is_none() {
    return Err(CompactionError::InvariantViolation(
        "Collection with dimension must have a schema"
    ));
}
collection_info.schema = updated_schema;
```

File: rust/worker/src/execution/orchestration/compact.rs
Line: 204

Comment on lines +567 to +586
None => return,
}
[**BestPractice**]

Error handling gap: `self.ok_or_terminate(segment_writer, ctx).await` pattern doesn't handle the case where `get_segment_writer_by_id()` returns `Ok(writer)` but the writer is in an invalid state (e.g., already consumed/moved). This could lead to runtime panics when trying to use the writer.

```rust
// Add state validation:
let segment_writer = match self.context.get_segment_writer_by_id(message.segment_id) {
    Ok(writer) => {
        // Validate writer is still usable
        if !writer.is_valid() {
            return self.terminate_with_result(Err(...), ctx).await;
        }
        writer
    },
    Err(e) => {
        return self.terminate_with_result(Err(e.into()), ctx).await;
    }
};
```

File: rust/worker/src/execution/orchestration/apply_data_orchestrator.rs
Line: 568

impl ExecuteAttachedFunctionOperator {
/// Create a new ExecuteAttachedFunctionOperator from an AttachedFunction.
/// The executor is selected based on the function_id in the attached function.
#[allow(dead_code)]
[**BestPractice**]

The PR description mentions removing all function-related code. Since this function is now unused, it seems it should be removed completely instead of being marked with `#[allow(dead_code)]`. This would make the codebase cleaner and more aligned with the PR's goal.

File: rust/worker/src/execution/operators/execute_task.rs
Line: 87


ctx,
)
.await;
return;
[**BestPractice**]

Potential integer overflow: The record count calculation could overflow with very large datasets:

```rust
collection_info.collection.total_records_post_compaction = output.len() as u64;
```

If `output.len()` exceeds `u64::MAX`, this will wrap around. Use checked arithmetic:

```rust
collection_info.collection.total_records_post_compaction = 
    u64::try_from(output.len())
        .map_err(|_| DataFetchOrchestratorError::InvariantViolation(
            "Record count exceeds u64 maximum"
        ))?;
```

File: rust/worker/src/execution/orchestration/data_fetch_orchestrator.rs
Line: 717

Ok(ref compaction_response) => match compaction_response {
CompactionResponse::Success { job_id } => {
if job_id != &resp.job_id.0 {
CompactionResponse::Success { job_id, .. } => {
[**BestPractice**]

The `Success` variant of `CompactionResponse` only has the `job_id` field. The `..` is unnecessary and suggests there are other fields being ignored, which is not the case. Removing it makes the code clearer and more accurate.

File: rust/worker/src/compactor/compaction_manager.rs
Line: 339

Comment on lines +145 to +148
};

let result = log
.update_collection_log_offset(&input.tenant, input.collection_id, input.log_position)
[**BestPractice**]

Potential race condition in log offset updates: The `update_collection_log_offset` method in `InMemoryLog` can fail (returns `Result<(), Box<dyn ChromaError>>`) but there's no atomic guarantee that the SysDB flush and log offset update happen together. If SysDB flush succeeds but log offset update fails, the system could be in an inconsistent state where SysDB thinks compaction completed but the log service doesn't.

```rust
// In register.rs, this could fail after SysDB flush succeeds:
let result = log
    .update_collection_log_offset(&input.tenant, input.collection_id, input.log_position)
    .await;
```

Consider using a distributed transaction or compensation logic to handle partial failures.

File: rust/worker/src/execution/operators/register.rs
Line: 148

@HammadB HammadB Nov 12, 2025


This partial failure is ok by our protocol! The sysdb is the source of truth. The log can be behind, in which case on a subsequent compaction, the log will be repaired and advanced to match the sysdb. The only implication of the log being behind the sysdb is we will unnecessarily launch a compaction since the rollup can't occur.
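
A tiny self-contained illustration of that repair rule (the function and parameter names are hypothetical, not the actual worker code):

```rust
/// Illustration of the protocol described above: the sysdb's recorded log
/// position is authoritative, and a lagging log-service offset is simply
/// advanced to match it on the next compaction.
fn reconciled_log_offset(sysdb_log_position: i64, log_service_offset: i64) -> i64 {
    // If the flush succeeded but the offset update failed, the log service is
    // behind; repair by advancing it to the sysdb position.
    log_service_offset.max(sysdb_log_position)
}
```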

Comment on lines +451 to +454
Some(outputs) => outputs,
None => {
self.terminate_with_result(
Err(ApplyLogsOrchestratorError::InvariantViolation(
[**BestPractice**]

Missing error handling for empty materialized outputs: The code checks `materialized_output.result.is_empty()` and terminates with an invariant violation, but this could be a legitimate case when there are no logs to process. This will cause the orchestrator to fail unnecessarily.

```rust
// This should handle empty results gracefully, not as an error
if materialized_output.result.is_empty() {
    self.terminate_with_result(
        Err(ApplyLogsOrchestratorError::InvariantViolation(
            "Attempting to apply an empty materialized output",
        )),
        ctx,
    )
    .await;
    return Vec::new();
}
```

Consider handling empty results as a valid case and return early with success.

File: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs
Line: 454

let collection_info = match self.context.get_collection_info_mut() {
Ok(info) => info,
Err(err) => {
tracing::info!("We're failing right here");
[**BestPractice**]

This `tracing::info!` call appears to be a leftover debug statement and should probably be removed.

File: rust/worker/src/execution/orchestration/log_fetch_orchestrator.rs
Line: 650

while let Some(entry) = entries.next_entry().await.expect("Failed to read next dir") {
let path = entry.path();
let metadata = entry.metadata().await.expect("Failed to read metadata");
println!("Path: {}", path.display());
[**BestPractice**]

This appears to be a leftover debug print statement. It should probably be removed.

File: rust/worker/src/compactor/compaction_manager.rs
Line: 1062

Comment on lines +390 to +401
&self.context.blockfile_provider,
))
.await
{
Ok(reader) => Ok(Some(reader)),
Err(err) => match *err {
RecordSegmentReaderCreationError::UninitializedSegment => Ok(None),
_ => Err(*err),
},
},
ctx,
)
[**CriticalError**]

Missing error handling for record reader creation failure:

```rust
let record_reader = match self
    .ok_or_terminate(
        match Box::pin(RecordSegmentReader::from_segment(
            &output.record_segment,
            &self.context.blockfile_provider,
        ))
        .await
        {
            Ok(reader) => Ok(Some(reader)),
            Err(err) => match *err {
                RecordSegmentReaderCreationError::UninitializedSegment => Ok(None),
                _ => Err(*err),
            },
        },
        ctx,
    )
```

The code dereferences `*err` which moves the boxed error, but then tries to use `Err(*err)` which attempts to move it again. This will cause a compilation error or runtime panic.

Fix:
```rust
Err(err) => Err(err),
```

File: rust/worker/src/execution/orchestration/log_fetch_orchestrator.rs
Line: 401

Comment on lines +60 to +61
pub fn set_fail_update_offset(&mut self, fail: bool) {
self.fail_update_offset = fail;
[**BestPractice**]

Race condition in test log offset update:

```rust
pub fn set_fail_update_offset(&mut self, fail: bool) {
    self.fail_update_offset = fail;
}

pub async fn update_collection_log_offset(
    &mut self,
    collection_id: CollectionUuid,
    new_offset: i64,
) -> Result<(), Box<dyn ChromaError>> {
    if self.fail_update_offset {
        return Err(Box::new(InMemoryLogError::UpdateOffsetFailed));
    }
    self.offsets.insert(collection_id, new_offset);
    Ok(())
}
```

This test utility is not thread-safe. If multiple tests run concurrently and access the same `InMemoryLog` instance, the `fail_update_offset` flag and `offsets` HashMap could be corrupted. Use `Arc<Mutex<>>` or ensure single-threaded test execution.

File: rust/log/src/in_memory_log.rs
Line: 61
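
A minimal sketch of that suggestion, assuming tokio's async Mutex; the types below are illustrative stand-ins rather than the actual `InMemoryLog`:

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;

// Illustrative stand-in for the real collection id type.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct CollectionUuid(u128);

#[derive(Default)]
struct LogState {
    fail_update_offset: bool,
    offsets: HashMap<CollectionUuid, i64>,
}

// Wrapping the mutable test state in Arc<Mutex<_>> lets concurrently running
// tests share one log instance without corrupting the flag or the offsets map.
#[derive(Clone, Default)]
struct SharedInMemoryLog {
    state: Arc<Mutex<LogState>>,
}

impl SharedInMemoryLog {
    async fn set_fail_update_offset(&self, fail: bool) {
        self.state.lock().await.fail_update_offset = fail;
    }

    async fn update_collection_log_offset(
        &self,
        collection_id: CollectionUuid,
        new_offset: i64,
    ) -> Result<(), String> {
        let mut state = self.state.lock().await;
        if state.fail_update_offset {
            return Err("update offset failed".to_string());
        }
        state.offsets.insert(collection_id, new_offset);
        Ok(())
    }
}
```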

Comment on lines +380 to +388
}
} else {
collection
.size_bytes_post_compaction
.saturating_add_signed(self.collection_logical_size_delta_bytes)
};

let flush_results = std::mem::take(&mut self.flush_results);
let total_records_post_compaction = collection.total_records_post_compaction;
[**BestPractice**]

Integer overflow risk in collection size calculation:

```rust
let collection_logical_size_bytes = if self.context.is_rebuild {
    match u64::try_from(self.collection_logical_size_delta_bytes) {
        Ok(size_bytes) => size_bytes,
        _ => {
            // error handling
        }
    }
} else {
    collection
        .size_bytes_post_compaction
        .saturating_add_signed(self.collection_logical_size_delta_bytes)
};
```

While `saturating_add_signed` prevents overflow, it silently caps at `u64::MAX` which could lead to incorrect size reporting. For a database system, this could cause storage quota miscalculations. Consider returning an error instead:

```rust
collection.size_bytes_post_compaction
    .checked_add_signed(self.collection_logical_size_delta_bytes)
    .ok_or(ApplyLogsOrchestratorError::InvariantViolation(
        "Collection size overflow detected"
    ))?
```

File: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs
Line: 388

Comment on lines +479 to +480
// Check for stale version (optimistic concurrency control)
if collection.version > collection_version {
[**BestPractice**]

Missing validation for stale collection version check:

```rust
// Check for stale version (optimistic concurrency control)
if collection.version > collection_version {
    return Err(FlushCompactionError::FailedToFlushCompaction(
        tonic::Status::failed_precondition(format!(
            "Collection version is stale: expected {}, but collection is at version {}",
            collection_version, collection.version
        )),
    ));
}
```

This only checks `>` but doesn't handle the case where `collection.version < collection_version`, which could indicate data corruption or a serious consistency issue. The check should be `!=` for exact version matching:

```rust
if collection.version != collection_version {
    return Err(FlushCompactionError::FailedToFlushCompaction(
        tonic::Status::failed_precondition(format!(
            "Collection version mismatch: expected {}, but collection is at version {}",
            collection_version, collection.version
        )),
    ));
}
```

File: rust/sysdb/src/test_sysdb.rs
Line: 480

Comment on lines +458 to +464
)
.await;
return Vec::new();
}
};

for materialized_output in materialized_outputs {
[**CriticalError**]

**Resource Leak: Span Not Dropped on Early Termination**

When tasks fail and `terminate_with_result` is called, spans stored in `self.segment_spans` are never removed:

```rust
let result = self.create_apply_log_to_segment_writer_tasks(/*...*/).await;
let mut new_tasks = match result {
    Ok(tasks) => tasks,
    Err(err) => {
        self.terminate_with_result(Err(err.into()), ctx).await;  // Early return
        return Vec::new();  // Spans in self.segment_spans never dropped
    }
};
```

Spans remain in memory until the orchestrator is dropped, causing:
1. Memory leak for span data
2. Incorrect trace timing (spans appear active when work stopped)
3. Open telemetry connections held longer than necessary

**Fix:**

```rust
Err(err) => {
    self.segment_spans.clear();  // Drop all spans before terminating
    self.terminate_with_result(Err(err.into()), ctx).await;
    return Vec::new();
}
```

File: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs
Line: 464

Comment on lines +126 to +138
let result = sysdb
.flush_compaction(
input.tenant.clone(),
input.collection_id,
input.log_position,
input.collection_version,
input.segment_flush_info.clone(),
input.total_records_post_compaction,
input.collection_logical_size_bytes,
input.schema.clone(),
)
.await;

[**BestPractice**]

**Idempotency violation**: The `flush_compaction` call has no unique transaction/request ID to prevent duplicate execution:

```rust
let result = sysdb.flush_compaction(
    input.tenant.clone(),
    input.collection_id,
    input.log_position,
    input.collection_version,
    input.segment_flush_info.clone(),
    // ... no idempotency key
).await;
```

If the operator crashes after `flush_compaction` succeeds but before `update_collection_log_offset`, a retry will re-execute `flush_compaction` with the same parameters, potentially:
- Incrementing counters twice
- Creating duplicate segment records
- Corrupting collection state

**Fix**: Add an idempotency key (e.g., `job_id` or request UUID) to `flush_compaction` to detect retries:

```rust
struct FlushCompactionRequest {
    idempotency_key: Uuid, // Deduplicate retries
    // ... existing fields
}
```

File: rust/worker/src/execution/operators/register.rs
Line: 138
