
@rescrv (Contributor) commented Nov 12, 2025

Description of changes

We want the ability to run a task for the first time on a collection.

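Backfills coalesce: repeated backfill requests for the same collection are merged into a single marker instead of being emitted separately.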
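A minimal sketch of how coalescing could work. It assumes a simplified, hypothetical `DirtyMarker` with only a collection id, an epoch, and the new `backfill` flag (the real type in rust/types has other fields). The merge rule used here, keeping the earliest epoch per collection, is also an assumption for illustration:

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the real DirtyMarker type.
#[derive(Clone, Debug, PartialEq)]
pub struct DirtyMarker {
    pub collection_id: String,
    pub epoch_us: u64,
    pub backfill: bool,
}

/// Coalesce backfill markers so each collection has at most one.
/// Non-backfill markers pass through untouched, in their original order.
/// Assumed merge rule: keep the earliest epoch seen for the collection.
pub fn coalesce_backfills(markers: Vec<DirtyMarker>) -> Vec<DirtyMarker> {
    let mut out: Vec<DirtyMarker> = Vec::new();
    // Maps a collection id to the index of its backfill marker in `out`.
    let mut seen: HashMap<String, usize> = HashMap::new();
    for marker in markers {
        if !marker.backfill {
            out.push(marker);
            continue;
        }
        let existing = seen.get(&marker.collection_id).copied();
        match existing {
            Some(idx) => {
                // A backfill is already queued for this collection: widen it.
                let kept = &mut out[idx];
                kept.epoch_us = kept.epoch_us.min(marker.epoch_us);
            }
            None => {
                seen.insert(marker.collection_id.clone(), out.len());
                out.push(marker);
            }
        }
    }
    out
}
```

Keeping non-backfill markers untouched means coalescing only ever removes redundant backfill requests, never regular dirty markers.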

Test plan

New tests were added, so CI covers this change.

Migration plan

Serialization is backwards-compatible: the new backfill field is appended to the protobuf message, so existing encodings remain valid.
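To illustrate why appending a field is safe, here is a hand-rolled sketch that reads only `backfill` (field 4, as in the proto under review) from protobuf wire bytes and skips everything else. In practice the generated protobuf code does this; the function names are hypothetical. A message written before the field existed has no tag 4, so decoding falls back to the proto3 default of `false`:

```rust
/// Read a base-128 varint starting at `*pos`; None if truncated or too long.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift: u32 = 0;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
        if shift >= 64 {
            return None;
        }
    }
}

/// Extract `backfill` (field 4) from an encoded message, skipping all other
/// fields. Absent field => proto3 default `false`; repeated field => last wins.
fn decode_backfill(buf: &[u8]) -> Option<bool> {
    let mut pos = 0;
    let mut backfill = false;
    while pos < buf.len() {
        let key = read_varint(buf, &mut pos)?;
        let (field, wire_type) = (key >> 3, key & 0x7);
        match wire_type {
            0 => {
                // Varint: bools are encoded this way.
                let v = read_varint(buf, &mut pos)?;
                if field == 4 {
                    backfill = v != 0;
                }
            }
            1 => pos = pos.checked_add(8)?, // fixed 64-bit
            2 => {
                // Length-delimited (strings, bytes, nested messages).
                let len = read_varint(buf, &mut pos)? as usize;
                pos = pos.checked_add(len)?;
            }
            5 => pos = pos.checked_add(4)?, // fixed 32-bit
            _ => return None,               // groups/unknown: reject
        }
        if pos > buf.len() {
            return None;
        }
    }
    Some(backfill)
}
```

For example, the bytes `[0x0A, 0x02, b'a', b'b']` (a hypothetical string field 1 only) decode to `Some(false)`, while appending `[0x20, 0x01]` (field 4 = true) yields `Some(true)`.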

Observability plan

N/A

Documentation Changes

N/A

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@propel-code-bot (bot, Contributor) commented Nov 12, 2025

Add backfill flag to DirtyMarker protocol & service logic

This PR introduces a backfill boolean to the DirtyMarker message used by the log-streaming pipeline. The flag lets the system explicitly tag the first run, or historical backfill, of a task on an existing collection. Service-side logic now auto-emits and coalesces these markers to avoid log spam, while the Rust type layer and protobuf schema were extended in a fully backward-compatible manner.

Key Changes

• Added bool backfill to idl/chromadb/proto/logservice.proto (field appended, retaining wire compatibility)
• Extended Rust DirtyMarker struct plus (de)serialization in rust/types and rust/log
• Log-service now sets backfill=true on first-run markers and coalesces repeated ones
• Epoch calculation uses saturating_sub(backfill_timeout), so the result clamps at zero instead of underflowing
• New/updated integration tests covering backfill behaviour
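A small sketch of the saturating_sub point. It assumes the timeout is subtracted from a base epoch in microseconds; the helper name and parameters are hypothetical. With plain `-`, underflow panics in debug builds and wraps to a huge value in release builds; `saturating_sub` clamps at 0 instead:

```rust
/// Hypothetical helper: earliest epoch (microseconds) a backfill should
/// cover, given a base epoch and a backfill timeout of the same unit.
/// `saturating_sub` returns 0 when the timeout exceeds the base epoch,
/// rather than panicking (debug) or wrapping around (release).
pub fn backfill_start_epoch_us(base_epoch_us: u64, backfill_timeout_us: u64) -> u64 {
    base_epoch_us.saturating_sub(backfill_timeout_us)
}
```

For instance, `backfill_start_epoch_us(100, 300)` is `0`, whereas `100u64.wrapping_sub(300)` would be `u64::MAX - 199`, an epoch far in the future.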

Affected Areas

• Protobuf contract (DirtyMarker message)
• Rust type definitions and gRPC conversion code
• Log-service marker generation & coalescing logic
• Integration test suite

This summary was automatically generated by @propel-code-bot

Comment on lines +70 to +72
// If the request was the result of the backfill call---operator's intention left to the spurr of
// the moment.
bool backfill = 4;
Contributor


[Documentation]

This comment is a bit unclear and informal. A more descriptive comment would improve clarity for future developers. Also, "spurr" is a typo for "spur".


File: idl/chromadb/proto/logservice.proto
Line: 72

Contributor Author


I want it murky so we can always repurpose backfill once we do this one.



message BackfillRequest {
string collection_id = 1;
uint64 initial_insertion_epoch_us = 2;
Contributor Author


The backfill will only trigger at this point.

@rescrv rescrv requested a review from tanujnay112 November 12, 2025 17:31
.map_err(|_| Status::invalid_argument("Failed to parse collection id"))?;
tracing::info!(
"backfill for {collection_id} at {}",
request.initial_insertion_epoch_us
Contributor


I believe initial_insertion_epoch_us should be chosen here, as some time in the past, instead of being passed in. The RLS knows better than the caller how far back to go.
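A minimal sketch of the reviewer's suggestion, with the server deriving the start epoch from its own clock. The function name and the one-hour `BACKFILL_LOOKBACK` are hypothetical; the thread does not say how far back the log service should go:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical lookback window; one hour is purely illustrative.
pub const BACKFILL_LOOKBACK: Duration = Duration::from_secs(60 * 60);

/// Pick the backfill start epoch on the server instead of trusting a
/// caller-supplied initial_insertion_epoch_us.
pub fn server_chosen_insertion_epoch_us(now: SystemTime, lookback: Duration) -> u64 {
    // Microseconds since the Unix epoch; a clock set before 1970 maps to 0.
    // The u128 -> u64 cast only truncates hundreds of thousands of years out.
    let now_us = now
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_micros() as u64)
        .unwrap_or(0);
    // Clamp at 0 rather than underflow if the lookback exceeds `now_us`.
    now_us.saturating_sub(lookback.as_micros() as u64)
}
```

A handler would call it as `server_chosen_insertion_epoch_us(SystemTime::now(), BACKFILL_LOOKBACK)`. Taking the clock as a parameter keeps the function deterministic under test.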

