
Disk buffering iterator explicit removal config option #2560

Merged
jaydeluca merged 16 commits into open-telemetry:main from LikeTheSalad:disk-buffering-iterator-changes on Feb 26, 2026

Conversation

@LikeTheSalad
Contributor

@LikeTheSalad LikeTheSalad commented Jan 16, 2026

Addresses #2540 by making item deletion during iteration configurable.

A new deleteItemsOnIteration option has been added to FileStorageConfiguration. It defaults to true to preserve backward compatibility. When it is set to false, iteration no longer removes items from disk implicitly, so users must call Iterator.remove() explicitly to delete each item.

Example usage with explicit deletion:

```java
FileStorageConfiguration config = FileStorageConfiguration.builder()
    .setDeleteItemsOnIteration(false) // opt in to explicit deletion
    .build();
SignalStorage.Span spanStorage = FileSpanStorage.create(spansDir, config);

Iterator<Collection<SpanData>> spanIterator = spanStorage.iterator();
while (spanIterator.hasNext()) {
  Collection<SpanData> spans = spanIterator.next();
  // ... export spans ...
  // With deleteItemsOnIteration set to false, this batch stays on disk until remove() is called.
  spanIterator.remove();
}
```
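To make the two modes concrete, here is a minimal in-memory sketch (not the disk-buffering implementation) of an iterator whose next() either deletes the batch right away (deleteItemsOnIteration = true) or leaves it in place until remove() is called. The class InMemoryBatchStorage is hypothetical, and making remove() a no-op in the implicit mode is an assumption rather than documented behavior of the module.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical in-memory stand-in for a signal storage; illustrates the two iteration modes only. */
final class InMemoryBatchStorage implements Iterable<List<String>> {
  private final Deque<List<String>> batches = new ArrayDeque<>();
  private final boolean deleteItemsOnIteration;

  InMemoryBatchStorage(boolean deleteItemsOnIteration) {
    this.deleteItemsOnIteration = deleteItemsOnIteration;
  }

  void write(List<String> batch) {
    batches.addLast(new ArrayList<>(batch));
  }

  int size() {
    return batches.size();
  }

  @Override
  public Iterator<List<String>> iterator() {
    Iterator<List<String>> delegate = batches.iterator();
    return new Iterator<List<String>>() {
      private boolean canRemove = false;

      @Override
      public boolean hasNext() {
        return delegate.hasNext();
      }

      @Override
      public List<String> next() {
        List<String> batch = delegate.next();
        if (deleteItemsOnIteration) {
          delegate.remove(); // implicit mode: reading a batch also deletes it
        } else {
          canRemove = true; // explicit mode: the batch stays until remove() is called
        }
        return batch;
      }

      @Override
      public void remove() {
        if (deleteItemsOnIteration) {
          return; // assumption: already deleted by next(), so nothing left to do
        }
        if (!canRemove) {
          throw new IllegalStateException("remove() requires a preceding next()");
        }
        delegate.remove();
        canRemove = false;
      }
    };
  }
}
```

In explicit mode, a consumer that skips remove() after a failed export leaves that batch in storage for a later attempt, which is the behavior the new option enables.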

@github-actions github-actions bot requested a review from zeitlinger January 16, 2026 17:11
@LikeTheSalad LikeTheSalad marked this pull request as ready for review January 16, 2026 17:22
@LikeTheSalad LikeTheSalad requested a review from a team as a code owner January 16, 2026 17:22
Member

@jaydeluca jaydeluca left a comment

I don't have a lot of experience with how this module is used in the real world, but do we have concerns about overflowing disks if someone doesn't know this change is happening? Just trying to think of how to get ahead of that.

…fering/internal/storage/FolderManager.java

Co-authored-by: Jay DeLuca <jaydeluca4@gmail.com>
Copilot AI review requested due to automatic review settings February 25, 2026 14:16
LikeTheSalad and others added 2 commits February 25, 2026 15:20
…fering/internal/storage/FolderManager.java

Co-authored-by: Jay DeLuca <jaydeluca4@gmail.com>
…fering/internal/storage/Storage.java

Co-authored-by: Jay DeLuca <jaydeluca4@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR changes disk-buffering iteration semantics so that reading no longer implicitly removes items/files from disk; instead, consumers must explicitly delete via Iterator.remove() (or ReadableResult.delete()), aligning with standard Iterator expectations and addressing issue #2540.

Changes:

  • Stop implicitly deleting stored batches/files during iteration; require explicit deletion (Iterator.remove()).
  • Add readable-file selection filtering to skip already-processed/invalid files and improve corrupted-data handling.
  • Update tests and README examples to reflect the new explicit-deletion behavior.
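The readable-file selection filtering can be sketched as follows, assuming (hypothetically) that cache-file names are creation timestamps in milliseconds. Names that do not parse are skipped, and an exclusion predicate such as t -> t <= currentFileCreatedTime moves selection past a file that was already processed or found corrupt. ReadableFileSelector and its methods are illustrative names, not the module's API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper; assumes cache-file names are creation timestamps in milliseconds. */
final class ReadableFileSelector {
  private ReadableFileSelector() {}

  /** Parses a file name into its creation time, or returns empty for names that are not valid timestamps. */
  static Optional<Long> parseCreatedTimeMillis(String fileName) {
    try {
      long millis = Long.parseLong(fileName);
      return millis >= 0 ? Optional.of(millis) : Optional.empty();
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }

  /** Picks the oldest valid file whose creation time the exclusion predicate does not reject. */
  static Optional<Long> selectReadable(List<String> fileNames, Predicate<Long> exclusion) {
    return fileNames.stream()
        .map(ReadableFileSelector::parseCreatedTimeMillis)
        .flatMap(Optional::stream) // drop invalid names (Java 9+)
        .filter(exclusion.negate())
        .min(Comparator.naturalOrder());
  }
}
```

Because nothing is deleted, the predicate is what keeps a reader from returning to a file it has already finished with.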

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| disk-buffering/src/main/java/io/opentelemetry/contrib/disk/buffering/internal/storage/StorageIterator.java | Removes implicit delete-on-advance; keeps deletion behind Iterator.remove(). |
| disk-buffering/src/main/java/io/opentelemetry/contrib/disk/buffering/internal/storage/Storage.java | Introduces exclusion predicate to move past finished/corrupt files without deleting them implicitly. |
| disk-buffering/src/main/java/io/opentelemetry/contrib/disk/buffering/internal/storage/FolderManager.java | Adds cache-file parsing/selection with predicate-based exclusion and validation of file names. |
| disk-buffering/src/main/java/io/opentelemetry/contrib/disk/buffering/internal/storage/files/ReadableFile.java | Stops deleting the underlying file automatically at EOF; adds created-time tracking. |
| disk-buffering/src/main/java/io/opentelemetry/contrib/disk/buffering/internal/storage/files/reader/DelimitedProtoStreamReader.java | Treats partial reads as IO corruption by throwing IOException instead of returning null. |
| disk-buffering/src/test/java/io/opentelemetry/contrib/disk/buffering/internal/storage/files/ReadableFileTest.java | Updates expectations: empty file is no longer deleted merely by reading. |
| disk-buffering/src/test/java/io/opentelemetry/contrib/disk/buffering/internal/storage/StorageTest.java | Updates expectations: storage directory retains file when items aren't explicitly deleted. |
| disk-buffering/src/test/java/io/opentelemetry/contrib/disk/buffering/internal/storage/FolderManagerTest.java | Refactors to new getReadableFile(Predicate) API and adds custom-filter coverage. |
| disk-buffering/src/test/java/io/opentelemetry/contrib/disk/buffering/IntegrationTest.java | Adds explicit-removal integration scenario; asserts on-disk retention vs removal. |
| disk-buffering/README.md | Updates usage example and documents explicit deletion options. |
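The DelimitedProtoStreamReader change separates a clean end of stream from a truncated record. A minimal sketch of that distinction, assuming a protobuf-style varint length prefix; DelimitedReaderSketch is a hypothetical class, not the module's reader:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class DelimitedReaderSketch {
  private DelimitedReaderSketch() {}

  /**
   * Reads one varint-length-prefixed record. Returns null only on a clean end of stream;
   * a truncated length prefix or payload raises IOException instead of returning null.
   */
  static byte[] readDelimited(InputStream in) throws IOException {
    int first = in.read();
    if (first == -1) {
      return null; // clean EOF: no bytes left before the next record
    }
    int length = readVarint(first, in);
    byte[] payload = new byte[length];
    // readFully throws EOFException (an IOException) if the stream ends early.
    new DataInputStream(in).readFully(payload);
    return payload;
  }

  private static int readVarint(int firstByte, InputStream in) throws IOException {
    int result = firstByte & 0x7F;
    int shift = 7;
    int b = firstByte;
    while ((b & 0x80) != 0) { // continuation bit set: more length bytes follow
      if (shift >= 32) {
        throw new IOException("Malformed varint length");
      }
      b = in.read();
      if (b == -1) {
        throw new EOFException("Truncated length prefix");
      }
      result |= (b & 0x7F) << shift;
      shift += 7;
    }
    if (result < 0) {
      throw new IOException("Negative length");
    }
    return result;
  }
}
```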

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated no new comments.

@LikeTheSalad
Contributor Author

> I don't have a lot of experience with how this module is used in the real world, but do we have concerns about overflowing disks if someone doesn't know this change is happening? Just trying to think of how to get ahead of that.

A disk overflow shouldn't happen, because a safeguard mechanism checks the "max folder size" and "max file age for read" config options and purges old data when needed. Still, I understand the concern about introducing a silent behavior change. After giving it some thought, I decided to keep the existing behavior as the default and make explicit removal an opt-in configuration option instead. Automatically deleting items seems to suit most use cases; otherwise, more people would have complained about it by now.

I've updated the readme to better explain all the options that users have after these changes.
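The safeguard described above can be pictured roughly like this: before writing, drop files older than the max file age, then evict the oldest remaining files until the incoming data fits under the max folder size. This is a simplified in-memory model under those assumptions; PurgeSketch, StoredFile, and the parameter names are hypothetical and do not mirror the module's configuration API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PurgeSketch {
  private PurgeSketch() {}

  /** Hypothetical stored-file descriptor. */
  static final class StoredFile {
    final long createdTimeMillis;
    final long sizeBytes;

    StoredFile(long createdTimeMillis, long sizeBytes) {
      this.createdTimeMillis = createdTimeMillis;
      this.sizeBytes = sizeBytes;
    }
  }

  /**
   * Returns the files kept before writing incomingBytes: files older than maxFileAgeMillis are
   * dropped, then the oldest remaining files are evicted until the new data fits.
   */
  static List<StoredFile> purge(
      List<StoredFile> files, long nowMillis, long maxFileAgeMillis,
      long maxFolderSizeBytes, long incomingBytes) {
    List<StoredFile> kept = new ArrayList<>();
    for (StoredFile file : files) {
      if (nowMillis - file.createdTimeMillis <= maxFileAgeMillis) {
        kept.add(file); // still young enough to be read
      }
    }
    kept.sort(Comparator.comparingLong((StoredFile f) -> f.createdTimeMillis)); // oldest first
    long total = 0;
    for (StoredFile file : kept) {
      total += file.sizeBytes;
    }
    while (!kept.isEmpty() && total + incomingBytes > maxFolderSizeBytes) {
      total -= kept.remove(0).sizeBytes; // evict the oldest file
    }
    return kept;
  }
}
```

Under this model, disk usage stays bounded even when a consumer never calls remove(), at the cost of losing the oldest unexported data.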

@LikeTheSalad LikeTheSalad requested a review from Copilot February 25, 2026 16:25
@LikeTheSalad LikeTheSalad changed the title Disk buffering iterator explicit removal Disk buffering iterator explicit removal config option Feb 25, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated no new comments.

Member

@jaydeluca jaydeluca left a comment

LGTM! Will just wait for one more set of eyes. cc @zeitlinger @breedx-splk @bencehornak

> Note that even with explicit deletion, disk usage is still bounded by the configured max folder size and max file age, so stale files are automatically purged when there's not enough space available before new data is written.
Member

👍

Contributor

@bencehornak bencehornak left a comment

Thanks for this, LGTM!

```java
fileExclusion = file -> file.getCreatedTimeMillis() <= currentFileCreatedTime;
readableFile.close();
readableFileRef.set(null);
return doReadNext(deserializer, ++attemptNumber);
```
Contributor

Unrelated, but loops are better than recursion here, since they don't grow the call stack.

Contributor Author

It's been ages since I wrote this part, and I honestly don't remember why I chose recursion. I think it's because on the happy path the method runs only once, so the recursion only kicks in after an exception; repeated attempts weren't the expected, common behavior. I understand your point, but so far this hasn't caused problems in practice, and the change doesn't look trivial, so it isn't among my current priorities. Still, if you're interested in finding an alternative, feel free to open a PR and I'll take a look 👍
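For reference, the retry in doReadNext could be written as a bounded loop instead of recursion. This is a hypothetical sketch: FileReader and skipCurrentFile() stand in for the real reader and for the fileExclusion update plus close(), and maxAttempts is an assumed attempt limit.

```java
import java.io.IOException;
import java.util.List;

final class RetryLoopSketch {
  private RetryLoopSketch() {}

  /** Hypothetical reader over a sequence of cache files. */
  interface FileReader {
    /** Returns the next item, null when no data is left, or throws if the current file is corrupt. */
    String readNext() throws IOException;

    /** Excludes the current file so the next attempt moves on to a newer one. */
    void skipCurrentFile();
  }

  /** Retries in a loop, so the call stack does not grow with each corrupt file. */
  static String readNextWithRetries(FileReader reader, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return reader.readNext();
      } catch (IOException e) {
        reader.skipCurrentFile(); // then try again with the next file
      }
    }
    return null; // gave up after maxAttempts corrupt files
  }

  /** Toy reader where each "file" holds one item and a null entry marks a corrupt file. */
  static final class ListBackedReader implements FileReader {
    private final List<String> files;
    private int index = 0;

    ListBackedReader(List<String> files) {
      this.files = files;
    }

    @Override
    public String readNext() throws IOException {
      if (index >= files.size()) {
        return null;
      }
      String item = files.get(index);
      if (item == null) {
        throw new IOException("corrupt file at index " + index);
      }
      index++;
      return item;
    }

    @Override
    public void skipCurrentFile() {
      index++;
    }
  }
}
```

On the happy path both versions perform a single read; the loop only differs when several corrupt files are skipped in a row, where it keeps stack depth constant.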

@jaydeluca jaydeluca added this pull request to the merge queue Feb 26, 2026
Merged via the queue into open-telemetry:main with commit 4b01654 Feb 26, 2026
20 of 21 checks passed

Labels

None yet

Projects

None yet

4 participants