
Core: Support Hadoop bulk delete API.#15436

Draft
steveloughran wants to merge 1 commit into apache:main from steveloughran:pr/12055-bulk-delete-2026

Conversation

@steveloughran
Contributor

Reflection-based use of the Hadoop 3.4.1+ BulkDelete API so that S3 object deletions can be done in pages of objects, rather than one at a time.

Configuration option "iceberg.hadoop.bulk.delete.enabled" to switch to bulk deletes

This switch is on by default to help test across the Spark versions and verify the fallback path.
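For illustration, enabling the switch might look like this (a sketch only; the option name comes from this PR, but exactly where it is set, catalog properties versus Hadoop configuration, is an assumption):

```properties
# hypothetical configuration entry; the property name is from this PR,
# the placement in a properties file is an assumption for illustration
iceberg.hadoop.bulk.delete.enabled=true
```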

In production it might be best if this were not only off by default, but the code changed so that if bulk delete were requested and unavailable there would be no fallback, just an error: "bulk delete requested but not available: Hadoop library too old".

  • Avoids any ambiguity about why it doesn't work.
  • Only relevant for cloud connectors that implement the feature (currently: s3a).
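A reflective capability probe of the kind described above might be sketched as follows. This is not the PR's actual code: only the `org.apache.hadoop.fs.BulkDelete` class name comes from Hadoop itself; the helper class and method names are made up for illustration.

```java
// Sketch (not the PR's code): probe for the Hadoop 3.4.1+ BulkDelete API
// via reflection, so this class still loads against older Hadoop releases.
public class BulkDeleteProbe {

  // Real Hadoop 3.4.1+ interface; absent from older Hadoop releases.
  private static final String BULK_DELETE_CLASS = "org.apache.hadoop.fs.BulkDelete";

  /** True iff the named class is loadable on this classpath. */
  static boolean classAvailable(String className) {
    try {
      Class.forName(className);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // Old Hadoop on the classpath: caller falls back to
      // one-at-a-time deletes (or fails fast, if so configured).
      return false;
    }
  }

  /** True iff the bulk delete API is available. */
  static boolean bulkDeleteAvailable() {
    return classAvailable(BULK_DELETE_CLASS);
  }

  public static void main(String[] args) {
    System.out.println(bulkDeleteAvailable()
        ? "bulk delete available"
        : "bulk delete unavailable: falling back to single deletes");
  }
}
```

With a probe like this, the delete path can choose between paged deletion and the existing per-object loop once at startup, rather than catching link failures on every call.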

@steveloughran steveloughran marked this pull request as draft February 24, 2026 19:06
@steveloughran
Contributor Author

There's something else to consider here: do we need full reflection, given that the method is available at compile time? Instead, only use the operations if enabled, catch link failures, and report them better.
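The alternative suggested here could be sketched like this: call the API directly (it links at compile time) and translate run-time link failures on older Hadoop into a clear error instead of a silent fallback. All names below are hypothetical; the error message echoes the one proposed earlier in this thread.

```java
// Sketch of the "no reflection, catch link failures" alternative.
public class BulkDeleteGuard {

  @FunctionalInterface
  interface BulkOp {
    void run() throws Exception;
  }

  /**
   * Run a directly-linked bulk delete operation; convert link failures
   * (raised when an older Hadoop lacks the API) into a clear error
   * rather than falling back to one-at-a-time deletes.
   */
  static void runGuarded(BulkOp op) {
    try {
      op.run();
    } catch (NoClassDefFoundError | NoSuchMethodError e) {
      throw new UnsupportedOperationException(
          "bulk delete requested but not available: Hadoop library too old", e);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    try {
      // Simulate the link failure an older Hadoop release would raise.
      runGuarded(() -> { throw new NoSuchMethodError("createBulkDelete"); });
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

This keeps the happy path free of reflection while still degrading with an unambiguous diagnostic when the feature is requested on an old Hadoop.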

Then there would be Spark tests: 4.0 and 4.1 verify the operation is there; 3.x expect failure when it is requested.
