[3.0 Cherry Pick] [#3423] Fix unnecessary DynamoDB GET calls during LogStore::listFrom VACUUM calls #3461
Cherry-pick 03bdf84 to branch 3.0
Which Delta project/connector is this regarding?
Description
Resolves #3423.
This PR updates the logic in `BaseExternalLogStore::listFrom` so that it no longer requests the latest entry from the external store (which is used to perform recovery operations) when a non-`_delta_log` file is being listed.

This matters for VACUUM operations, which may make hundreds or thousands of list calls in the table directory and in nested partition directories of Parquet files. None of those directories is the `_delta_log`. Checking the external store during these list calls is therefore (1) useless and unwanted, because we are not listing the `_delta_log` and so this is not the time to attempt a fixup, and (2) expensive.

With this change, VACUUM operations no longer make unnecessary calls to the external store (e.g. DynamoDB).
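To make the idea concrete, here is a minimal, self-contained sketch of the guard. It is not the actual `BaseExternalLogStore` code: the class `ListFromSketch`, the helpers `isDeltaLogPath` and `countLookups`, the `guardEnabled` flag, and the `CountingExternalStore` stub are all hypothetical. It assumes, as with Hadoop-style `listFrom`, that the directory being listed is the parent of the given path. The stub only counts "latest entry" lookups, so the old behavior (always look up) can be compared with the new one (look up only under `_delta_log`) for a VACUUM-like set of list calls.

```java
import java.util.List;
import java.util.Optional;

public class ListFromSketch {

    // Hypothetical stand-in for the external store (e.g. a DynamoDB table).
    // It only counts how many "latest entry" lookups are made.
    static class CountingExternalStore {
        int latestEntryLookups = 0;

        Optional<String> getLatestEntry(String tablePath) {
            latestEntryLookups++;
            return Optional.empty(); // assume no incomplete commit needs recovery
        }
    }

    // Returns true when the directory being listed (the parent of `path`)
    // is a _delta_log directory.
    public static boolean isDeltaLogPath(String path) {
        int slash = path.lastIndexOf('/');
        String parent = slash < 0 ? "" : path.substring(0, slash);
        return parent.equals("_delta_log") || parent.endsWith("/_delta_log");
    }

    // Sketch of listFrom: consult the external store (to attempt a fixup)
    // only when listing inside _delta_log. guardEnabled=false models the
    // old behavior of consulting the store on every list call.
    static void listFrom(String path, CountingExternalStore store, boolean guardEnabled) {
        if (!guardEnabled || isDeltaLogPath(path)) {
            store.getLatestEntry(path); // recovery/fixup would happen here
        }
        // ...the underlying file system listing would happen here
    }

    // Replays a sequence of list calls and returns the number of
    // external store lookups they triggered.
    public static int countLookups(List<String> listedPaths, boolean guardEnabled) {
        CountingExternalStore store = new CountingExternalStore();
        for (String p : listedPaths) {
            listFrom(p, store, guardEnabled);
        }
        return store.latestEntryLookups;
    }

    public static void main(String[] args) {
        // One _delta_log listing plus three partition-directory listings,
        // roughly what VACUUM does on a small partitioned table.
        List<String> paths = List.of(
            "s3://bucket/table/_delta_log/00000000000000000000.json",
            "s3://bucket/table/date=2024-01-01/part-00000.parquet",
            "s3://bucket/table/date=2024-01-02/part-00000.parquet",
            "s3://bucket/table/date=2024-01-03/part-00000.parquet");
        System.out.println("old lookups: " + countLookups(paths, false)); // 4
        System.out.println("new lookups: " + countLookups(paths, true));  // 1
    }
}
```

In this sketch the number of lookups under the old logic grows with the number of partition directories, while under the guarded logic it stays at one per `_delta_log` listing.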
How was this patch tested?
Unit tests, plus an integration test that actually runs VACUUM and compares the number of external store calls made under the old and new logic. I also ran that test 50 times myself, and it passed every time, so it does not appear to be flaky.
Does this PR introduce any user-facing changes?
No