[Bug]: Handle requests to KV/Blockstore better during actions evaluation #9178

Open
@Annaseli

What happened?

When the KV store is under load, the ActionsSource mechanism does not retry catalog.GetEntry() or catalog.GetRepository() calls.
In the case that surfaced this, CosmosDB's throughput was exceeded and requests were throttled. In the original issue this caused failures while loading pre-hooks, which surfaced as misleading 412 Precondition Failed errors, and the actions were not applied.

Since that original issue was addressed, the server returns the correct 503 Service Unavailable status code when such conditions occur.

Steps to Reproduce:

  1. Simulate heavy load on the KV store.
  2. Run a process that triggers actions.

Actual Result:

  • Before the fix: Multiple 412 errors.
  • After the fix: Correctly returns 503 error.

Expected Behavior

Two approaches:

  1. The server should retry catalog calls when KV throttling errors occur.
  2. Reduce the number of API calls to the KV store from the List and Load functions in actions_source.go by optimizing caching. Instead of using record.SourceRef.String() as the cache key, we can use the PhysicalAddress field from the DBEntry struct, effectively caching based on the ETag of the action file. Since the physical address and the file change less frequently than the SourceRef, this would reduce redundant KV calls for the same file.

lakeFS version

1.59.0

How lakeFS is installed

No response

Affected clients

No response

Relevant log output

Contact details

No response

Metadata

    Labels

    area/hooks (improvements or additions to the hooks subsystem), bug (Something isn't working)