Description
What happened?
When the KV store is under load, the `ActionsSource` mechanism does not retry `catalog.GetEntry()` or `catalog.GetRepository()` calls.
In the original issue, CosmosDB's throughput was exceeded, which caused throttling. Loading pre-hooks then failed, which surfaced as misleading `412 Precondition Failed` errors, and the actions were not applied.
Since that issue was addressed, the correct `503 Service Unavailable` status code is returned when such conditions occur.
Steps to Reproduce:
- Simulate heavy load on the KV store.
- Run a process that triggers actions.
Actual Result:
- Before the fix: multiple `412` errors.
- After the fix: a `503` error is correctly returned.
Expected Behavior
Two approaches:
- The server should retry `catalog` calls when KV throttling errors occur.
- Reduce the number of API calls to the KV store from the `List` and `Load` functions in `actions_source.go` by optimizing caching. Instead of using `record.SourceRef.String()` as the cache key, we can use the `PhysicalAddress` field from the `DBEntry` struct, effectively caching based on the ETag of the action file. Since the physical address and the file change less frequently than the `SourceRef`, this would reduce redundant KV calls for the same file.
lakeFS version
1.59.0
How lakeFS is installed
No response
Affected clients
No response
Relevant log output
Contact details
No response