

@mehdi-aouadi (Contributor) commented Dec 11, 2025

PR Description

Replace our custom synchronized LRUCache implementation with CaffeineCache, a wrapper around the Caffeine cache library.

The Cache interface remains unchanged. The existing test suite has been updated to be compatible with Caffeine's eviction policy.

The existing LRUCache implementation presents a significant performance bottleneck under concurrent load due to its reliance on the synchronized keyword for all operations.

The primary issue is with the get method:

```java
public synchronized V get(final K key, final Function<K, V> fallback) { ... }
```

If one thread experiences a cache miss, it acquires a lock on the entire cache. If the fallback function that computes the new value is slow, all other threads are blocked, even those trying to access completely different, already-cached keys. This leads to poor scalability and thread contention.
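
The difference can be sketched with plain JDK types (an illustrative toy, not the Teku or Caffeine code): a coarse synchronized cache serializes all callers, while `ConcurrentHashMap.computeIfAbsent` — the same per-key approach Caffeine builds on — only blocks callers computing the same key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative toy, not Teku's or Caffeine's actual code.
public class PerKeyCacheSketch {

  // Coarse-grained: the whole map is locked while a fallback runs,
  // so one slow miss stalls every other caller.
  static class CoarseCache<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public synchronized V get(K key, Function<K, V> fallback) {
      return map.computeIfAbsent(key, fallback);
    }
  }

  // Fine-grained: computeIfAbsent locks only the bin holding the key,
  // so misses on different keys are computed in parallel.
  static class PerKeyCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> fallback) {
      return map.computeIfAbsent(key, fallback);
    }
  }

  public static void main(String[] args) {
    PerKeyCache<String, String> cache = new PerKeyCache<>();
    System.out.println(cache.get("a", k -> k + "-computed")); // a-computed
    // Hit: the fallback is not invoked again.
    System.out.println(cache.get("a", k -> "ignored"));       // a-computed
  }
}
```

Note that neither toy evicts anything; Caffeine additionally bounds the cache size.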

How does this PR address the issue?

  1. Per-key locking: Caffeine uses sophisticated locking mechanisms. A cache miss and the subsequent value computation on one key do not block other threads from reading or writing different keys. This eliminates the primary bottleneck of the old implementation.

  2. Superior Eviction Policy (TinyLFU): While LRUCache uses a classic LRU policy, Caffeine employs TinyLFU. This policy offers the same core benefit as LRU (evicting old, unused items) but is smarter, providing a better overall hit rate: TinyLFU combines recency (LRU) with frequency (LFU), tracking not just when an item was last used but also how often it is used.

  3. Maturity: Caffeine is a mature library known for achieving near-optimal hit rates with minimal overhead.
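
The admission idea behind TinyLFU can be illustrated with a toy count-min frequency sketch (a simplification for intuition, not Caffeine's implementation): on eviction, a new candidate replaces the victim only if its estimated access frequency is higher.

```java
// Toy count-min sketch illustrating TinyLFU-style admission.
// This is a simplified illustration, not Caffeine's actual code.
public class TinyLfuSketch {
  private final int[][] counts = new int[4][64]; // 4 hash rows of 64 counters
  private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};

  private int index(Object key, int row) {
    int h = key.hashCode() * SEEDS[row];
    return (h ^ (h >>> 16)) & 63;
  }

  public void record(Object key) {
    for (int row = 0; row < 4; row++) counts[row][index(key, row)]++;
  }

  // Estimated frequency = minimum across rows (may over-count on
  // collisions, never under-counts).
  public int estimate(Object key) {
    int min = Integer.MAX_VALUE;
    for (int row = 0; row < 4; row++) min = Math.min(min, counts[row][index(key, row)]);
    return min;
  }

  // TinyLFU admission: keep whichever of victim/candidate is used more.
  public boolean admit(Object candidate, Object victim) {
    return estimate(candidate) > estimate(victim);
  }

  public static void main(String[] args) {
    TinyLfuSketch sketch = new TinyLfuSketch();
    for (int i = 0; i < 5; i++) sketch.record("hot");
    sketch.record("cold");
    // A frequently-seen key wins admission over a rarely-seen one.
    System.out.println(sketch.admit("hot", "cold"));
    System.out.println(sketch.admit("cold", "hot"));
  }
}
```

This is what lets TinyLFU reject a one-off scan item in favor of a frequently reused entry, where pure LRU would always admit the newcomer.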

Fixed Issue(s)

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

Note

Replaces the custom synchronized LRU cache with a Caffeine-based implementation, wires it through state caches, adds JMH benchmarks, updates tests, and introduces a shared BeaconState cache container.

  • Infrastructure / Cache:
    • New CaffeineCache: Adds infrastructure/collections/.../CaffeineCache as the Cache impl and removes LRU-specific tests.
    • Tests: Adds CaffeineCacheTest and CacheTestUtil for deterministic testing.
  • Spec / State Caches:
    • Shared caches: Adds SharedBeaconStateCaches to hold global validatorsPubKeys and ValidatorIndexCache.
    • TransitionCaches: Switches to CaffeineCache, introduces CacheFactory, uses shared caches, and preserves copy() behavior.
    • ValidatorIndexCache: Uses CaffeineCache and adds clear().
  • Networking / Execution:
    • Replaces LRU with Caffeine in DefaultReputationManager, DataColumnSidecarSignatureValidator, SimpleSidecarRetriever, and ExecutionLayerChannelStub.
    • TestSpecFactory: clears shared caches on spec creation.
  • Benchmarks:
    • Adds JMH benchmarks CacheConcurrencyBenchmark and updates TransitionCachesBenchmark to compare Legacy LRU vs Caffeine.
    • Introduces benchmarks/gen/LegacyLRUCache for comparisons.
  • Build / Deps:
    • Adds com.github.ben-manes.caffeine:caffeine dependency in Gradle.

Written by Cursor Bugbot for commit 33c3b7c.

@mehdi-aouadi force-pushed the caffeine-cache branch 3 times, most recently from c626d83 to b6051d7 on December 11, 2025 at 11:42

@zilm13 (Contributor) left a comment


Maybe

  1. Keep both
  2. Replace only in one-two places for the beginning
  3. Add jmh benchmark for some random multithread read/write access to see if it will show different numbers compared to LRUCache


@mehdi-aouadi (Author) commented Dec 11, 2025

Maybe

  1. Keep both
  2. Replace only in one-two places for the beginning
  3. Add jmh benchmark for some random multithread read/write access to see if it will show different numbers compared to LRUCache

I ran some benchmarks and tried to make them realistic based on our current LRUCache usage (a 5:1 get_with_fallback:invalidate_with_new_value ratio):

Cache Benchmark Results (Separate Columns)

| Benchmark | Cache Size | Cache Type | Key Space | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CacheConcurrencyBenchmark.realWorldWorkload | 1024 | LEGACY_LRU | 2048 | thrpt | 15 | 6455.029 | 108.554 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload:getOperation | 1024 | LEGACY_LRU | 2048 | thrpt | 15 | 922.952 | 77.950 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload:invalidateOperation | 1024 | LEGACY_LRU | 2048 | thrpt | 15 | 5532.077 | 163.020 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload | 1024 | CAFFEINE | 2048 | thrpt | 15 | 72097.235 | 7903.578 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload:getOperation | 1024 | CAFFEINE | 2048 | thrpt | 15 | 56454.641 | 6360.071 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload:invalidateOperation | 1024 | CAFFEINE | 2048 | thrpt | 15 | 15642.594 | 1565.776 | ops/ms |

TL;DR
CaffeineCache delivers over 11 times the throughput of LegacyLRUCache (72,097 vs. 6,455 ops/ms).
The error margin for Caffeine (~11%) is higher than the legacy LRUCache's (~1.7%), but IMO it is perfectly acceptable and realistic for such a high-contention benchmark.
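
The 5:1 workload mix above can be made concrete with a small stdlib harness (a toy sketch, not the JMH CacheConcurrencyBenchmark in this PR; the 2048 key space matches the benchmark parameter, everything else is illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Toy harness reproducing the 5:1 get_with_fallback : invalidate_with_new_value
// mix over a shared map, so the workload shape is concrete. Not the JMH code.
public class MixedWorkloadSketch {

  // Returns {gets, invalidates} after all threads finish.
  static long[] run(int threads, int itersPerThread) throws InterruptedException {
    ConcurrentHashMap<Integer, Integer> cache = new ConcurrentHashMap<>();
    AtomicLong gets = new AtomicLong();
    AtomicLong invalidates = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < itersPerThread; i++) {
          int key = rnd.nextInt(2048); // key space used in the benchmark
          if (i % 6 == 5) {
            cache.put(key, key * 2);   // 1 in 6: invalidate with new value
            invalidates.incrementAndGet();
          } else {
            cache.computeIfAbsent(key, k -> k * 2); // 5 in 6: get with fallback
            gets.incrementAndGet();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);
    return new long[] {gets.get(), invalidates.get()};
  }

  public static void main(String[] args) throws InterruptedException {
    long[] counts = run(4, 60_000);
    System.out.println("gets=" + counts[0] + " invalidates=" + counts[1]);
  }
}
```

Swapping the `ConcurrentHashMap` for the legacy synchronized cache or a Caffeine instance is what the real benchmark parameterizes over.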


@mehdi-aouadi (Author) commented Dec 16, 2025

I updated and ran the CacheConcurrencyBenchmark and TransitionCachesBenchmark on a separate VM (m6a.2xlarge, x86_64).

CacheConcurrencyBenchmark raw results

| Benchmark | Cache Size | Cache Type | Fallback Delay (ms) | Key Space | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 673.378 | ±652.880 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks:concurrentReads | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 672.203 | ±653.139 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks:slowFallbacks | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 1.175 | ±0.742 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 117,818.069 | ±9,525.138 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks:concurrentReads | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 117,817.824 | ±9,525.138 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks:slowFallbacks | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 0.246 | ±0.005 | ops/ms |
| CacheConcurrencyBenchmark.mixedReadWriteScenario | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 6,063.919 | ±192.653 | ops/ms |
| CacheConcurrencyBenchmark.mixedReadWriteScenario | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 77,068.284 | ±1,298.065 | ops/ms |
| CacheConcurrencyBenchmark.pureReadPerformance | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 9,348.489 | ±216.479 | ops/ms |
| CacheConcurrencyBenchmark.pureReadPerformance | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 154,061.989 | ±11,562.494 | ops/ms |
| CacheConcurrencyBenchmark.realWorldScenario | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 8.412 | ±0.536 | ops/ms |
| CacheConcurrencyBenchmark.realWorldScenario | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 94.171 | ±2.572 | ops/ms |

CacheConcurrencyBenchmark processed results

| Benchmark Scenario | LegacyLRUCache | CaffeineCache | Performance Gain |
| --- | --- | --- | --- |
| Pure Read Hits | 9,348 ops/ms | 154,061 ops/ms | ~16.5x |
| Mixed Read/Write | 6,063 ops/ms | 77,068 ops/ms | ~12.7x |
| "Real World" (90% hits, some slow misses) | 8.4 ops/ms | 94.1 ops/ms | ~11.2x |
| High Contention with Slow Fallback | 673 ops/ms | 117,818 ops/ms | ~175x |

TransitionCachesBenchmark raw results

| Benchmark | Cache Type | Fallback Delay (ms) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| TransitionCachesBenchmark.contendedMissWithFallback | CAFFEINE | 0 | thrpt | 10 | 186,702,132.960 | ±9,114,314.307 | ops/s |
| TransitionCachesBenchmark.contendedMissWithFallback | CAFFEINE | 5 | thrpt | 10 | 200,019,960.009 | ±9,128,102.734 | ops/s |
| TransitionCachesBenchmark.contendedMissWithFallback | LEGACY_LRU | 0 | thrpt | 10 | 23,046,142.897 | ±809,427.663 | ops/s |
| TransitionCachesBenchmark.contendedMissWithFallback | LEGACY_LRU | 5 | thrpt | 10 | 27,536,678.636 | ±5,894,598.243 | ops/s |
| TransitionCachesBenchmark.copyCaches | CAFFEINE | 0 | thrpt | 10 | 8,954.644 | ±136.269 | ops/s |
| TransitionCachesBenchmark.copyCaches | CAFFEINE | 5 | thrpt | 10 | 8,533.452 | ±59.198 | ops/s |
| TransitionCachesBenchmark.copyCaches | LEGACY_LRU | 0 | thrpt | 10 | 12,645.276 | ±292.941 | ops/s |
| TransitionCachesBenchmark.copyCaches | LEGACY_LRU | 5 | thrpt | 10 | 12,630.207 | ±733.149 | ops/s |
| TransitionCachesBenchmark.realisticWorkload | CAFFEINE | 0 | thrpt | 10 | 23,293,859.582 | ±1,663,813.239 | ops/s |
| TransitionCachesBenchmark.realisticWorkload | CAFFEINE | 5 | thrpt | 10 | 13,550.253 | ±1,637.875 | ops/s |
| TransitionCachesBenchmark.realisticWorkload | LEGACY_LRU | 0 | thrpt | 10 | 6,385,206.352 | ±446,515.793 | ops/s |
| TransitionCachesBenchmark.realisticWorkload | LEGACY_LRU | 5 | thrpt | 10 | 426,367.649 | ±858,306.047 | ops/s |

TransitionCachesBenchmark processed results

| Benchmark Scenario | LegacyLRUCache | CaffeineCache | Performance Gain |
| --- | --- | --- | --- |
| Realistic Workload (no delay) | 6,385,206 ops/s | 23,293,859 ops/s | ~3.6x |
| Realistic Workload (5ms delay) | 426,367 ops/s * | 13,550 ops/s ** | See Note |
| Contended Miss (Thundering Herd) | 27,536,678 ops/s | 200,019,960 ops/s | ~7.2x |
| Copy Caches | 12,630 ops/s | 8,533 ops/s | Legacy is ~1.5x faster |

*Note on "Realistic Workload (5ms delay)":

  • The LegacyLRUCache score is statistically unreliable, with an error margin (±858k) far exceeding the score itself. This instability is caused by severe lock contention.
  • The CaffeineCache score is lower but predictable. Its performance degrades gracefully because a slow fallback on one key does not block reads for other cached keys.

Note on "Copy Caches":
The legacy cache is faster at the non-critical copy() operation due to its simpler internal structure.

@mehdi-aouadi mehdi-aouadi marked this pull request as draft December 19, 2025 15:36