

@mehdi-aouadi (Contributor) commented Dec 11, 2025

PR Description

Replace our custom synchronized LRUCache implementation with CaffeineCache, a wrapper around the Caffeine cache library.

The Cache interface remains unchanged. The existing test suite has been updated to be compatible with Caffeine's eviction policy.

The existing LRUCache implementation presents a significant performance bottleneck under concurrent load due to its reliance on the synchronized keyword for all operations.

The primary issue is with the get method:

```java
public synchronized V get(final K key, final Function<K, V> fallback) { ... }
```

If one thread experiences a cache miss, it acquires a lock on the entire cache. If the fallback function that computes the new value is slow, all other threads are blocked, even those trying to access completely different, already-cached keys. This leads to poor scalability and thread contention.
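
The difference can be sketched with plain JDK types (an illustrative toy, not the Teku or Caffeine code): a coarse synchronized cache serializes all callers, while `ConcurrentHashMap.computeIfAbsent` — the same per-key approach Caffeine builds on — only blocks callers computing the same key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative toy, not Teku's or Caffeine's actual code.
public class PerKeyCacheSketch {

  // Coarse-grained: the whole map is locked while a fallback runs,
  // so one slow miss stalls every other caller.
  static class CoarseCache<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public synchronized V get(K key, Function<K, V> fallback) {
      return map.computeIfAbsent(key, fallback);
    }
  }

  // Fine-grained: computeIfAbsent locks only the bin holding the key,
  // so misses on different keys are computed in parallel.
  static class PerKeyCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> fallback) {
      return map.computeIfAbsent(key, fallback);
    }
  }

  public static void main(String[] args) {
    PerKeyCache<String, String> cache = new PerKeyCache<>();
    System.out.println(cache.get("a", k -> k + "-computed")); // a-computed
    // Hit: the fallback is not invoked again.
    System.out.println(cache.get("a", k -> "ignored"));       // a-computed
  }
}
```

Note that neither toy evicts anything; Caffeine additionally bounds the cache size.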

How does this PR address the issue?

  1. Per-key locking: Caffeine uses sophisticated locking mechanisms. A cache miss and the subsequent value computation on one key do not block other threads from reading or writing different keys. This eliminates the primary bottleneck of the old implementation.

  2. Superior Eviction Policy (TinyLFU): While LRUCache uses a classic LRU policy, Caffeine employs TinyLFU. This policy offers the same core benefit as LRU (evicting old, unused items) but is smarter, providing a better overall hit rate: TinyLFU combines recency (LRU) with frequency (LFU), tracking not just when an item was last used but also how often it is used.

  3. Maturity: Caffeine is a mature library known for achieving near-optimal hit rates with minimal overhead.
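
The admission idea behind TinyLFU can be illustrated with a toy count-min frequency sketch (a simplification for intuition, not Caffeine's implementation): on eviction, a new candidate replaces the victim only if its estimated access frequency is higher.

```java
// Toy count-min sketch illustrating TinyLFU-style admission.
// This is a simplified illustration, not Caffeine's actual code.
public class TinyLfuSketch {
  private final int[][] counts = new int[4][64]; // 4 hash rows of 64 counters
  private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};

  private int index(Object key, int row) {
    int h = key.hashCode() * SEEDS[row];
    return (h ^ (h >>> 16)) & 63;
  }

  public void record(Object key) {
    for (int row = 0; row < 4; row++) counts[row][index(key, row)]++;
  }

  // Estimated frequency = minimum across rows (may over-count on
  // collisions, never under-counts).
  public int estimate(Object key) {
    int min = Integer.MAX_VALUE;
    for (int row = 0; row < 4; row++) min = Math.min(min, counts[row][index(key, row)]);
    return min;
  }

  // TinyLFU admission: keep whichever of victim/candidate is used more.
  public boolean admit(Object candidate, Object victim) {
    return estimate(candidate) > estimate(victim);
  }

  public static void main(String[] args) {
    TinyLfuSketch sketch = new TinyLfuSketch();
    for (int i = 0; i < 5; i++) sketch.record("hot");
    sketch.record("cold");
    // A frequently-seen key wins admission over a rarely-seen one.
    System.out.println(sketch.admit("hot", "cold"));
    System.out.println(sketch.admit("cold", "hot"));
  }
}
```

This is what lets TinyLFU reject a one-off scan item in favor of a frequently reused entry, where pure LRU would always admit the newcomer.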

Fixed Issue(s)

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

Note

Replaces the custom synchronized LRU cache with a Caffeine-based implementation, wires it through state caches, adds JMH benchmarks, updates tests, and introduces a shared BeaconState cache container.

  • Infrastructure / Cache:
    • New CaffeineCache: Adds infrastructure/collections/.../CaffeineCache as the Cache impl and removes LRU-specific tests.
    • Tests: Adds CaffeineCacheTest and CacheTestUtil for deterministic testing.
  • Spec / State Caches:
    • Shared caches: Adds SharedBeaconStateCaches to hold global validatorsPubKeys and ValidatorIndexCache.
    • TransitionCaches: Switches to CaffeineCache, introduces CacheFactory, uses shared caches, and preserves copy() behavior.
    • ValidatorIndexCache: Uses CaffeineCache and adds clear().
  • Networking / Execution:
    • Replaces LRU with Caffeine in DefaultReputationManager, DataColumnSidecarSignatureValidator, SimpleSidecarRetriever, and ExecutionLayerChannelStub.
    • TestSpecFactory: clears shared caches on spec creation.
  • Benchmarks:
    • Adds JMH benchmarks CacheConcurrencyBenchmark and updates TransitionCachesBenchmark to compare Legacy LRU vs Caffeine.
    • Introduces benchmarks/gen/LegacyLRUCache for comparisons.
  • Build / Deps:
    • Adds com.github.ben-manes.caffeine:caffeine dependency in Gradle.

Written by Cursor Bugbot for commit 33c3b7c.

@mehdi-aouadi force-pushed the caffeine-cache branch 3 times, most recently from c626d83 to b6051d7 on December 11, 2025 at 11:42

@zilm13 (Contributor) left a comment


Maybe

  1. Keep both
  2. Replace only in one-two places for the beginning
  3. Add jmh benchmark for some random multithread read/write access to see if it will show different numbers compared to LRUCache


@mehdi-aouadi (Author) commented Dec 11, 2025

Maybe

  1. Keep both
  2. Replace only in one-two places for the beginning
  3. Add jmh benchmark for some random multithread read/write access to see if it will show different numbers compared to LRUCache

I ran some benchmarks and tried to make them realistic based on our current LRUCache usage (a 5:1 get_with_fallback:invalidate_with_new_value ratio):

Cache Benchmark Results (Separate Columns)

| Benchmark | Cache Size | Cache Type | Key Space | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CacheConcurrencyBenchmark.realWorldWorkload | 1024 | LEGACY_LRU | 2048 | thrpt | 15 | 6455.029 | 108.554 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload:getOperation | 1024 | LEGACY_LRU | 2048 | thrpt | 15 | 922.952 | 77.950 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload:invalidateOperation | 1024 | LEGACY_LRU | 2048 | thrpt | 15 | 5532.077 | 163.020 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload | 1024 | CAFFEINE | 2048 | thrpt | 15 | 72097.235 | 7903.578 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload:getOperation | 1024 | CAFFEINE | 2048 | thrpt | 15 | 56454.641 | 6360.071 | ops/ms |
| CacheConcurrencyBenchmark.realWorldWorkload:invalidateOperation | 1024 | CAFFEINE | 2048 | thrpt | 15 | 15642.594 | 1565.776 | ops/ms |

TL;DR
CaffeineCache delivers over 11 times the throughput of LegacyLRUCache (72,097 vs. 6,455 ops/ms).
The error margin for Caffeine (~11%) is higher than the legacy LRUCache's (~1.7%), but IMO it is perfectly acceptable and realistic for such a high-contention benchmark.
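
The 5:1 workload mix above can be made concrete with a small stdlib harness (a toy sketch, not the JMH CacheConcurrencyBenchmark in this PR; the 2048 key space matches the benchmark parameter, everything else is illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Toy harness reproducing the 5:1 get_with_fallback : invalidate_with_new_value
// mix over a shared map, so the workload shape is concrete. Not the JMH code.
public class MixedWorkloadSketch {

  // Returns {gets, invalidates} after all threads finish.
  static long[] run(int threads, int itersPerThread) throws InterruptedException {
    ConcurrentHashMap<Integer, Integer> cache = new ConcurrentHashMap<>();
    AtomicLong gets = new AtomicLong();
    AtomicLong invalidates = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < itersPerThread; i++) {
          int key = rnd.nextInt(2048); // key space used in the benchmark
          if (i % 6 == 5) {
            cache.put(key, key * 2);   // 1 in 6: invalidate with new value
            invalidates.incrementAndGet();
          } else {
            cache.computeIfAbsent(key, k -> k * 2); // 5 in 6: get with fallback
            gets.incrementAndGet();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);
    return new long[] {gets.get(), invalidates.get()};
  }

  public static void main(String[] args) throws InterruptedException {
    long[] counts = run(4, 60_000);
    System.out.println("gets=" + counts[0] + " invalidates=" + counts[1]);
  }
}
```

Swapping the `ConcurrentHashMap` for the legacy synchronized cache or a Caffeine instance is what the real benchmark parameterizes over.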


@mehdi-aouadi (Author) commented Dec 16, 2025

I updated and ran the CacheConcurrencyBenchmark and TransitionCachesBenchmark on a separate VM (m6a.2xlarge, x86_64).

CacheConcurrencyBenchmark raw results

| Benchmark | Cache Size | Cache Type | Fallback Delay (ms) | Key Space | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 673.378 | ±652.880 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks:concurrentReads | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 672.203 | ±653.139 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks:slowFallbacks | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 1.175 | ±0.742 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 117,818.069 | ±9,525.138 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks:concurrentReads | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 117,817.824 | ±9,525.138 | ops/ms |
| CacheConcurrencyBenchmark.concurrentReadsWithSlowFallbacks:slowFallbacks | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 0.246 | ±0.005 | ops/ms |
| CacheConcurrencyBenchmark.mixedReadWriteScenario | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 6,063.919 | ±192.653 | ops/ms |
| CacheConcurrencyBenchmark.mixedReadWriteScenario | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 77,068.284 | ±1,298.065 | ops/ms |
| CacheConcurrencyBenchmark.pureReadPerformance | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 9,348.489 | ±216.479 | ops/ms |
| CacheConcurrencyBenchmark.pureReadPerformance | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 154,061.989 | ±11,562.494 | ops/ms |
| CacheConcurrencyBenchmark.realWorldScenario | 1024 | LEGACY_LRU | 5 | 2048 | thrpt | 15 | 8.412 | ±0.536 | ops/ms |
| CacheConcurrencyBenchmark.realWorldScenario | 1024 | CAFFEINE | 5 | 2048 | thrpt | 15 | 94.171 | ±2.572 | ops/ms |

CacheConcurrencyBenchmark processed results

| Benchmark Scenario | LegacyLRUCache | CaffeineCache | Performance Gain |
| --- | --- | --- | --- |
| Pure Read Hits | 9,348 ops/ms | 154,061 ops/ms | ~16.5x |
| Mixed Read/Write | 6,063 ops/ms | 77,068 ops/ms | ~12.7x |
| "Real World" (90% hits, some slow misses) | 8.4 ops/ms | 94.1 ops/ms | ~11.2x |
| High Contention with Slow Fallback | 673 ops/ms | 117,818 ops/ms | ~175x |

TransitionCachesBenchmark raw results

| Benchmark | Cache Type | Fallback Delay (ms) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| TransitionCachesBenchmark.contendedMissWithFallback | CAFFEINE | 0 | thrpt | 10 | 186,702,132.960 | ±9,114,314.307 | ops/s |
| TransitionCachesBenchmark.contendedMissWithFallback | CAFFEINE | 5 | thrpt | 10 | 200,019,960.009 | ±9,128,102.734 | ops/s |
| TransitionCachesBenchmark.contendedMissWithFallback | LEGACY_LRU | 0 | thrpt | 10 | 23,046,142.897 | ±809,427.663 | ops/s |
| TransitionCachesBenchmark.contendedMissWithFallback | LEGACY_LRU | 5 | thrpt | 10 | 27,536,678.636 | ±5,894,598.243 | ops/s |
| TransitionCachesBenchmark.copyCaches | CAFFEINE | 0 | thrpt | 10 | 8,954.644 | ±136.269 | ops/s |
| TransitionCachesBenchmark.copyCaches | CAFFEINE | 5 | thrpt | 10 | 8,533.452 | ±59.198 | ops/s |
| TransitionCachesBenchmark.copyCaches | LEGACY_LRU | 0 | thrpt | 10 | 12,645.276 | ±292.941 | ops/s |
| TransitionCachesBenchmark.copyCaches | LEGACY_LRU | 5 | thrpt | 10 | 12,630.207 | ±733.149 | ops/s |
| TransitionCachesBenchmark.realisticWorkload | CAFFEINE | 0 | thrpt | 10 | 23,293,859.582 | ±1,663,813.239 | ops/s |
| TransitionCachesBenchmark.realisticWorkload | CAFFEINE | 5 | thrpt | 10 | 13,550.253 | ±1,637.875 | ops/s |
| TransitionCachesBenchmark.realisticWorkload | LEGACY_LRU | 0 | thrpt | 10 | 6,385,206.352 | ±446,515.793 | ops/s |
| TransitionCachesBenchmark.realisticWorkload | LEGACY_LRU | 5 | thrpt | 10 | 426,367.649 | ±858,306.047 | ops/s |

TransitionCachesBenchmark processed results

| Benchmark Scenario | LegacyLRUCache | CaffeineCache | Performance Gain |
| --- | --- | --- | --- |
| Realistic Workload (no delay) | 6,385,206 ops/s | 23,293,859 ops/s | ~3.6x |
| Realistic Workload (5ms delay) | 426,367 ops/s * | 13,550 ops/s ** | See Note |
| Contended Miss (Thundering Herd) | 27,536,678 ops/s | 200,019,960 ops/s | ~7.2x |
| Copy Caches | 12,630 ops/s | 8,533 ops/s | Legacy is ~1.5x faster |

*Note on "Realistic Workload (5ms delay)":

  • The LegacyLRUCache score is statistically unreliable, with an error margin (±858k) far exceeding the score itself. This instability is caused by severe lock contention.
  • The CaffeineCache score is lower but predictable. Its performance degrades gracefully because a slow fallback on one key does not block reads for other cached keys.

Note on "Copy Caches":
The legacy cache is faster at the non-critical copy() operation due to its simpler internal structure.

@mehdi-aouadi mehdi-aouadi marked this pull request as draft December 19, 2025 15:36