HBASE-29585 Add row-level cache for the get operation #7291
Conversation
This is a great idea, thanks for sharing it. I do have some comments, though:
- Can the RowCacheService be an implementation of BlockCache? Maybe a wrapper around the LRUBlockCache. I'm a bit worried about introducing a whole new layer that intercepts all read/write operations at the RPC service with cache-specific logic, when this class is not the cache implementation itself. It seems a bit confusing to have a completely separate entry point to the cache.
- Are we accepting having the same row data in multiple caches? In the current code, I haven't seen any checks to avoid that. Maybe if we implement RowCacheService as a block cache implementation, so that the cache operations happen in the inner layers of the read/write operations, it would be easier to avoid duplication.
- Why not simply evict the row that got mutated? I guess we cannot simply override it in the cache because a mutation can happen on individual cells.
- Are we accepting having data duplicated across separate caches? I don't see any logic to avoid caching a whole block containing the row for a Get in the L2 cache, while we'll still be caching the row in the row cache. Similarly, we might re-cache a row that's in the memstore in the row cache.
- One problem of adding such small units (a single row) to the cache is that we need to keep a map index for each entry. So the smaller the rows, the more rows fit in the cache, but the more key objects are retained in the map. In your tests, assuming the default block cache size of 40% of the heap, that would give a 12.8 GB block cache. Have you managed to measure the block cache usage by the row cache, in terms of number of rows in the cache, byte size of the L1 cache, and the total heap usage? Maybe worth collecting a heap dump to analyse the map index size in the heap.
RegionScannerImpl scanner = getScannerInternal(region, scan, results);

// The row cache is ineffective when the number of store files is small. If the number
Can you elaborate more on this? Is it really a matter of number of files or total store file size? For a single CF table, where a given region, after major compaction, has a 10GB store file, wouldn't this be more efficient?
Get performance is more affected by the number of StoreFiles than by their size. This is because a StoreFileScanner must be created for each StoreFile, and the process of aggregating their results into a single Get result becomes increasingly complex. However, in testing, I found that when there was only one StoreFile, the row cache provided almost no performance benefit. Therefore, I added this condition to prevent the row cache from unnecessarily occupying BlockCache space.
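For illustration only, here is a minimal sketch of the kind of store-file-count guard being discussed. The class, method, and threshold names are hypothetical, not the code in this PR:

```java
import java.util.List;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.HStore;

// Hypothetical sketch of the guard discussed above; not the actual patch.
final class RowCacheGuardSketch {
  // Caching a merged row only pays off when a Get would otherwise have to open
  // and merge results from several StoreFileScanners.
  private static final int MIN_STORE_FILES_FOR_ROW_CACHE = 2;

  static boolean rowCacheWorthwhile(HRegion region) {
    List<HStore> stores = region.getStores();
    return stores.stream()
      .anyMatch(store -> store.getStorefilesCount() >= MIN_STORE_FILES_FOR_ROW_CACHE);
  }
}
```

As the follow-up below shows, this guard was eventually dropped, so the sketch only documents the trade-off being debated here.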
Ahh, right, so the main gain here comes from avoiding the merge of results from different store file scanners. I guess there could still be benefits in doing this row caching for gets only, even when there is only one store file. Say the L2 cache is already at capacity; long client scans could cause evictions of blocks used by gets for repeating keys.
Yes, I agree. I’ll remove the condition to cache only when the number of StoreFiles is above a threshold, and always cache the row.
Fixed in e33db29.
private boolean tryGetFromCache(HRegion region, RowCacheKey key, Get get, List<Cell> results) {
  RowCells row =
    (RowCells) region.getBlockCache().getBlock(key, get.getCacheBlocks(), false, true);
RowCacheKey uses the region encoded name for indexing, whilst BlockCacheKey uses (store file name + offset). If the given row is already cached in an L2 cache block, this call will fail to fetch it and we'll cache it in the L1 too.
I initially intended for the row cache to reside only in L1 and not be cached in L2, but I haven’t actually implemented that yet. I’ll give further thought to adding this.
Thank you for starting the PR review.
BlockCache operates at the HFile access layer, whereas the row cache needs to function at a higher layer that covers both MemStore and HFile. That’s why I implemented RowCacheService in the RPC service layer. That said, the row cache does not actually cache HFileBlocks, yet it currently relies on the BlockCache interface. I realize this might not be appropriate. I reused the BlockCache interface to reduce the overhead of creating a separate cache implementation solely for the row cache, but in hindsight, this might not have been the best approach. It may be better to build a dedicated cache implementation specifically for the row cache. What do you think?
What exactly does “multiple caches” refer to? Does it mean the L1 and L2 caches in the CombinedBlockCache? If so, I haven’t really considered that aspect yet, but I’ll start looking into it.
I didn’t fully understand the intention behind your question. Could you please explain it in more detail?
This is in the same L1/L2 context as your comment 2, correct? If so, I haven’t considered that aspect yet, but I’ll start thinking about how to handle it. Since the row cache is only enabled when there are at least two HFiles, rows that exist only in the MemStore are not cached. However, when there are two or more HFiles, rows in the MemStore are also added to the row cache. This is an intentional design choice, aimed at bypassing the process of generating results via SegmentScanner and StoreFileScanner, and instead serving Get requests directly from the cache.
I slightly modified the LruBlockCache code to record the row cache size and entry count. The row cache occupies 268.67MB with 338,602 entries. The average size of a single row cache entry is 830 bytes. Within the overall BlockCache, the row cache accounts for 45% by entry count and 2% by size.
Yeah, I had the same thought while going through the comments. Having a separate cache structure seems the best way to implement this.
Nevermind my previous comment. We should focus on the separate cache for rows.
Rather than blocking writes to the row cache during updates/bulkload, can we simply make the updates evict/override the row from the cache if it's already there? For puts, we shouldn't need to worry about barriers, if we make sure we don't cache the row when it's in the memstore only, but we should make sure to remove it from the row cache because the cache would now be stale. For bulkloads, I guess we only need to make sure to evict the rows for affected regions after the bulkload has been committed.
Per other comments, I agree it's fine to have the row in the row cache and its block also in the block cache. We need to decide if we want to add blocks to the block cache when doing a Get, or whether a Get should cache only in the row cache. Also, should we avoid caching if the row is in the memstore? Could be challenging in the current design of caching the whole row, because the memstore might have only updates for a few cells within a row.
What if more rows get cached, over time, as more gets for different rows are executed? It could lead to many rows in the cache, and many more objects in the map to index it. In the recent past, we've seen some heap issues when having a very large file-based bucket cache and small compressed blocks. I guess we could face similar problems here too.
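To make the eviction-on-write idea from the comment above concrete, here is a rough, hypothetical sketch; none of these names or key shapes come from the PR. After a put or a bulkload commits, the affected rows are simply invalidated so later Gets repopulate them:

```java
import java.util.Arrays;
import com.github.benmanes.caffeine.cache.Cache;

// Hypothetical illustration only; key shape and method names are not from the PR.
final class EvictOnWriteSketch {

  // After a single-row mutation commits, drop that row's cached merge.
  static void onMutationCommitted(Cache<String, Object> rowCache,
      String regionEncodedName, byte[] row) {
    rowCache.invalidate(rowKey(regionEncodedName, row));
  }

  // After a bulkload commits, drop every cached row of the affected region.
  static void onBulkLoadCommitted(Cache<String, Object> rowCache, String regionEncodedName) {
    String prefix = regionEncodedName + ":";
    rowCache.asMap().keySet().removeIf(key -> key.startsWith(prefix));
  }

  private static String rowKey(String regionEncodedName, byte[] row) {
    // Illustrative key: region encoded name plus the row bytes.
    return regionEncodedName + ":" + Arrays.toString(row);
  }
}
```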
The design doc looks good. I skimmed the code; it seems we put the row cache into the block cache? Mind explaining more on why we chose to use the block cache to implement the row cache? What is the benefit? Thanks.
When the data exists in both the MemStore and the StoreFiles, we need to store it in the row cache to avoid result merging. In that case, due to the following issues, a barrier was introduced.
It would be more efficient to do as you mentioned when doing a bulkload.
I already answered this in another comment, but I’ll respond here as well. I think it’s better to put it into the BlockCache when doing a Get, according to the BlockCache setting. It is more efficient not to create a row cache when the cells to be fetched exist only in the MemStore. However, if the cells to be fetched are in both the MemStore and the StoreFiles, then creating a row cache is efficient to avoid result merging. I’ll give some more thought on how we can achieve this.
Okay. Then I’ll take a heap dump and check the size of the map’s index.
I did it that way because the implementation was simpler. However, it causes confusion and makes it harder to have clear control over the row cache, so I’ve decided to create a separate RowCache implementation.
The TODOs are as follows, and I will proceed in order:
- Implement RowCache
  - Initially considered modifying LruBlockCache, but the required changes were extensive. Instead, implemented RowCache using Caffeine cache.
- Add row.cache.size configuration
  - Default is 0.0 (disabled); RowCache is enabled only if explicitly set to a value > 0.
  - The combined size of BlockCache + MemStore + RowCache must not exceed 80% of the heap.
- Add Row Cache tab to RegionServer Block Cache UI
  - RowCache is not a BlockCache, but added here since there is no better place.
- Add RowCache metrics
  - Metrics for size, count, eviction, hit, and miss are now exposed.
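For orientation, below is a minimal, illustrative sketch of what a Caffeine-backed, size-bounded row cache along these lines could look like. The class name, the weigher, and the byte-size estimate are assumptions for the sketch, not the PR's actual RowCache:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Illustrative sketch only; not the implementation in this PR.
public final class CaffeineRowCacheSketch<K, V> {

  private final Cache<K, V> cache;

  public CaffeineRowCacheSketch(long maxBytes) {
    this.cache = Caffeine.newBuilder()
      // Bound the cache by estimated heap bytes rather than entry count, which is
      // what a heap-fraction setting like row.cache.size implies.
      .maximumWeight(maxBytes)
      .weigher((K key, V row) -> estimateHeapSize(row))
      .recordStats()
      .build();
  }

  public void putRow(K key, V row) {
    cache.put(key, row);
  }

  public V getRow(K key) {
    return cache.getIfPresent(key);
  }

  public void evictRow(K key) {
    cache.invalidate(key);
  }

  public long rowCount() {
    return cache.estimatedSize();
  }

  private int estimateHeapSize(V row) {
    // Placeholder: a real cache would sum the heap sizes of the row's cells.
    return 1;
  }
}
```

The actual PR also ties the cache size to the overall heap check (BlockCache + MemStore + RowCache must stay under 80% of the heap), which this sketch does not attempt to model.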
I configured the RegionServer with a 4 GB heap, setting hfile.block.cache.size to 0.3 and row.cache.size to 0.1, then reran the same workload as before. Under these settings, the maximum RowCache capacity is approximately 400 MB. After the RowCache was fully populated, I generated and analyzed a heap dump.
@wchevreuil
And remove the condition that decides whether to put data into the row cache based on the number of StoreFiles
I’m currently trying to determine the appropriate size of the RowCache relative to the BlockCache.
Thanks! Please allow me a few days to review it.
Sorry for lagging on this, @EungsopYoo, I'm still going through the core of your implementation, but here are some minor "cosmetic" changes I think we could do here.
I may give another review by tomorrow EOD.
.addCounter(Interns.info(ROW_CACHE_EVICTED_ROW_COUNT, ""),
  rsWrap.getRowCacheEvictedRowCount())
.addGauge(Interns.info(ROW_CACHE_SIZE, ""), rsWrap.getRowCacheSize())
.addGauge(Interns.info(ROW_CACHE_COUNT, ""), rsWrap.getRowCacheCount())
Nice!
</div>
<div class="tab-pane" id="tab_row_cache" role="tabpanel">
  <& row_cache_stats; rowCache = rowCache &>
</div>
Nit: We may need to rename the labels here. Where we currently say "Block Cache" it should be only "Cache", and the L1/L2 tabs should be labeled "BlockCache L1"/"BlockCache L2".
RowCache rowCache;
</%args>
<%if rowCache == null %>
  <p>RowCache is null</p>
Should we rather say: "RowCache disabled"?
Per https://github.com/ben-manes/caffeine?tab=readme-ov-file#download, it is recommended to use 3.x for Java 11 or above.
🎊 +1 overall
This message was automatically generated.
private final LongAdder hitCount = new LongAdder();
private final LongAdder missCount = new LongAdder();
private final LongAdder evictedRowCount = new LongAdder();
Can't we use Cache.stats() for this?
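For reference, a small standalone example of the Caffeine stats API being suggested, assuming the cache is built with recordStats(); this is not the PR's code. One caveat: evictionCount() only counts policy-driven evictions (size/expiry), not explicit invalidations, so a separate counter may still be needed if explicit evictions should be reported.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.stats.CacheStats;

public final class CaffeineStatsExample {
  public static void main(String[] args) {
    // recordStats() must be set at build time, otherwise stats() returns all zeros.
    Cache<String, String> cache = Caffeine.newBuilder().recordStats().build();

    cache.put("row1", "v1");
    cache.getIfPresent("row1"); // counted as a hit
    cache.getIfPresent("row2"); // counted as a miss

    CacheStats stats = cache.stats();
    System.out.printf("hits=%d misses=%d evictions=%d%n",
      stats.hitCount(), stats.missCount(), stats.evictionCount());
  }
}
```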
void cacheBlock(RowCacheKey key, RowCells value) {
  cache.put(key, value);
}

public RowCells getBlock(RowCacheKey key, boolean caching) {
  if (!caching) {
    missCount.increment();
    return null;
  }

  RowCells value = cache.getIfPresent(key);
  if (value == null) {
    missCount.increment();
  } else {
    hitCount.increment();
  }
  return value;
}

void evictBlock(RowCacheKey key) {
  cache.asMap().remove(key);
}
We should rename all these methods, as we are not caching blocks, but rows.
  // After creating the barrier, evict the existing row cache for this row,
  // as it becomes invalid after the mutation
  evictRowCache(key);

  return execute(operation);
} finally {
  // Remove the barrier after mutation to allow the row cache to be populated again
  removeRowLevelBarrier(key);
Should (or could) we recache the mutated row?
  return operation.execute();
}

void evictRowCache(RowCacheKey key) {
nit: should be "evictRow".
// Row cache keys should not be evicted on close, since the cache may contain many entries and
// eviction would be slow. Instead, the region’s rowCacheSeqNum is used to generate new keys that
// ignore the existing cache when the region is reopened or bulk-loaded.
So when do stale rows in the row cache get evicted? And if we don't evict rows from the cache for a closed region, would these be wasting cache space until the cache is full and the LFU logic finally picks them for eviction?
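To illustrate the seqNum-based invalidation described in the quoted comment, here is a hypothetical key sketch (names and fields are assumptions, not the PR's RowCacheKey): the region's rowCacheSeqNum participates in key equality, so bumping it on reopen or bulkload makes old entries unreachable, and they are later reclaimed by normal size-based eviction rather than being scanned and removed eagerly.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch of seqNum-scoped keys, not the PR's RowCacheKey.
final class SeqScopedRowKey {
  private final String regionEncodedName;
  private final long rowCacheSeqNum; // bumped on reopen/bulkload to invalidate old entries
  private final byte[] row;

  SeqScopedRowKey(String regionEncodedName, long rowCacheSeqNum, byte[] row) {
    this.regionEncodedName = regionEncodedName;
    this.rowCacheSeqNum = rowCacheSeqNum;
    this.row = row.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SeqScopedRowKey)) return false;
    SeqScopedRowKey other = (SeqScopedRowKey) o;
    // Old-seqNum keys never match new lookups, so stale rows are simply never hit again
    // and are eventually removed by the cache's size-based policy.
    return rowCacheSeqNum == other.rowCacheSeqNum
      && regionEncodedName.equals(other.regionEncodedName)
      && Arrays.equals(row, other.row);
  }

  @Override
  public int hashCode() {
    return Objects.hash(regionEncodedName, rowCacheSeqNum, Arrays.hashCode(row));
  }
}
```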
/**
 * This is used to invalidate the entire row cache after bulk loading.
 */
Is this comment correct? I thought we would be invalidating only the rows for the given regions. Rows from regions not touched by bulkload would stay valid.
private final Map<RowCacheKey, AtomicInteger> rowLevelBarrierMap = new ConcurrentHashMap<>();

private final boolean enabledByConf;
private final RowCache rowCache;
Should we consider defining an interface for RowCache and referring to the interface from here, so that we can accommodate future RowCache implementations beyond the Caffeine one currently provided as the reference implementation?
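For illustration, a rough sketch of such an interface; the method names follow the renaming suggested earlier in the review, and RowCacheKey and RowCells are the PR's types, but this interface itself is not code from the PR:

```java
// Hypothetical interface sketch; only RowCacheKey and RowCells come from the PR.
public interface RowCache {

  /** Cache the merged cells of a row under the given key. */
  void cacheRow(RowCacheKey key, RowCells row);

  /** Return the cached row, or null if absent. */
  RowCells getRow(RowCacheKey key);

  /** Drop a single row, e.g. after a mutation makes it stale. */
  void evictRow(RowCacheKey key);

  /** Number of cached rows, for metrics. */
  long getRowCount();

  /** Estimated heap size of the cached rows, for metrics. */
  long getSize();
}
```

The Caffeine-backed class added in this PR would then be just one implementation of the interface.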
The change is pretty large, so I suggest we start a feature branch and land it step by step. First we introduce the framework for integrating the row cache and add a flag to enable/disable it, even if the code path when the row cache is enabled is still empty. Then we can run some integration tests, like YCSB to verify performance and ITBLL to verify correctness; if all is good, we can merge the feature branch back. WDYT? Thanks.
@Apache9 @wchevreuil