HBASE-29645 Reduce synchronization in AsyncBufferedMutatorImpl #7363
base: master
Conversation
So the intention here is to move the table.batch call outside the synchronization block. As you said, table.batch should already be asynchronous, so it is a bit surprising that moving it outside the synchronization can greatly increase the performance.
In general, all RPC requests like locating a region should be asynchronous calls. Do you have more details on what makes the table.batch call block for a long time?
The implementation looks good, but we still need to think more carefully about correctness. IIRC when running ITBLL against branch-3, we found a data loss issue in AsyncBufferedMutatorImpl, even though the synchronization is very simple...
Thanks.
if (this.mutations.isEmpty() && periodicFlushTimeoutNs > 0) {
  periodicFlushTask = periodicalFlushTimer.newTimeout(timeout -> {
    boolean shouldFlush = false;
    synchronized (AsyncBufferedMutatorImpl.this) {
This should also be changed to lock.lock()?
  }, periodicFlushTimeoutNs, TimeUnit.NANOSECONDS);
}
// Preallocate to avoid potentially multiple resizes during addAll if we can.
if (this.mutations instanceof ArrayList && this.futures instanceof ArrayList) {
These two fields are private fields, so just declare them as ArrayList?
One pattern we see is that caller threads invoking mutator.mutate() are blocked by long call chains through netty doing IO for region location lookup. The theory as to why: CompletableFuture has two modes of operation. With synchronous chaining (thenApply, thenAccept, whenComplete) the callback executes in the same thread that completes the future, while only asynchronous chaining (thenApplyAsync, thenAcceptAsync, whenCompleteAsync) guarantees execution of the lambdas in a separate thread context. HBase async client code may chain with thenX() rather than thenXAsync(). If some upstream code, like region (re)location, completes a future synchronously, the downstream pipeline executes immediately in the current thread context and blocks the caller. The change to ABM guarantees it won't impact callers to mutate() if this happens, although the fact that this happens suggests that elsewhere in the async client, changes from thenX() to thenXAsync() are warranted, as a follow-up issue.
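To make the distinction concrete, here is a small self-contained sketch (plain JDK code, not HBase code) showing where synchronously and asynchronously chained callbacks actually execute; the class name and thread names are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChainingDemo {
  public static void main(String[] args) throws Exception {
    ExecutorService completer = Executors.newSingleThreadExecutor(r -> new Thread(r, "completer"));

    // Synchronous chaining on an already-completed future (e.g. a cached region location):
    // the callback runs immediately in the *calling* thread.
    CompletableFuture<String> cached = CompletableFuture.completedFuture("cached-location");
    cached.thenApply(loc -> {
      System.out.println("thenApply ran on " + Thread.currentThread().getName()); // main
      return loc;
    });

    // Asynchronous chaining: the callback is always handed to an executor (the common pool
    // by default), so the caller's thread is never borrowed for downstream work.
    cached.thenApplyAsync(loc -> {
      System.out.println("thenApplyAsync ran on " + Thread.currentThread().getName()); // pool thread
      return loc;
    }).join();

    // Synchronous chaining on a not-yet-completed future: the callback runs in whichever
    // thread eventually calls complete(), here the "completer" thread.
    CompletableFuture<String> pending = new CompletableFuture<>();
    pending.thenAccept(v ->
      System.out.println("thenAccept ran on " + Thread.currentThread().getName())); // completer
    completer.submit(() -> pending.complete("done")).get();

    completer.shutdown();
  }
}
```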
f4d8007 addresses spotbugs findings and implements review feedback.
@Apache9 What would you suggest here? I can add more test coverage. Something like this seems straightforward to implement and will not increase the running time of the test suite too much:
Step 1: Generate a test data set of configurable size, default 100K rows.
This will provide test coverage we don't have now, but it does not validate correctness beyond the happy path.
🎊 +1 overall
This message was automatically generated.
> In general, all RPC requests like locating a region should be asynchronous calls. Do you have more details on what makes the table.batch call block for a long time?

I assume we're not waiting for any RPC responses on the internalFlush thread, but with large enough buffers, large enough numbers of mutations, and high enough concurrency on incoming mutations, it seems even AsyncBatchRpcRetryingCaller#groupAndSend can take long enough (milliseconds?) to decrease overall throughput. If the region location is in the cache, then the future completes synchronously in AsyncNonMetaRegionLocator#getRegionLocationsInternal, which allows more work to happen under the synchronized block, which prevents further mutations from being accepted.
hbase/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java, lines 504 to 513 in d0b9478:
private CompletableFuture<RegionLocations> getRegionLocationsInternal(TableName tableName,
  byte[] row, int replicaId, RegionLocateType locateType, boolean reload) {
  // AFTER should be convert to CURRENT before calling this method
  assert !locateType.equals(RegionLocateType.AFTER);
  TableCache tableCache = getTableCache(tableName);
  if (!reload) {
    RegionLocations locs = locateInCache(tableCache, row, replicaId, locateType);
    if (isGood(locs, replicaId)) {
      return CompletableFuture.completedFuture(locs);
    }
When the number of mutations could be >100,000 and the number of region locations could be >10,000, and most of those locations are in the cache, groupAndSend of those Multi RPCs yields a non-trivial amount of work.
// Preallocate to avoid potentially multiple resizes during addAll
this.mutations.ensureCapacity(this.mutations.size() + mutations.size());
this.futures.ensureCapacity(this.futures.size() + futures.size());
If they are ArrayList, do we need to call ensureCapacity, or do the internals of addAll in the JDK already do that? https://github.com/openjdk/jdk8u/blob/5cffbcb0344f2cf16682a09519894ba705182241/jdk/src/share/classes/java/util/ArrayList.java#L582-L585
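Per the linked JDK 8 source, addAll(Collection) grows the backing array once (to size + numNew) before copying, so an explicit ensureCapacity immediately before a single addAll should be redundant. A tiny illustrative snippet (names invented, not HBase code) of the two equivalent shapes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EnsureCapacityDemo {
  public static void main(String[] args) {
    List<String> incoming = Arrays.asList("m1", "m2", "m3");

    // With an explicit pre-size before a single bulk add...
    ArrayList<String> withEnsure = new ArrayList<>();
    withEnsure.ensureCapacity(withEnsure.size() + incoming.size());
    withEnsure.addAll(incoming);

    // ...and without: addAll performs the same single grow internally before copying.
    ArrayList<String> without = new ArrayList<>();
    without.addAll(incoming);

    System.out.println(withEnsure.equals(without)); // true, same contents either way
  }
}
```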
So maybe we should check the depth of the stack trace? In netty there are some tricks around this area: if the future is completed synchronously all the time and creates a very deep call stack, it will force scheduling an asynchronous task to prevent stack overflow and also reduce the blocking execution time.
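For illustration, the pattern being described looks roughly like the sketch below. The names and the depth limit are invented; this is not netty's actual implementation, just the general idea of bounding how deep synchronous completion may recurse before falling back to an executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

public final class BoundedSyncCompletion {
  private static final int MAX_INLINE_DEPTH = 8; // assumed limit; netty's is configurable
  private static final ThreadLocal<Integer> DEPTH = ThreadLocal.withInitial(() -> 0);

  public static <T> void complete(CompletableFuture<T> future, T value, Executor executor) {
    int depth = DEPTH.get();
    if (depth < MAX_INLINE_DEPTH) {
      DEPTH.set(depth + 1);
      try {
        // Shallow enough: complete inline, so synchronously chained callbacks run here.
        future.complete(value);
      } finally {
        DEPTH.set(depth);
      }
    } else {
      // Too deep: schedule the completion (and its callbacks) asynchronously instead,
      // bounding both stack growth and the time this thread spends executing callbacks.
      executor.execute(() -> future.complete(value));
    }
  }

  private BoundedSyncCompletion() {
  }
}
```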
We need to consider the implementation carefully by hand, and after merging this PR we should run ITBLL for several rounds before cutting the next release.
private static final Logger LOG = LoggerFactory.getLogger(AsyncBufferedMutatorImpl.class);

private final int INITIAL_CAPACITY = 100;
This should be static.
  internalFlush(FlushType.MANUAL);
}

protected void internalFlush(FlushType trigger) {
Maybe a better choice is to inline this method into the caller and make the lock protect the check-and-replace logic, i.e., if we think we should flush, swap the mutations and futures into local variables and recreate them, set a local boolean variable, maybe called shouldSend, to true, and then check this boolean outside the lock protection to send the mutate request out.
In this way we do not need the double check or the flush type check, and the logic will be easier to understand. The only problem is the test which overrides the internalFlush method. We could try to find other ways to implement it.
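For clarity, a rough sketch of that suggested shape follows. The class, field, and helper names (BufferedMutatorSketch, shouldSend, sendBatch, bufferedSize, and so on) are illustrative assumptions, not the actual patch; the point is only that the lock covers the check-and-swap while the batch is sent after the lock is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.hadoop.hbase.client.Mutation;

class BufferedMutatorSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private List<Mutation> mutations = new ArrayList<>();
  private List<CompletableFuture<Void>> futures = new ArrayList<>();
  private long bufferedSize;
  private final long writeBufferSize;
  private final int maxMutations;

  BufferedMutatorSketch(long writeBufferSize, int maxMutations) {
    this.writeBufferSize = writeBufferSize;
    this.maxMutations = maxMutations;
  }

  void mutateAndMaybeFlush(List<? extends Mutation> newMutations,
      List<CompletableFuture<Void>> newFutures, long heapSize) {
    boolean shouldSend = false;
    List<Mutation> toSend = null;
    List<CompletableFuture<Void>> toComplete = null;
    lock.lock();
    try {
      mutations.addAll(newMutations);
      futures.addAll(newFutures);
      bufferedSize += heapSize;
      if (bufferedSize >= writeBufferSize || (maxMutations > 0 && mutations.size() >= maxMutations)) {
        // Swap the shared buffers to locals and reset them while still holding the lock.
        toSend = mutations;
        toComplete = futures;
        mutations = new ArrayList<>();
        futures = new ArrayList<>();
        bufferedSize = 0;
        shouldSend = true;
      }
    } finally {
      lock.unlock();
    }
    if (shouldSend) {
      // Issue the batch outside the lock so concurrent mutate() callers keep making progress.
      sendBatch(toSend, toComplete);
    }
  }

  private void sendBatch(List<Mutation> toSend, List<CompletableFuture<Void>> toComplete) {
    // Placeholder for the table.batch(toSend) call and wiring its results into toComplete.
  }
}
```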
This patch modifies the AsyncBufferedMutatorImpl class to improve its performance under concurrent usage. While AsyncTable#batch() is largely asynchronous in nature, it can exhibit blocking behavior during its preparation phase, for instance while looking up region locations. In the original implementation of AsyncBufferedMutatorImpl, calls to AsyncTable#batch() occur within a synchronized block, potentially causing severe contention and stalling other threads trying to buffer their mutations. The original implementation relied on coarse-grained synchronized blocks for multi-threading safety, so when one thread triggered a buffer flush (either because the buffer was full or a periodic timer fired), all other threads attempting to add mutations via the mutate method would be blocked until the table.batch() call completed, which could take a surprisingly long time.

The new implementation replaces the broad synchronized blocks with a ReentrantLock. This lock is acquired only for the brief period needed to safely copy the current batch of mutations and futures into local variables and swap in a new internal buffer. Immediately after this quick operation, the lock is released. The batch() call is then executed outside of the locked section. This allows other threads to continue adding new mutations concurrently while the flushing of the previous batch proceeds independently. The client has already opted in to asynchronous and potentially interleaved commit of the mutations submitted to AsyncBufferedMutator, by definition. The minimization of critical section scope minimizes thread contention and significantly boosts throughput under load. Other related profiler-driven efficiency changes are also included, such as elimination of stream API and array resizing hotspots identified by the profiler.

To validate the performance improvement of these changes, a JMH benchmark, AsyncBufferedMutatorBenchmark, was created to measure the performance of the mutate method under various conditions. It focuses specifically on the overhead and concurrency management of AsyncBufferedMutatorImpl itself, not the underlying network communication. To achieve this, it uses the Mockito framework to create a mock AsyncTable that instantly returns completed futures, isolating the mutator's buffering logic for measurement. It runs tests with 1, 10, and 100 threads to simulate no, medium, and high levels of concurrency. It uses a low value (100) for maxMutations to force frequent flushes based on the number of mutations, and a very high value (100,000) to ensure flushes are rare in that measurement case. The benchmark measures the average time per operation in microseconds, where a lower score indicates better performance and higher throughput.

With a single thread and no contention, the performance of both implementations is nearly identical. The minor variations are negligible and show that the new locking mechanism does not introduce any performance regression in the non-concurrent case. For example, with a 10MB buffer and high maxMutations, the NEW implementation scored 0.167 us/op while the OLD scored 0.169 us/op, a statistically insignificant difference. When the test is run with 10 threads, a noticeable gap appears. In the scenario designed to cause frequent flushes (maxMutations = 100), the NEW implementation is approximately 12 times faster than the OLD one (14.250 us/op for NEW vs. 172.463 us/op for OLD). This is because the OLD implementation forces threads to wait while flushes occur, and flushes incur a synthetic thread sleep of 1 ms to simulate occasional unexpected blocking behavior in AsyncTable#batch(), whereas the NEW implementation allows them to proceed without contention. The most significant results come from the 100-thread tests, which simulate high contention. In the frequent flush scenario (maxMutations = 100) the NEW implementation is 114 times faster in the synthetic benchmark scenario (16.123 us/op for NEW vs. 1847.567 us/op for OLD). Note that blocking IO observed in a real client, e.g. for region location lookups, can produce a much more significant impact. With the OLD code, 100 threads are constantly competing for a lock that is held for a long duration, leading to a contention storm. The NEW code's reduced locking scope almost entirely eliminates this bottleneck.

OS: Apple Silicon (aarch64) M1 Max / 64 GB
JVM: openjdk version "17.0.11" 2024-04-16 LTS / OpenJDK 64-Bit Server VM Zulu17.50+19-CA (build 17.0.11+9-LTS, mixed mode, sharing)
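For readers who want a feel for the harness described above, here is a minimal sketch of what such a JMH benchmark could look like. It is not the AsyncBufferedMutatorBenchmark added by the patch: the newMutator helper is a placeholder (the AsyncBufferedMutatorImpl constructor is not reproduced here), while the mocked AsyncTable, the maxMutations parameters, and the thread counts mirror the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hbase.client.AsyncBufferedMutator;
import org.apache.hadoop.hbase.client.AsyncTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.mockito.Mockito;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class AsyncBufferedMutatorBenchmarkSketch {

  private static final byte[] CF = Bytes.toBytes("f");
  private static final byte[] Q = Bytes.toBytes("q");
  private static final byte[] VALUE = new byte[64];

  // Low value forces frequent flushes; high value makes flushes rare.
  @Param({ "100", "100000" })
  int maxMutations;

  AsyncBufferedMutator mutator;

  @Setup(Level.Trial)
  @SuppressWarnings({ "unchecked", "rawtypes" })
  public void setUp() {
    // Mock AsyncTable whose batch() returns already-completed futures, so only the
    // mutator's own buffering and locking cost is measured, not any network work.
    AsyncTable table = Mockito.mock(AsyncTable.class);
    Mockito.when(table.batch(Mockito.anyList())).thenAnswer(invocation -> {
      List<?> actions = invocation.getArgument(0);
      List<CompletableFuture<Object>> results = new ArrayList<>(actions.size());
      for (int i = 0; i < actions.size(); i++) {
        results.add(CompletableFuture.completedFuture(null));
      }
      return results;
    });
    // 10 MB write buffer, parameterized maxMutations.
    mutator = newMutator(table, 10L * 1024 * 1024, maxMutations);
  }

  @Benchmark
  @Threads(100) // the real benchmark also runs with 1 and 10 threads
  public CompletableFuture<Void> mutate() {
    return mutator.mutate(new Put(Bytes.toBytes("row")).addColumn(CF, Q, VALUE));
  }

  private static AsyncBufferedMutator newMutator(AsyncTable<?> table, long bufferSize,
      int maxMutations) {
    // Placeholder: construction of AsyncBufferedMutatorImpl around the mocked table is
    // omitted here since its constructor is internal to the patch.
    throw new UnsupportedOperationException("sketch only");
  }
}
```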