hcd-130 diag patch #1743


Draft · wants to merge 11 commits into base: hcd-1.1.1-rel

Conversation

@bereng (Collaborator) commented May 20, 2025

What is the issue

...

What does this PR fix and why was it fixed

...

@@ -1581,6 +1582,69 @@ public void testPending()
assertEquals(8, aggregate.getPending().size());
}

@Test
public void applyMaxParallelism()
bereng (Collaborator, Author):

Ignore me


@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1743 rejected by Butler


10 new test failure(s) in 1 builds
See build details here


Found 10 new test failures

Test Explanation Branch history Upstream history
t.TestCqlshUnicode.test_unicode_desc regression 🔴
t.TestCqlshUnicode.test_unicode_identifier regression 🔴
...gLegacyIndex.test_sstableloader_with_failing_2i regression 🔴
...tCorruptedSSTablesWithLeveledCompactionStrategy regression 🔴
o.a.c.d.t.s.TraceTest.testMultiIndexTracing regression 🔴
...t.testKDTreePostingsQueryMetricsWithSingleIndex regression 🔴
...Test.testFinalOpenRetainsCachedData[format=BIG] regression 🔴
...Test.testFinalOpenRetainsCachedData[format=BTI] regression 🔴
...m.TrieMemtableMetricsTest.testContentionMetrics regression 🔴
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... regression 🔴

No known test failures found

@bereng (Collaborator, Author) commented May 21, 2025

The only sstable CI failures also reproduce on the base branch.

@bereng bereng closed this May 21, 2025
@bereng bereng reopened this May 21, 2025
@bereng bereng marked this pull request as draft May 21, 2025 06:36
@bereng bereng marked this pull request as ready for review May 29, 2025 11:49
@bereng (Collaborator, Author) commented May 29, 2025

This is not actually ready for review, but otherwise CI won't trigger.

@bereng bereng marked this pull request as draft June 3, 2025 13:14
@@ -2792,7 +2795,8 @@ public <V> V runWithCompactionsDisabled(Callable<V> callable,
{
// synchronize so that concurrent invocations don't re-enable compactions partway through unexpectedly,
// and so we only run one major compaction at a time
synchronized (this)
rwcdLock.lock();
bereng (Collaborator, Author):

@blambov let me pick your brain here. We're getting deadlocks here when syncing on the CF and later on the strategy while pausing or resuming compactions; a UCS task then locks on the strategy first and later on the CF. Trying to reorder the locking produces all sorts of new deadlocks elsewhere in the code.

Given that runWithCompactionsDisabled syncs on the CF to avoid concurrent calls, if I understand correctly, do you see any problems with this approach?

  • Use an auxiliary lock to avoid concurrent calls
  • Reorder the sync against the CF to come later, avoiding nested and badly ordered syncs

I am just wary that the original sync against the CF was guarding against something else I might be missing.
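The two-step idea above can be sketched roughly like this. This is a minimal illustration, not the real CFS code: the class, the Supplier signature, and the maxActive counter are invented for the sketch, while rwcdLock mirrors the name in the diff.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hedged sketch: replace `synchronized (this)` on runWithCompactionsDisabled
// with a dedicated lock, so the CFS monitor is never held while pausing or
// resuming compaction strategies, breaking the CFS-then-strategy lock order.
class CfsSketch
{
    private final ReentrantLock rwcdLock = new ReentrantLock(); // aux lock from the patch
    private final AtomicInteger active = new AtomicInteger();
    volatile int maxActive = 0; // records the highest observed concurrency

    <V> V runWithCompactionsDisabled(Supplier<V> callable)
    {
        rwcdLock.lock(); // serializes these calls without taking the CFS monitor
        try
        {
            maxActive = Math.max(maxActive, active.incrementAndGet());
            try
            {
                // ...pause strategies, run the callable, resume strategies...
                return callable.get();
            }
            finally
            {
                active.decrementAndGet();
            }
        }
        finally
        {
            rwcdLock.unlock();
        }
    }
}
```

The dedicated lock keeps the mutual-exclusion guarantee the old synchronized block provided, while leaving the CFS monitor free for token-range recalculation and the like.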

blambov:

This approach does sound sensible to me. This is a very long process to be locking the CFS.

I took a look at the usages of synchronization in CFS and they fall in four categories:

  • locking data for memtable changes (flush, drop, etc.),
  • locking this for recalculating state (local data ranges),
  • locking this to apply long-running tasks that forbid compaction,
  • locking the static class object to create new CFS instances.

The second and third categories are incompatible and cause this deadlock. One of these categories needs to use a different lock; maybe call it longRunningSerializedOperationsLock, so it can also cover any further uses of the kind.

Or maybe use a separate single-threaded executor? Now that this comes to mind, what thread executes these operations, and can we cause a deadlock by holding that thread while waiting for something to complete on it? This could also happen if the executor isn't single-threaded, if we manage to issue enough requests to park all of its threads... It sounds like it is safest to use a dedicated executor to solve the serialization problem for the compactions-disabled tasks and anything similar we may need in the future.
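The dedicated-executor idea could look roughly like this. All names here are illustrative, and (as the thread notes further down) the caller still blocks for the result, so this only changes which thread runs the task, not the method's synchrony.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hedged sketch: funnel every compactions-disabled task through one
// single-threaded executor, so the tasks serialize against each other
// without any of them holding the CFS monitor.
class SerializedOps
{
    // one worker thread => mutual exclusion for all submitted tasks
    private final ExecutorService longRunningSerializedOps =
        Executors.newSingleThreadExecutor();

    <V> V runSerialized(Callable<V> task)
    {
        try
        {
            // submit-and-wait: the caller blocks, but the task runs on the
            // dedicated thread, never under the caller's locks
            return longRunningSerializedOps.submit(task).get();
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    void shutdown()
    {
        longRunningSerializedOps.shutdown();
    }
}
```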

bereng (Collaborator, Author):

I'm not sure I followed the single-thread executor approach you mention. I pushed a version where the actual CFS sync and the callable all happen in their own single thread, which should isolate that thread from heavy parking of the previous executor pool, whichever that was. Is that what you meant? I feel like I'm missing something here, because that has nothing to do with the serialization.

blambov:

I meant the whole thing. But I now realize this is a synchronous method that returns a value, so we can't really gain anything by putting it in a thread, as we still have to block to get the value.

// doublecheck that we finished, instead of timing out
for (ColumnFamilyStore cfs : toInterruptFor)
return Executors.newSingleThreadExecutor().submit(() -> {
synchronized (this)
blambov:

Why do we need this? And if we do need to synchronize here, should it not be on data?

bereng (Collaborator, Author):

Good question. Given that many nodetool commands call into here, shouldn't we block on the whole CF instead of only on data? We don't know the nature of all commands, customer extensions, or future commands. Syncing on the whole CF seems best. What do you think?

blambov:

I don't think synchronized (this) stops compactions from initiating work (which is what this seems to want to achieve), as it does not prevent changes to the sstable set (synchronized (data) does the latter). And it does leave us open to the same kind of deadlock. It may be even worse if we lock data, unless we are certain none of the callables can perform work in another thread.

I personally think it would be safest not to do any locking here. If a callable needs it, it should lock data to stop concurrent sstable set changes. If necessary, we can provide a method that also wraps the callable in a synchronized (data) and use it for calls we know are just selecting sstables (e.g. not for truncateBlocking which does a lot of work inside that call).
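The wrapper suggested here might look like this. It is purely illustrative (a later reply walks even this locking back), and data is just a stand-in object for the CFS tracker, not the real field.

```java
import java.util.concurrent.Callable;

// Hedged sketch: for callables known to only select sstables, wrap the call
// in synchronized (data) so the sstable set cannot change mid-selection.
// Not suitable for heavyweight callables like truncateBlocking.
class DataLockSketch
{
    private final Object data = new Object(); // placeholder for the CFS tracker

    <V> V runWithSSTableSetFixed(Callable<V> selectOnly)
    {
        synchronized (data) // hold off concurrent sstable-set changes only
        {
            try
            {
                return selectOnly.call();
            }
            catch (Exception e)
            {
                throw new RuntimeException(e);
            }
        }
    }
}
```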

blambov:

Sorry, even synchronized(data) does not prevent changes to the sstable set. Let's get rid of this lock completely.

bereng (Collaborator, Author):

Apologies, I was pulled into a meeting.

Removing synchronized (this) can make calls from nodetool commands race against invalidateLocalRangesAndDiskBoundaries(), importNewSSTables() and getLocalRanges().

But if I understand correctly, you're saying we don't need to lock at this level because the callable will just be consuming the current CF API, which is already thread-safe. If any callable were to break that, it is its responsibility to sync when needed. The serialization that synchronized (this) was trying to provide is now already provided by longRunningSerializedOperationsLock.

Is this what you're saying? I am still wary of nodetool commands needing to sync on this. But we can forward your proposal to the customer and, if all goes well, port it to OSS, where more eyes will see it, and then merge it.

blambov:

Yes, we started with synchronization on the whole method, done (according to the comment) to prevent other operations of this kind from racing, and we replaced that with a different lock. The extra synchronization on this is thus not necessary, as other callers, e.g. nodetool commands, cannot run in parallel because of the new lock.

The thing synchronized (this) actually blocks is recalculation of token ranges by other threads, which is not something we need to block, and which could still lead to a deadlock.

bereng (Collaborator, Author):

CI looks good. Let's forward this latest diag to the customer and hope it's the last bug we hit!
