hcd-130 diag patch #1743


Draft · wants to merge 11 commits into base: hcd-1.1.1-rel

Conversation

@bereng (Collaborator) commented May 20, 2025

What is the issue

...

What does this PR fix and why was it fixed

...

@@ -1581,6 +1582,69 @@ public void testPending()
assertEquals(8, aggregate.getPending().size());
}

@Test
public void applyMaxParallelism()
bereng (Collaborator, Author):

Ignore me


@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1743 rejected by Butler


10 new test failure(s) in 1 builds
See build details here


Found 10 new test failures

Test Explanation Branch history Upstream history
t.TestCqlshUnicode.test_unicode_desc regression 🔴
t.TestCqlshUnicode.test_unicode_identifier regression 🔴
...gLegacyIndex.test_sstableloader_with_failing_2i regression 🔴
...tCorruptedSSTablesWithLeveledCompactionStrategy regression 🔴
o.a.c.d.t.s.TraceTest.testMultiIndexTracing regression 🔴
...t.testKDTreePostingsQueryMetricsWithSingleIndex regression 🔴
...Test.testFinalOpenRetainsCachedData[format=BIG] regression 🔴
...Test.testFinalOpenRetainsCachedData[format=BTI] regression 🔴
...m.TrieMemtableMetricsTest.testContentionMetrics regression 🔴
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... regression 🔴

No known test failures found

@bereng (Collaborator, Author) commented May 21, 2025

The only sstable CI failures also reproduce on the base branch.

@bereng bereng closed this May 21, 2025
@bereng bereng reopened this May 21, 2025
@bereng bereng marked this pull request as draft May 21, 2025 06:36
@bereng bereng marked this pull request as ready for review May 29, 2025 11:49
@bereng (Collaborator, Author) commented May 29, 2025

This is not actually ready for review, but otherwise CI won't trigger.

@bereng bereng marked this pull request as draft June 3, 2025 13:14
@@ -2792,7 +2795,8 @@ public <V> V runWithCompactionsDisabled(Callable<V> callable,
{
// synchronize so that concurrent invocations don't re-enable compactions partway through unexpectedly,
// and so we only run one major compaction at a time
synchronized (this)
rwcdLock.lock();
bereng (Collaborator, Author):

@blambov let me pick your brain here. We're getting deadlocks here when syncing on the CF and later on the strategy while pausing or resuming compactions; a UCS task then locks on the strategy first and later on the CF. Trying to reorder the locking produces all sorts of new deadlocks elsewhere in the code.

Given that runWithCompactionsDisabled syncs on the CF to avoid concurrent calls, if I understand correctly, do you see any problems with this approach?

  • Use an auxiliary lock to avoid concurrent calls
  • Reorder the sync against the CF to come later, avoiding nested and badly ordered syncs

I am just wary that the original sync against the CF was guarding against something else I might be missing.
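The two-step idea above can be sketched roughly like this. This is a minimal illustration, not the real CFS code: the class, the Supplier signature, and the maxActive counter are invented for the sketch, while rwcdLock mirrors the name in the diff.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hedged sketch: replace `synchronized (this)` on runWithCompactionsDisabled
// with a dedicated lock, so the CFS monitor is never held while pausing or
// resuming compaction strategies, breaking the CFS-then-strategy lock order.
class CfsSketch
{
    private final ReentrantLock rwcdLock = new ReentrantLock(); // aux lock from the patch
    private final AtomicInteger active = new AtomicInteger();
    volatile int maxActive = 0; // records the highest observed concurrency

    <V> V runWithCompactionsDisabled(Supplier<V> callable)
    {
        rwcdLock.lock(); // serializes these calls without taking the CFS monitor
        try
        {
            maxActive = Math.max(maxActive, active.incrementAndGet());
            try
            {
                // ...pause strategies, run the callable, resume strategies...
                return callable.get();
            }
            finally
            {
                active.decrementAndGet();
            }
        }
        finally
        {
            rwcdLock.unlock();
        }
    }
}
```

The dedicated lock keeps the mutual-exclusion guarantee the old synchronized block provided, while leaving the CFS monitor free for token-range recalculation and the like.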

blambov:

This approach does sound sensible to me. This is a very long process to be locking the CFS.

I took a look at the usages of synchronization in CFS and they fall in four categories:

  • locking data for memtable changes (flush, drop, etc.),
  • locking this for recalculating state (local data ranges),
  • locking this to apply long-running tasks that forbid compaction,
  • locking the static class object to create new CFS instances.

The second and third categories are incompatible and cause this deadlock. One of these categories needs to use a different lock; maybe call it longRunningSerializedOperationsLock, so it can also cover any further uses of the kind.

Or maybe use a separate single-threaded executor? Now that this comes to mind, what thread executes these operations, and can we cause a deadlock by holding that thread while waiting for something to complete on it? This could also happen if the executor isn't single-threaded, if we manage to issue enough requests to park all of its threads... It sounds like it is safest to use a dedicated executor to solve the serialization problem for the compactions-disabled tasks and anything similar we may need in the future.
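The dedicated-executor idea could look roughly like this. All names here are illustrative, and (as the thread notes further down) the caller still blocks for the result, so this only changes which thread runs the task, not the method's synchrony.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hedged sketch: funnel every compactions-disabled task through one
// single-threaded executor, so the tasks serialize against each other
// without any of them holding the CFS monitor.
class SerializedOps
{
    // one worker thread => mutual exclusion for all submitted tasks
    private final ExecutorService longRunningSerializedOps =
        Executors.newSingleThreadExecutor();

    <V> V runSerialized(Callable<V> task)
    {
        try
        {
            // submit-and-wait: the caller blocks, but the task runs on the
            // dedicated thread, never under the caller's locks
            return longRunningSerializedOps.submit(task).get();
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    void shutdown()
    {
        longRunningSerializedOps.shutdown();
    }
}
```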

bereng (Collaborator, Author):

I'm not sure I followed the single-thread executor approach you mention. I pushed a version where the actual CFS sync and the callable all happen in their own single thread, which should isolate that thread from heavy parking of the previous executor pool, whichever that was. Is that what you meant? I feel like I'm missing something here, because that has nothing to do with the serialization.

blambov:

I meant the whole thing. But I now realize this is a synchronous method that returns a value, so we can't really gain anything by putting it in a thread, as we still have to block to get the value.

// doublecheck that we finished, instead of timing out
for (ColumnFamilyStore cfs : toInterruptFor)
return Executors.newSingleThreadExecutor().submit(() -> {
synchronized (this)
blambov:

Why do we need this? And if we do need to synchronize here, should it not be on data?

bereng (Collaborator, Author):

Good question. Given that many nodetool commands call into here, shouldn't we block on the whole CF instead of only on data? We don't know the nature of all commands, customer extensions, or future commands. Syncing on the whole CF seems best. What do you think?

blambov:

I don't think synchronized (this) stops compactions from initiating work (which is what this seems to want to achieve), as it does not prevent changes to the sstable set (synchronized (data) does the latter). And it does leave us open to the same kind of deadlock. It may be even worse if we lock data, unless we are certain none of the callables can perform work in another thread.

I personally think it would be safest not to do any locking here. If a callable needs it, it should lock data to stop concurrent sstable set changes. If necessary, we can provide a method that also wraps the callable in a synchronized (data) and use it for calls we know are just selecting sstables (e.g. not for truncateBlocking which does a lot of work inside that call).
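The wrapper suggested here might look like this. It is purely illustrative (a later reply walks even this locking back), and data is just a stand-in object for the CFS tracker, not the real field.

```java
import java.util.concurrent.Callable;

// Hedged sketch: for callables known to only select sstables, wrap the call
// in synchronized (data) so the sstable set cannot change mid-selection.
// Not suitable for heavyweight callables like truncateBlocking.
class DataLockSketch
{
    private final Object data = new Object(); // placeholder for the CFS tracker

    <V> V runWithSSTableSetFixed(Callable<V> selectOnly)
    {
        synchronized (data) // hold off concurrent sstable-set changes only
        {
            try
            {
                return selectOnly.call();
            }
            catch (Exception e)
            {
                throw new RuntimeException(e);
            }
        }
    }
}
```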

blambov:

Sorry, even synchronized(data) does not prevent changes to the sstable set. Let's get rid of this lock completely.

bereng (Collaborator, Author):

Apologies, I was pulled into a meeting.

Removing synchronized (this) can make calls from nodetool commands race against invalidateLocalRangesAndDiskBoundaries(), importNewSSTables() and getLocalRanges().

But if I understand correctly, you're saying we don't need to lock at this level because the callable will just be consuming the current CF API, which is already thread-safe. If any callable were to break that, it is its responsibility to sync when needed. The serialization that synchronized (this) was trying to provide is now already provided by longRunningSerializedOperationsLock.

Is this what you're saying? I am still wary of nodetool commands needing to sync on this. But we can forward your proposal to the customer and, if all goes well, port it to OSS, where more eyes will see it, and then merge it.

blambov:

Yes, we started with synchronization on the whole method, done (according to the comment) to prevent other operations of this kind from racing, and we replaced that with a different lock. The extra synchronization on this is thus not necessary, as other callers, e.g. nodetool commands, cannot run in parallel because of the new lock.

The thing synchronized (this) actually blocks is recalculation of token ranges by other threads, which is not something we need to block, and which could still lead to a deadlock.

bereng (Collaborator, Author):

CI looks good. Let's forward this latest diag to the customer and hope it's the last bug we hit!
