CNDB-14577: Compact all SSTables of a level shard if their number reaches a limit #1873

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

cbornet

@cbornet cbornet commented Jul 15, 2025

What is the issue

CNDB-14577: UCS by default does not compact many small non-overlapping sstables with very few rows

What does this PR fix and why was it fixed

This PR limits the number of SSTables for a given compaction level by executing a major compaction of the level instead of the regular compaction of overlapping SSTables.

TODO:

  • add docs
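As a rough illustration of the behaviour described above, the sketch below selects the shards of a level that hold too many SSTables. It assumes the trigger that emerges later in this review (SSTable count greater than the fan factor times a configurable factor); the class, method and parameter names are hypothetical and are not the actual UnifiedCompactionStrategy code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the PR's selection logic; not the real UCS API.
final class OversizeShardSelector
{
    /**
     * Returns the shard groups whose SSTable count exceeds
     * fanFactor * maxSstablesPerShardFactor. Each returned group would be
     * compacted as a whole, i.e. a major compaction of that shard.
     */
    static <T> List<Set<T>> selectOversizeShards(List<Set<T>> shardGroups,
                                                 int fanFactor,
                                                 double maxSstablesPerShardFactor)
    {
        double limit = fanFactor * maxSstablesPerShardFactor;
        List<Set<T>> oversize = new ArrayList<>();
        for (Set<T> group : shardGroups)
        {
            if (group.size() > limit)
                oversize.add(group);
        }
        return oversize;
    }
}
```

Shards that stay at or below the limit are left to the regular overlap-based selection.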

github-actions bot commented Jul 15, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@cbornet cbornet marked this pull request as draft July 15, 2025 14:55
@cbornet cbornet force-pushed the compact-too-many-sst branch 2 times, most recently from b12b3a2 to e85cb1d Compare July 18, 2025 12:28
@blambov blambov left a comment

Added a couple of comments, and I need to take a fresh look at how the parallelism is controlled on Monday to give you some ideas for that.

@cbornet cbornet changed the title CNDB-14577: Compact all SSTables of a level if their number reaches a… CNDB-14577: Compact all SSTables of a level if their number reaches a limit Jul 18, 2025
@blambov

blambov commented Jul 21, 2025

AFAICS there are no issues with the parallelism, it should be properly handled by getSelection.

@cbornet cbornet force-pushed the compact-too-many-sst branch from ef8f14c to d1e5463 Compare July 23, 2025 12:15
// If that's the case, we perform a major compaction on those shards.
List<Set<CompactionSSTable>> groups =
shardManager.splitSSTablesInShards(sstables,
new ShardingStats(sstables, shardManager, controller).shardCountForDensity,

Nit: This ShardingStats collection is rather costly. Since we already have a density range for the level, let's use a simpler controller.getNumShards(max).

Author

done

(sstableShard, shardRange) -> Sets.newHashSet(sstableShard));

List<Set<CompactionSSTable>> oversizeGroups =
groups.stream()

Nit: I'm not a fan of the lack of indentation here. Could we move these a few spaces to the right?

Author

It comes from the style guide that was set by ant generate-idea-files:

<option name="CONTINUATION_INDENT_SIZE" value="0" />

This is there and makes a lot of sense for the cases of

x ->
{
   ...
}

or

functionCall
(
   ...
)

but

x =
something

is breaking readability. I'm afraid we may have to adjust these manually.

Author

Can't these be configured separately in IDEA?
I'll have a look.

Author

done.


List<Set<CompactionSSTable>> oversizeGroups =
groups.stream()
.filter(group -> group.size() > threshold * controller.getShardMaxSstablesFactor())

I think it makes better sense to use the fan factor instead of the threshold (the threshold is always 2 for levelled compaction, and it feels wrong/surprising for e.g. T4 and L4 to have different triggers).
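To make the difference concrete, here is a small sketch comparing the two candidate triggers. It relies on the statement above that the threshold is always 2 for levelled compaction, and additionally assumes that for tiered compaction the threshold equals the fan factor; the factor value 10.0 is a made-up example, not the PR's default.

```java
// Illustrative only: compares a threshold-based limit with a fan-factor-based one.
final class TriggerComparison
{
    // Limit on SSTables per shard: a base value (threshold or fan factor)
    // multiplied by the configured factor.
    static double limit(int base, double maxSstablesPerShardFactor)
    {
        return base * maxSstablesPerShardFactor;
    }

    public static void main(String[] args)
    {
        double factor = 10.0;        // hypothetical value, not taken from the PR
        int fanFactor = 4;           // T4 and L4 both have fan factor 4
        int tieredThreshold = 4;     // assumption: tiered threshold equals the fan factor
        int levelledThreshold = 2;   // per the review: always 2 for levelled compaction

        System.out.println("T4, threshold-based:  " + limit(tieredThreshold, factor));
        System.out.println("L4, threshold-based:  " + limit(levelledThreshold, factor));
        System.out.println("T4, fan-factor-based: " + limit(fanFactor, factor));
        System.out.println("L4, fan-factor-based: " + limit(fanFactor, factor));
    }
}
```

With the threshold as the base, T4 and L4 get different limits for the same factor; with the fan factor as the base, they get the same one.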

Author

done.

    for (Bucket bucket : buckets)
        aggregates.add(bucket.constructAggregate(controller, spaceAvailable, arena));
}
else

Let's add an if (sstables.size() > limit) to skip quickly if we don't have enough sstables for any shard to be over the limit.
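The reasoning behind this shortcut is that a shard is a subset of the level, so if the whole level holds no more SSTables than the limit, no single shard can exceed it and the split into shards can be skipped. A minimal sketch, with a hypothetical `splitIntoShards` function standing in for the shard manager:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Illustrative only; names and signatures are assumptions, not the PR's code.
final class QuickSkip
{
    static <T> List<Set<T>> oversizeShards(Collection<T> sstables,
                                           Function<Collection<T>, List<Set<T>>> splitIntoShards,
                                           double limit)
    {
        // Cheap pre-check: with at most `limit` SSTables in total,
        // no shard can be over the limit, so avoid the split entirely.
        if (sstables.size() <= limit)
            return Collections.emptyList();

        List<Set<T>> oversize = new ArrayList<>();
        for (Set<T> shard : splitIntoShards.apply(sstables))
        {
            if (shard.size() > limit)
                oversize.add(shard);
        }
        return oversize;
    }
}
```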

Author

done.

@@ -314,6 +314,10 @@ public abstract class Controller
@Deprecated
static final String STATIC_SCALING_FACTORS_OPTION = "static_scaling_factors";

static final String SHARD_MAX_SSTABLES_FACTOR_OPTION = "shard_max_sstables_factor";

The name isn't very readable to me.

Maybe max_sstables_per_shard_factor or sstables_per_shard_max_factor?

Or non_overlapping_threshold/trigger?

Note that we also need to document this in UnifiedCompactionStrategy.md -- both as a section that describes the behaviour as well as a paragraph in the configuration section. If we can't find a default that works well in almost all situations, also make sure it is eventually added to the product documentation.

Author

done.

for (Set<CompactionSSTable> group : groups)
{
    if (group.stream().anyMatch(
        sstable -> oversizeGroups.stream().anyMatch(oversizeGroup -> oversizeGroup.contains(sstable))))

AFAIK we can't use streams in OSS C*, where this will also need to be ported. Perhaps turn these into loops already here (IntelliJ can do this for you) so that the code does not immediately diverge?

Author

Is this a performance concern?
Because the minimum JDK seems to be 11 even for OSS C*.

Performance. Apparently it is not yet officially forbidden, but it will be: https://lists.apache.org/thread/65glsjzkmpktzmns6j9wvr4nczvskx36

I'm also afraid this can fall under the current "don't use streams on hot paths" recommendation.
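For reference, the nested `anyMatch` from the snippet under review can be written with plain loops along the following lines. This is a generic sketch using a type parameter in place of `CompactionSSTable`; it is not necessarily the form that ended up in the PR.

```java
import java.util.List;
import java.util.Set;

final class OverlapCheck
{
    // Loop equivalent of:
    //   group.stream().anyMatch(
    //       s -> oversizeGroups.stream().anyMatch(g -> g.contains(s)))
    static <T> boolean intersectsAny(Set<T> group, List<Set<T>> oversizeGroups)
    {
        for (T sstable : group)
        {
            for (Set<T> oversizeGroup : oversizeGroups)
            {
                if (oversizeGroup.contains(sstable))
                    return true;
            }
        }
        return false;
    }
}
```

The loop form returns on the first match, as `anyMatch` does, without allocating stream pipelines or lambdas on each call.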

Author

done.

@cbornet cbornet force-pushed the compact-too-many-sst branch from d1e5463 to 7a6d9b7 Compare July 24, 2025 09:51
@cbornet cbornet marked this pull request as ready for review July 24, 2025 09:52
@cbornet cbornet force-pushed the compact-too-many-sst branch from 7a6d9b7 to bfc79cc Compare July 24, 2025 09:55
@cbornet cbornet changed the title CNDB-14577: Compact all SSTables of a level if their number reaches a limit CNDB-14577: Compact all SSTables of a shard level if their number reaches a limit Jul 24, 2025
@cbornet cbornet changed the title CNDB-14577: Compact all SSTables of a shard level if their number reaches a limit CNDB-14577: Compact all SSTables of a level shard if their number reaches a limit Jul 24, 2025

@blambov blambov left a comment

The code looks good to me, and we can proceed to make an image from it.

If you think you won't have the time to do the documentation additions, let me know and I will prepare a doc commit.

Collections.emptyList(),
arena,
this));
oversizeGroups.add(compactionSSTables);

Could you add a comment explaining why you prefer to save the groups and check them individually instead of adding all to a single set?
(So that someone reading it doesn't immediately want to do an optimization that may be counterproductive.)

Author

Well, this optimization would be good, wouldn't it?
I think I should do it...
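One way the optimization under discussion could look is sketched below: all SSTables from the oversized groups go into a single hash set, so each membership check is one lookup rather than a scan over the list of groups. The names are hypothetical, and the thread does not show whether the final change takes exactly this shape.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: single-set alternative to keeping a list of groups.
final class OversizeMembership
{
    // Collects every SSTable of every oversized group into one set.
    static <T> Set<T> flatten(List<Set<T>> oversizeGroups)
    {
        Set<T> all = new HashSet<>();
        for (Set<T> group : oversizeGroups)
            all.addAll(group);
        return all;
    }

    // True if the group shares at least one SSTable with the oversized set.
    static <T> boolean touchesOversize(Set<T> group, Set<T> allOversize)
    {
        return !Collections.disjoint(group, allOversize);
    }
}
```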

Author

done

@eolivelli

How do we disable this feature?
For example, by setting max_sstables_per_shard_factor to a very large number?

@cbornet
Author

cbornet commented Jul 24, 2025

How do we disable this feature?
For example, by setting max_sstables_per_shard_factor to a very large number?

Yes that would be the way. We could also implement a magic value like max and map it to 0 or -1 (can't use Double.MAX_VALUE as it's used in a multiplication).

@blambov

blambov commented Jul 24, 2025

Double.POSITIVE_INFINITY should be fine. If we set the flag to "1e1000", we should get infinity.

jshell> Double.parseDouble("1e1000")
$4 ==> Infinity

@cbornet
Author

cbornet commented Jul 24, 2025

Double.POSITIVE_INFINITY should be fine. If we set the flag to "1e1000", we should get infinity.

Oh right, I had forgotten about that. It means that we can also map to Double.MAX_VALUE and multiply after all.
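The arithmetic behind this exchange can be checked with a short sketch. It assumes the limit is computed as fan factor times the configured factor and compared with an integer SSTable count; `overLimit` is a hypothetical helper, not a method from the PR.

```java
// Illustrative only: shows why an infinite (or overflowing) factor disables the trigger.
final class InfinityFactor
{
    // Hypothetical form of the trigger: count > fanFactor * factor.
    static boolean overLimit(int sstableCount, int fanFactor, double factor)
    {
        return sstableCount > fanFactor * factor;
    }

    public static void main(String[] args)
    {
        double parsed = Double.parseDouble("1e1000");   // too large for a double
        System.out.println(Double.isInfinite(parsed));

        // IEEE 754 multiplication overflows to Infinity rather than wrapping around.
        System.out.println(Double.isInfinite(Double.MAX_VALUE * 4));

        // No int count is greater than Infinity, so the trigger never fires.
        System.out.println(overLimit(Integer.MAX_VALUE, 4, parsed));
    }
}
```

One caveat of this sketch: a fan factor of 0 multiplied by Infinity would give NaN, and any comparison with NaN is false, so the trigger would also stay off in that case.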

@cbornet cbornet force-pushed the compact-too-many-sst branch from 8295d68 to 52fa379 Compare July 25, 2025 09:36
@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1873 rejected by Butler


1 new test failure(s) in 4 builds
See build details here


Found 1 new test failures

Test Explanation Branch history Upstream history
...nericOrderByTest.testOrderingAcrossManySstables regression 🔴🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵

Found 2 known test failures

Controller.validateOptions(options);
controller = Controller.fromOptions(cfs, options);
assertEquals(Controller.DEFAULT_MAX_SSTABLES_PER_SHARD_FACTOR * 10, controller.getMaxSstablesPerShardFactor(), epsilon);

Nit: I'd also add a test with a value that maps to infinity (e.g. 1e1000) just to make sure we can use that option.

Also consider a copy of the test where we set the factor to infinity and we end up with no compactions selected.

else
{
    // If there are no overlaps, we look if some shards have too many SSTables.
    // If that's the case, we perform a major compaction on those shards.

Nit: maybe add a "see CNDB-14577" as a further pointer to the reason we are doing these.

It also makes sense to extract this part in a method (I was surprised sonar isn't complaining about this, but it seems it isn't running).

@cbornet
Author

cbornet commented Jul 25, 2025

I did a test with a lot of 1-key SSTables created with this script

import os
import cassio
import uuid
import threading
import subprocess
import datetime
from cassio.config import check_resolve_session

os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "xxxx"
os.environ["ASTRA_DB_SECURE_BUNDLE_PATH"] = "secure-connect-test-1873.zip"

cassio.init(auto=True)
session = check_resolve_session()

session.execute("CREATE TABLE IF NOT EXISTS default_keyspace.test_table2 (row_id text, PRIMARY KEY(row_id));")

command="curl -s  -XPOST -d '{}' http://localhost:12346/writer/api/v0/nodetool/flush/tables"

def flush(num):
    cmd = f"kubectl exec pod/cndb-writerpool-dedicated-c0466dc0-rack{num}-v0-0 -n cndb-system -- bash -c \"{command}\""
    print(cmd)
    res = subprocess.run(cmd, shell=True, capture_output=True, check=True)
    print(res)


# Insert one row per iteration, then flush all three writer racks in parallel
# so that each insert ends up in its own tiny SSTable.
for i in range(10000):
    print(datetime.datetime.now())
    id_ = uuid.uuid4()
    session.execute(f"INSERT INTO default_keyspace.test_table2 (row_id) VALUES ('{id_}')")

    threads = [threading.Thread(target=flush, args=(rack,)) for rack in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

With the current CNDB, the SSTables are never compacted and pile up:
[Screenshot: Before]

After applying the fix, the SSTables are regularly compacted when their number reaches about 70:
[Screenshot 2025-07-25 at 18 19 14]
Minutes later, the number is still limited.
[Screenshot 2025-07-25 at 18 29 35]

@cbornet
Author

cbornet commented Jul 25, 2025

If we set a very high threshold (effectively disabling the feature), e.g. with -Dunified_compaction.max_sstables_per_shard_factor=1e1000, SSTables start to pile up again:
[Screenshot 2025-07-25 at 19 11 29]
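For readers unfamiliar with how such a flag reaches the code, the sketch below reads the system property named above and parses it as a double. Only the property name comes from this thread; the default value and the validation rule are placeholders, and the real Controller may read and validate the option differently.

```java
// Illustrative only: reading the factor from a JVM system property.
final class FactorOption
{
    static final String PROPERTY = "unified_compaction.max_sstables_per_shard_factor";

    // Placeholder default; the PR's actual default is not stated in this thread.
    static final double HYPOTHETICAL_DEFAULT = 10.0;

    static double parseFactor(String raw)
    {
        if (raw == null)
            return HYPOTHETICAL_DEFAULT;
        double value = Double.parseDouble(raw);   // "1e1000" parses to Infinity
        // Assumed validation: reject NaN and non-positive values.
        if (Double.isNaN(value) || value <= 0)
            throw new IllegalArgumentException(PROPERTY + " must be positive, got " + raw);
        return value;
    }

    static double fromSystemProperty()
    {
        return parseFactor(System.getProperty(PROPERTY));
    }
}
```

Starting the JVM with `-Dunified_compaction.max_sstables_per_shard_factor=1e1000` would make `fromSystemProperty()` return Infinity under these assumptions.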

4 participants