@michaelsembwever

https://github.com/riptano/cndb/issues/15558

What is the issue

ULID-based SSTable ID generation can fail with a NullPointerException when generating a new ID. The root cause is that the underlying ULID generator returns an empty Optional when the clock has been moved backwards to before the timestamp of the previously generated ID, or in certain rare overflow conditions when timestamps collide. If this happens on the first pass through the generation loop, the loop exits prematurely with newVal still null.

Top of the error stack:

```
java.lang.NullPointerException
	at org.apache.cassandra.utils.TimeUUID.approximateFromULID(TimeUUID.java:58)
	at org.apache.cassandra.io.sstable.ULIDBasedSSTableId.<init>(ULIDBasedSSTableId.java:52)
	at org.apache.cassandra.io.sstable.ULIDBasedSSTableId$Builder.lambda$generator$0(ULIDBasedSSTableId.java:129)
```

This can cause a flush to fail.
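
The premature exit described above can be illustrated with a minimal, self-contained sketch. This is not the code from ULIDBasedSSTableId: the class name `PrematureExitSketch`, the `generate` method, and the use of `Long` values with a `Function` in place of the ULID generator are all assumptions made for illustration. It only shows how a `continue` inside a `do`/`while` jumps straight to the loop condition, so a condition that requires a non-null value ends the loop with null.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

final class PrematureExitSketch
{
    // 'next' is a hypothetical stand-in for a strictly monotonic ID source:
    // it returns an empty Optional when it cannot produce a value after 'prev'.
    static Long generate(AtomicReference<Long> prevRef, Function<Long, Optional<Long>> next)
    {
        Long prevVal;
        Long newVal = null;
        do
        {
            prevVal = prevRef.get();
            Optional<Long> candidate = next.apply(prevVal);
            if (!candidate.isPresent())
                continue; // in a do/while this jumps to the condition below, not to the top
            newVal = candidate.get();
        }
        // With newVal still null the condition is false, so the loop ends here.
        while (newVal != null && !prevRef.compareAndSet(prevVal, newVal));
        return newVal; // null if the first attempt came back empty
    }

    public static void main(String[] args)
    {
        AtomicReference<Long> prevRef = new AtomicReference<>(100L);
        // Simulate a clock that moved backwards: the source refuses to produce a value.
        Long id = generate(prevRef, prev -> Optional.<Long>empty());
        System.out.println("generated id: " + id);
    }
}
```

Any caller that dereferences the returned value, as the constructor in the stack trace does, would then throw a NullPointerException.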

What does this PR fix and why was it fixed

Continue looping until newVal has a value. The loop can spin until the corrected clock catches up to the timestamp of the most recently generated ULID. In a healthy cluster, without large clock corrections from time synchronization, this should take only a short time.
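
A minimal sketch of the retry-until-present approach, under the same assumptions as any illustration here: `RetryUntilPresentSketch`, `generate`, and the `Long`-based source are hypothetical names, not the PR's actual change. The loop treats an empty Optional the same way as a lost compare-and-set and simply tries again.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

final class RetryUntilPresentSketch
{
    // 'next' is a hypothetical stand-in for a strictly monotonic ID source
    // that may return an empty Optional (for example after a clock step backwards).
    static Long generate(AtomicReference<Long> prevRef, Function<Long, Optional<Long>> next)
    {
        Long prevVal;
        Long newVal;
        do
        {
            prevVal = prevRef.get();
            newVal = next.apply(prevVal).orElse(null);
        }
        // Retry both when no value was produced and when another thread won the CAS.
        while (newVal == null || !prevRef.compareAndSet(prevVal, newVal));
        return newVal; // never null once the loop has finished
    }

    public static void main(String[] args)
    {
        AtomicReference<Long> prevRef = new AtomicReference<>(100L);
        AtomicInteger calls = new AtomicInteger();
        // Simulated source: empty for the first two calls, then a value after 'prev'.
        Function<Long, Optional<Long>> next = prev ->
            calls.incrementAndGet() <= 2 ? Optional.<Long>empty() : Optional.of(prev + 1L);

        Long id = generate(prevRef, next);
        System.out.println("generated id " + id + " after " + calls.get() + " attempts");
    }
}
```

The sketch busy-spins, which matches the expectation that the wait is short; if long clock corrections were a concern, a pause or a bounded number of attempts could be added inside the loop.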

@github-actions

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@michaelsembwever michaelsembwever self-assigned this Dec 17, 2025
Tests are added in ULIDBasedSSTableIdGeneratorTest.
A package-private constructor is introduced for ULIDBasedSSTableIdGeneratorTest.testGeneratorRetryOnEmptyOptional().

Cassandra Applicability:
 upstream doesn't have ULIDBasedSSTableId (and won't, because of CASSANDRA-17048).

@jkni jkni left a comment

LGTM. Can you re-trigger CI and I'll come back and approve once that looks good?

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2178 rejected by Butler


13 regressions found
See build details here


Found 13 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.cql3.validation.operations.AggregationQueriesTest.testAggregationQueryShouldNotTimeoutWhenItExceedesReadTimeout (compression) | NEW | 🔴 | 2 / 28 |
| o.a.c.distributed.test.DropUDTWithRestartTest.loadCommitLogAndSSTablesWithDroppedColumnTestCC50 | NEW | 🔴 | 24 / 28 |
| o.a.c.distributed.test.DropUDTWithRestartTest.loadCommitLogAndSSTablesWithDroppedColumnTestCassandra41 | NEW | 🔴 | 3 / 28 |
| o.a.c.distributed.test.DropUDTWithRestartTest.loadCommitLogAndSSTablesWithDroppedColumnTestCassandra5 | NEW | 🔴 | 3 / 28 |
| o.a.c.distributed.test.repair.ForceRepairTest.terminated successfully () | NEW | 🔴 | 1 / 28 |
| o.a.c.index.SecondaryIndexManagerTest.testIndexRebuildWhenAddingSStableViaRemoteReload (compression) | NEW | 🔴 | 20 / 28 |
| o.a.c.index.sai.cql.VectorCompaction100dTest.testZeroOrOneToManyCompaction[db true] () | NEW | 🔴 | 0 / 28 |
| o.a.c.index.sai.cql.VectorKeyRestrictedOnPartitionTest.partitionRestrictedWidePartitionBqCompressedTest[ed false] (compression) | NEW | 🔴 | 0 / 28 |
| o.a.c.index.sai.cql.VectorLocalTest.rangeRestrictedTest[dc false] (compression) | NEW | 🔴 | 0 / 28 |
| o.a.c.metrics.TrieMemtableMetricsTest.testContentionMetrics (compression) | NEW | 🔴 | 9 / 28 |
| o.a.c.repair.FailedAckTest.failedAck () | NEW | 🔴 | 1 / 28 |
| o.a.c.repair.FailingRepairFuzzTest.failingRepair () | NEW | 🔴 | 1 / 28 |
| o.a.c.repair.SlowMessageFuzzTest.slowMessages (compression) | NEW | 🔴 | 0 / 28 |

No known test failures found

@jkni jkni self-requested a review December 19, 2025 15:48

@jkni jkni left a comment

LGTM. Thanks for the PR!

@michaelsembwever michaelsembwever merged commit e9afd3a into main-5.0 Dec 19, 2025
573 of 594 checks passed
@michaelsembwever michaelsembwever deleted the mck-cndb-15558-main-5.0 branch December 19, 2025 16:05