@michaelsembwever
Member

@michaelsembwever michaelsembwever commented Dec 15, 2025

https://github.com/riptano/cndb/issues/15558

What is the issue

ULID-based SSTable ID generation can fail with an NPE. The root cause is that the underlying ULID generator can return an empty Optional when the clock is moved backwards to before the timestamp of the previously generated ID, or in certain rare overflow conditions when timestamps collide. If this happens on our first pass through the generation loop, we exit prematurely with a null newVal.

Top of the error stack:

java.lang.NullPointerException
	at org.apache.cassandra.utils.TimeUUID.approximateFromULID(TimeUUID.java:58)
	at org.apache.cassandra.io.sstable.ULIDBasedSSTableId.<init>(ULIDBasedSSTableId.java:52)
	at org.apache.cassandra.io.sstable.ULIDBasedSSTableId$Builder.lambda$generator$0(ULIDBasedSSTableId.java:129)

This can cause a flush to fail.

What does this PR fix and why was it fixed

Continue looping until newVal gets a value. The loop can spin until the corrected clock catches up to the timestamp of the most recently generated ULID. In a healthy cluster without large time corrections from clock sync, this should be brief.
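
For illustration, here is a minimal sketch of how a loop condition can let a null escape and how retrying avoids it. It is not the actual ULIDBasedSSTableId code: Long values stand in for ULID values, and the class name, both method names, and the `nextAfter` function are hypothetical. The only assumption carried over from the description is that the generator may return an empty Optional.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Illustrative only: Long values stand in for ULID values. */
final class MonotonicIdRetrySketch {
    // Hypothetical stand-in for the ULID generator. An empty result means
    // "no strictly greater value can be produced right now".
    private final Function<Long, Optional<Long>> nextAfter;
    private final AtomicReference<Long> prevRef;

    MonotonicIdRetrySketch(Long initial, Function<Long, Optional<Long>> nextAfter) {
        this.prevRef = new AtomicReference<>(initial);
        this.nextAfter = nextAfter;
    }

    /** Leaves the loop with null as soon as the generator returns empty. */
    Long nextIdPrematureExit() {
        Long prev;
        Long next;
        do {
            prev = prevRef.get();
            next = nextAfter.apply(prev).orElse(null);
            // When next is null the condition below is false, so the loop ends.
        } while (next != null && !prevRef.compareAndSet(prev, next));
        return next;
    }

    /** Keeps looping until a value exists and the compare-and-set succeeds. */
    Long nextIdRetrying() {
        Long prev;
        Long next;
        do {
            prev = prevRef.get();
            next = nextAfter.apply(prev).orElse(null);
        } while (next == null || !prevRef.compareAndSet(prev, next));
        return next;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Returns empty on the first two calls, then prev + 1.
        Function<Long, Optional<Long>> emptyTwice = prev -> {
            calls[0]++;
            if (calls[0] < 3)
                return Optional.empty();
            return Optional.of(prev + 1);
        };
        System.out.println(new MonotonicIdRetrySketch(10L, emptyTwice).nextIdPrematureExit()); // null
        System.out.println(new MonotonicIdRetrySketch(10L, emptyTwice).nextIdRetrying());      // 11
    }
}
```

The retrying variant has no upper bound on iterations, which matches the trade-off described above: it spins until the generator can produce a value again.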

@github-actions

github-actions bot commented Dec 15, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@michaelsembwever
Member Author

running CI, and checking if there's a unit test to add here…

@michaelsembwever michaelsembwever changed the title from "CNDB-15558: ULID-based SSTable ID generation can fail with an NPE" to "[CC4] CNDB-15558: ULID-based SSTable ID generation can fail with an NPE" Dec 16, 2025

@jkni jkni left a comment

Thanks for the PR! Overall, LGTM and I appreciate the increased test coverage. I left a few minor nits inline. Can you run CNDB CI with a build of this PR?

ULID-based SSTable ID generation can fail with an NPE. The root cause is that the underlying ULID generator can return an empty Optional when the clock is moved backwards to before the timestamp of the previously generated ID, or in certain rare overflow conditions when timestamps collide. If this happens on our first pass through the generation loop, we exit prematurely with a null newVal.

Top of the error stack:
```
java.lang.NullPointerException
	at org.apache.cassandra.utils.TimeUUID.approximateFromULID(TimeUUID.java:58)
	at org.apache.cassandra.io.sstable.ULIDBasedSSTableId.<init>(ULIDBasedSSTableId.java:52)
	at org.apache.cassandra.io.sstable.ULIDBasedSSTableId$Builder.lambda$generator$0(ULIDBasedSSTableId.java:129)
```
This can cause a flush to fail.

Continue looping until newVal gets a value. The loop can spin until the corrected clock catches up to the timestamp of the most recently generated ULID. In a healthy cluster without large time corrections from clock sync, this should be brief.

Tests are added in ULIDBasedSSTableIdGeneratorTest.
A package-private constructor is introduced for ULIDBasedSSTableIdGeneratorTest.testGeneratorRetryOnEmptyOptional().

Cassandra Applicability:
 Upstream doesn't have ULIDBasedSSTableId (and won't, because of CASSANDRA-17048).
@michaelsembwever
Member Author

cndb tests: https://github.com/riptano/cndb/pull/16340

@sonarqubecloud

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2175 rejected by Butler


3 regressions found
See build details here


Found 3 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.db.commitlog.BatchCommitLogTest.testBatchCLSyncImmediately[12] (compression) | REGRESSION | 🔴🔵 | 0 / 19 |
| o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToManyCompaction[dc false] | NEW | 🔴🔴 | 0 / 19 |
| o.a.c.index.sai.cql.VectorSiftSmallTest.testCompaction[ca false] (compression) | REGRESSION | 🔴🔵 | 0 / 19 |

Found 1 known test failures

@jkni jkni self-requested a review December 18, 2025 23:22

@jkni jkni left a comment

LGTM. Thanks!

@michaelsembwever michaelsembwever merged commit b110964 into main Dec 19, 2025
474 of 498 checks passed
@michaelsembwever michaelsembwever deleted the mck-cndb-15558-main branch December 19, 2025 14:03