-
Notifications
You must be signed in to change notification settings - Fork 21
[CC4] CNDB-15558: ULID-based SSTable ID generation can fail with an NPE #2175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Checklist before you submit for review
|
|
running CI, and checking if there's a unit test to add here… |
cd8b732 to
fab82ec
Compare
fab82ec to
68d652e
Compare
jkni
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Overall, LGTM and I appreciate the increased test coverage. I left a few minor nits inline. Can you run CNDB CI with a build of this PR?
test/unit/org/apache/cassandra/io/sstable/ULIDBasedSSTableIdGeneratorTest.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/io/sstable/ULIDBasedSSTableIdGeneratorTest.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/io/sstable/ULIDBasedSSTableIdGeneratorTest.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/io/sstable/ULIDBasedSSTableIdGeneratorTest.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/io/sstable/ULIDBasedSSTableIdGeneratorTest.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/io/sstable/ULIDBasedSSTableIdGeneratorTest.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/io/sstable/ULIDBasedSSTableIdGeneratorTest.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/io/sstable/ULIDBasedSSTableIdGeneratorTest.java
Outdated
Show resolved
Hide resolved
ULID-based SSTable ID generation can fail with an NPE when generating a new ID. The root cause is that the underlying ULID generator can generate an empty Optional when the clock is moved backwards to before the previously generated ID or in certain rare overflow conditions when timestamp collides. If it's our first time through the generation loop, we prematurely exit with a null newVal. Top of the error stack: ``` java.lang.NullPointerException at org.apache.cassandra.utils.TimeUUID.approximateFromULID(TimeUUID.java:58) at org.apache.cassandra.io.sstable.ULIDBasedSSTableId.<init>(ULIDBasedSSTableId.java:52) at org.apache.cassandra.io.sstable.ULIDBasedSSTableId$Builder.lambda$generator$0(ULIDBasedSSTableId.java:129) ``` This can cause a flush to fail. Continue looping until newVal gets a value. The loop can spin until the corrected time catches up to the time of the most recently used ULID generation ID. This should be a short duration in a healthy cluster without large time corrections from sync. Tests are added in ULIDBasedSSTableIdGeneratorTest A package-protected constructor is introduced for ULIDBasedSSTableIdGeneratorTest.testGeneratorRetryOnEmptyOptional() Cassandra Applicability: upstream doesn't have ULIDBasedSSTableId (and won't because CASSANDRA-17048).
68d652e to
bd286de
Compare
|
cndb tests: https://github.com/riptano/cndb/pull/16340 |
|
❌ Build ds-cassandra-pr-gate/PR-2175 rejected by Butler3 regressions found Found 3 new test failures
Found 1 known test failures |
jkni
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!



https://github.com/riptano/cndb/issues/15558
What is the issue
ULID-based SSTable ID generation can fail with an NPE when generating a new ID. The root cause is that the underlying ULID generator can generate an empty Optional when the clock is moved backwards to before the previously generated ID or in certain rare overflow conditions when timestamp collides. If it's our first time through the generation loop, we prematurely exit with a null newVal.
Top of the error stack:
This can cause a flush to fail.
What does this PR fix and why was it fixed
Continue looping until newVal gets a value. The loop can spin until the corrected time catches up to the time of the most recently used ULID generation ID. This should be a short duration in a healthy cluster without large time corrections from sync.