@szymon-miezal

What is the issue

UCS settings files are not deleted when a table is dropped. Instead, they are supposed to be cleaned up on the next node restart. That cleanup is faulty, though, and it prevents the node from starting.

Root Cause:
The cleanupControllerConfig() method in CompactionManager attempts to verify whether a table exists by calling getColumnFamilyStore(). When the table has been dropped, this method throws IllegalArgumentException, which was not being caught. The existing catch block only handled NullPointerException (for a missing keyspace).

What does this PR fix and why was it fixed

Extended the exception handler to catch both NullPointerException and IllegalArgumentException, allowing orphaned controller-config.JSON files to be properly identified and deleted during node restart.
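
As a rough illustration of the fix described above, here is a minimal, self-contained sketch of the multi-catch pattern. It is not the actual CompactionManager code: the class name, the in-memory `SCHEMA` map, and the `lookupTable` and `isOrphaned` helpers are hypothetical stand-ins that only mimic the two failure modes mentioned (NullPointerException for a missing keyspace, IllegalArgumentException for a dropped table).

```java
import java.util.Map;
import java.util.Set;

public class OrphanedConfigCleanupSketch
{
    // Hypothetical stand-in for the schema: keyspace name -> table names.
    static final Map<String, Set<String>> SCHEMA = Map.of("ks1", Set.of("live_table"));

    // Hypothetical stand-in for the table lookup. It fails the same two ways
    // the PR description mentions: NPE for a missing keyspace, IAE for a dropped table.
    static String lookupTable(String keyspace, String table)
    {
        Set<String> tables = SCHEMA.get(keyspace); // null when the keyspace is missing
        if (!tables.contains(table))               // throws NullPointerException if tables is null
            throw new IllegalArgumentException("Unknown table " + keyspace + '.' + table);
        return keyspace + '.' + table;
    }

    // A config file is treated as orphaned when its table can no longer be resolved.
    static boolean isOrphaned(String keyspace, String table)
    {
        try
        {
            lookupTable(keyspace, table);
            return false;
        }
        catch (NullPointerException | IllegalArgumentException e)
        {
            // Either the keyspace or the table is gone, so the file can be deleted.
            return true;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(isOrphaned("ks1", "live_table"));       // false
        System.out.println(isOrphaned("ks1", "dropped_table"));    // true
        System.out.println(isOrphaned("missing_ks", "any_table")); // true
    }
}
```

With only NullPointerException in the catch clause, the second call would propagate the IllegalArgumentException to the caller, which matches the startup failure described in the root cause.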

@szymon-miezal changed the title from "HCD-237: UCS settings json file cleanup fails" to "(WIP) HCD-237: UCS settings json file cleanup fails" on Dec 2, 2025
@github-actions

github-actions bot commented Dec 2, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@szymon-miezal changed the title from "(WIP) HCD-237: UCS settings json file cleanup fails" to "HCD-237: UCS settings json file cleanup fails" on Dec 3, 2025
@bereng
Collaborator

bereng commented Dec 3, 2025

There's something I am missing here. Dropping a table makes a node never start again? That makes no sense. What am I missing? What is the actual bug we're trying to fix?

@szymon-miezal
Author

Dropping a table makes a node never start again?

To be precise, it's dropping a table that uses UCS. The test showcases the required steps; the only unusual step is that it saves the UCS settings on demand rather than waiting for the background thread to save them periodically (https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L180).

@bereng
Collaborator

bereng commented Dec 5, 2025

Did you notice there are missing json files exceptions in the CNDB CI run?

@szymon-miezal
Author

szymon-miezal commented Dec 5, 2025

Did you notice there are missing json files exceptions in the CNDB CI run?

Caused by: java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/tenant1/compaction/74656e616e7431_foo/bar1-e28f23b0-d02e-11f0-82ab-b3c31bc6cce2/tasks/completed/bbe37b03-a1c6-45b3-8f95-37a797fd2077.json

These look like a different sort of file.

To have more data, I have triggered CNDB CI again.

The clause ensures we get a meaningful error message. To maintain the
existing behaviour, we rethrow the exception.
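
The log-then-rethrow idea in this commit message can be sketched as follows. This is only an assumption about the shape of the clause, not the merged code: the class name, the `checkConfigFile` helper, the file name, and the choice of `java.util.logging` and `RuntimeException` are all hypothetical and chosen to keep the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogAndRethrowSketch
{
    private static final Logger logger = Logger.getLogger(LogAndRethrowSketch.class.getName());

    // Hypothetical helper: runs a lookup for one config file. On an unexpected
    // failure it logs which file was being processed, then rethrows the same
    // exception so the caller still observes the original failure.
    static void checkConfigFile(String fileName, Runnable tableLookup)
    {
        try
        {
            tableLookup.run();
        }
        catch (RuntimeException e)
        {
            logger.log(Level.SEVERE, "Unexpected failure while checking config file " + fileName, e);
            throw e; // not swallowed: behaviour for the caller is unchanged
        }
    }

    public static void main(String[] args)
    {
        try
        {
            // The file name below is made up for illustration.
            checkConfigFile("example-controller-config.JSON",
                            () -> { throw new IllegalStateException("boom"); });
        }
        catch (IllegalStateException e)
        {
            System.out.println("propagated: " + e.getMessage()); // propagated: boom
        }
    }
}
```

The extra clause changes nothing about control flow; it only adds context to the log before the exception continues up the stack.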
@szymon-miezal
Author

My CNDB CI has been cancelled 😭, I will try again.

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2145 rejected by Butler


2 regressions found
See build details here


Found 2 new test failures

Test Explanation Runs Upstream
o.a.c.index.sai.cql.VectorCompaction100dTest.testZeroOrOneToManyCompaction[dc true] NEW 🔴 0 / 19
o.a.c.index.sai.cql.VectorSiftSmallTest.testMultiSegmentBuild[dc false] REGRESSION 🔴 0 / 19

Found 4 known test failures

@szymon-miezal
Author

To satisfy SonarQube, I had to add the following test: bead4ff.
It doesn't provide much value; all it does is exercise the line where the logger is called and ensure the exception isn't swallowed.

@szymon-miezal szymon-miezal merged commit ef0ec1d into main Dec 9, 2025
406 of 423 checks passed
@szymon-miezal szymon-miezal deleted the HCD-237 branch December 9, 2025 13:58
szymon-miezal added a commit that referenced this pull request Dec 10, 2025
szymon-miezal added a commit that referenced this pull request Dec 11, 2025