forked from apache/cassandra
CNDB-12290 CC5 fix to use TrieMemtable as default in MemtableParams #1481
Open: djatnieks wants to merge 1,289 commits into main-5.0 from CNDB-12290
…t queries (#1245) Fixes a race condition when a filter-then-sort query gets intertwined with a deletion (or upsert) and a flush, only with DC/v5 format. The fix is simply considering that the postings might be empty after reading a vector.
… in CassandraRelevantProperties
…iptor.applyAll before createAllDirectories
…a.util to be able to find the expected Striped.custom method.
…fault for the initial build task and always mark the index queryable in a callback; in CC 5.0, directly calling markIndexBuilt leads to deadlock through a nested load request to LoadingMap.blockingLoadIfAbsent.
Update expected output to match CC change to limit expr strings to 9 characters in RowFilter.SimpleExpression.toString from commit "Use vector index node count estimates when planning TopK queries (#1164)". With this change, the result of toCqlString(query) may not be valid CQL due to truncation.
…ace with DatabaseDescriptor.createAllDirectories. Fixes flaky test failures observed in AbstractReadQueryToCQLStringTest and ClearSnapshotTest.
When pre-filtering, sorting happens on already filtered rows, so the topK limit must not be divided by the selectivity. Fixes riptano/cndb#10542
This gets us to the IndexNotAvailableException
This commit fixes a minor bug in SAI cost estimation code that caused the cost of intersections to be sometimes overestimated. The estimation code didn't account for the fact that we may reach the end of the iterator before finding a matching key.
* update ANN costs and scale disk accesses by cache hit rate
* Integrate brute-force estimation with Plan
* ANN costs better match the query execution
* include rawExpectedNodesVisited in trace when surprised
* LinearFit handles duplicate X values instead of returning NaN
* split sort into ANN_SORT_OPEN_COST and ANN_SORT_KEY_COST
* incorporate rerankK in estimation
* remove DISK_ACCESS_COST, I don't think it accurately models the difference between an sai search (multiple random-access requests to disk) and fetching a neighbor list (single request)
* update statistics on the unrestricted search path
* use actual degree in cost estimate instead of hardcoded default
* delete SelectiveIntersectionTest, it no longer adds value wrt PlanTest, and high SAI_OPEN_COST means that it's not actually correct to use index scans on the tiny table involved
* rename FilterSortOrder constants
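The LinearFit item above mentions returning NaN for duplicate X values. One way an ordinary least-squares fit produces NaN is when every X value coincides, because the slope becomes 0/0. The following is a minimal sketch of guarding against that case. The class name `LinearFitSketch` and the fallback to a flat line are assumptions made for illustration; this is not the project's `LinearFit` implementation.

```java
/** Hypothetical least-squares helper; not the project's LinearFit class. */
final class LinearFitSketch {
    /** Returns {slope, intercept} for the line y = slope * x + intercept. */
    static double[] fit(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length == 0)
            throw new IllegalArgumentException("need equally sized, non-empty inputs");

        int n = xs.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += xs[i];
            meanY += ys[i];
        }
        meanX /= n;
        meanY /= n;

        // Sum of squared X deviations and sum of X/Y co-deviations.
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double dx = xs[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (ys[i] - meanY);
        }

        // All X values identical: sxy / sxx would be 0/0, i.e. NaN.
        // Assumption: fall back to a flat line through the mean of Y.
        if (sxx == 0.0)
            return new double[] { 0.0, meanY };

        double slope = sxy / sxx;
        return new double[] { slope, meanY - slope * meanX };
    }
}
```

With distinct X values the helper behaves like a plain least-squares fit; only the degenerate input takes the fallback branch.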
CNDB-9441: fix SAI rebuild order to complete transaction first
* remove CassandraOnDiskHnsw
* move CloseableReranker to upper level
* remove JVectorLuceneOnDiskGraph and VectorSupplier
* relocated DiskBinarySearch and remove the rest of the sai.disk.v2.hnsw package
* containsUnitVectors is no longer Optional
Includes SortedIterator and TopKSelector, as well as a comparator-accepting LucenePriorityQueue.
* Vector: Release GraphSearcherAccessManager on exception
* Use injection to force test failure
…oup of SAI expressions (#1246) NEQ is unhandled in this switch, leading to a NPE under certain query conditions where an Expression is created and no operation is set for this expression.
…ng index build (#1258) - use writeTimeReadFileChannelFor() for recalculateChecksum on appending index segment
Introduce a new SAI index option, equals_behaviour_when_analyzed, to decide the behaviour of the equals (=) operator on columns with an analyzed index. Possible values are:
* MATCH: The = operator will behave the same as the : operator. A client warning will be emitted on SELECT recommending the user to use the : operator instead. This is the default value for backward compatibility.
* UNSUPPORTED: The = operator will be rejected on analyzed columns. This will be the default in the future.
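The two values of equals_behaviour_when_analyzed can be pictured as a small decision function. The sketch below is illustrative only: the class, enum, and method names are hypothetical, the warning text is invented, and a plain `IllegalArgumentException` stands in for whatever error the real code raises.

```java
import java.util.List;

/** Hypothetical model of the option's semantics; not the SAI implementation. */
final class EqualsBehaviourSketch {
    enum EqualsBehaviour { MATCH, UNSUPPORTED }
    enum Operator { EQ, ANALYZER_MATCHES }

    /**
     * Decides which operator a restriction is evaluated with.
     * Warnings are appended to the supplied list (assumed to be sent to the client).
     */
    static Operator resolve(Operator requested, boolean columnIsAnalyzed,
                            EqualsBehaviour behaviour, List<String> warnings) {
        // The option only affects '=' on columns with an analyzed index.
        if (requested != Operator.EQ || !columnIsAnalyzed)
            return requested;

        if (behaviour == EqualsBehaviour.MATCH) {
            // '=' behaves like ':' and the user is nudged towards ':'.
            warnings.add("'=' on an analyzed column behaves like ':'; consider using ':'");
            return Operator.ANALYZER_MATCHES;
        }

        // UNSUPPORTED: reject '=' on analyzed columns.
        throw new IllegalArgumentException("'=' is not supported on analyzed columns");
    }
}
```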
Unbounded queue length at the native transport stage can cause large backlogs of requests that the system still processes, even though clients may no longer expect a response. This PR implements a limited backport of CNDB-11070, introducing the notion of a native request timeout that can shed messages with excessive queue times at the NTR stage as well as async read/write stages, if enabled. Cross-node message timeouts are also now respected earlier in the mutation verb handler. This is a fairly straightforward cherry-pick of #1393 targeting main instead of cc-main-migration-release.
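The idea of shedding messages with excessive queue times can be sketched as follows. This is a simplified, single-threaded illustration; `QueueTimeShedder` and its members are hypothetical names and do not correspond to the classes touched by the backport.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch: drop requests that waited in the queue longer than a timeout. */
final class QueueTimeShedder {
    static final class Request {
        final String id;
        final long enqueuedAtNanos;

        Request(String id, long enqueuedAtNanos) {
            this.id = id;
            this.enqueuedAtNanos = enqueuedAtNanos;
        }
    }

    private final Queue<Request> queue = new ArrayDeque<>();
    private final long timeoutNanos;
    private long shedCount;

    QueueTimeShedder(long timeoutNanos) {
        this.timeoutNanos = timeoutNanos;
    }

    void enqueue(String id, long nowNanos) {
        queue.add(new Request(id, nowNanos));
    }

    /**
     * Returns the next request whose queue time is within the timeout,
     * discarding (shedding) any expired requests ahead of it.
     * Returns null when nothing processable remains.
     */
    Request poll(long nowNanos) {
        Request r;
        while ((r = queue.poll()) != null) {
            if (nowNanos - r.enqueuedAtNanos > timeoutNanos) {
                shedCount++; // the client has likely given up; skip the work
                continue;
            }
            return r;
        }
        return null;
    }

    long shedCount() {
        return shedCount;
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real stage would read a clock when dequeuing.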
This reverts commit c06c94c. It seems the removal of `Index#postProcessor` by CNDB-11762 broke some tests in CNDB's `MultiNodeBillingTest`. Unfortunately that patch [was merged before creating the PR bumping the CC version used by CNDB](#1422 (comment)). [The CNDB PR](riptano/cndb#12076) was created after that merging but it was superseded by other CC version bumps. So I'm adding this reversal so we can investigate how the removal of `Index#postProcessor` affects those tests.
This patch replaces null values of `deterministic`, `monotonic` and `monotonic_on` columns in `system_schema.functions` and `system_schema.aggregates` with negative defaults. These defaults will be addressed if/once DB-672 gets ported to CC.
There are two mechanisms for detecting that the cluster is in the upgrade state and for determining the minimum version. They differ slightly, and neither is pluggable, which means that CNDB doesn't work properly with them. Both mechanisms are implemented in `Gossiper`. Although we do not use `Gossiper` in CNDB, there are classes like `ColumnFilter` which go to `Gossiper` to check the upgrade state. So far, using these checks in CNDB has been unpredictable: some of them reported that the cluster is upgraded and on the current version, while others did not. This turned out to be a problem, especially for `ColumnFilter`, because when we upgrade DSE --> CC, CC assumes that the newest filter version should be used, which is not correctly deserialized and interpreted by DSE. The fix is not small, but it probably simplifies things a bit. First, the two mechanisms are merged into one. Second, we made the merged mechanism pluggable so that we can provide the appropriate implementation in CNDB coordinators and writers, which is based on ETCD.
Part of riptano/cndb#12139 Moves constant shard count outside looping shards to reduce confusion.
…with DurationSpec type and 'native_transport_timeout_in_ms' as convertible old name with Long type; add some tests.
…MemtableIndexTest and TrieMemtableIndexAllocationsHeapBuffersTest from main branch.
…strictions (#1449) Closes riptano/cndb#12139 This PR adds a test of row count of a SAI plan in the presence of restrictions. Currently it tests queries with inequality, equality and half-ranges on different SAI column types and with or without histograms.
…g VIntOutOfRangeException to the catch block of SSTableIdentityIterator.exhaust method.
…pactionProgress to return the operation type from the first entry in the shared progress; needed in cases that a CompactionTask type is changed after creation.
…opriate (#1469) Fixes riptano/cndb#12239
We found that System.nanoTime was consuming significant CPU, but because the timeout is high enough, we can accept the inaccuracy.
- [ ] Make sure there is a PR in the CNDB project updating the Converged Cassandra version
- [ ] Use `NoSpamLogger` for log lines that may appear frequently in the logs
- [ ] Verify test results on Butler
- [ ] Test coverage for new/modified code is > 80%
- [ ] Proper code formatting
- [ ] Proper title for each commit starting with the project-issue number, like CNDB-1234
- [ ] Each commit has a meaningful description
- [ ] Each commit is not very long and contains related changes
- [ ] Renames, moves and reformatting are in distinct commits
Fixes regression in jvector 3.0.4 when compacting PQVectors larger than 2GB
### What is the issue
SimpleClientPerfTest has been failing in CI since changes from CNDB-10759
### What does this PR fix and why was it fixed
This change in `SimpleClientPerfTest` updates the anonymous class `Message.Codec<QueryMessage>` to override the correct method, `public CompletableFuture<Response> maybeExecuteAsync` from `QueryMessage`, whose signature was changed as part of CNDB-10759.
### Checklist before you submit for review
- [ ] Make sure there is a PR in the CNDB project updating the Converged Cassandra version
- [ ] Use `NoSpamLogger` for log lines that may appear frequently in the logs
- [ ] Verify test results on Butler
- [ ] Test coverage for new/modified code is > 80%
- [ ] Proper code formatting
- [ ] Proper title for each commit starting with the project-issue number, like CNDB-1234
- [ ] Each commit has a meaningful description
- [ ] Each commit is not very long and contains related changes
- [ ] Renames, moves and reformatting are in distinct commits
…ing for async batchlog removal (#1485) The test asserts that the batchlog is removed immediately after the write completes, but removal of the batchlog is async and can be delayed, particularly in resource-limited environments like CI.
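A common way to test an asynchronous side effect such as the batchlog removal described above is to poll for the expected state with a deadline instead of asserting immediately. The helper below is a generic sketch; `AwaitSketch.awaitCondition` is a hypothetical name and is not taken from the commit, which may use a different utility.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for tests that wait on asynchronous work. */
final class AwaitSketch {
    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean())
                return true;
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0)
                return false;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would then assert on the return value, for example that a (hypothetical) `batchlogIsEmpty()` check becomes true within a few seconds, which tolerates slow CI machines without adding a fixed sleep.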
The test generates partition index accesses by reusing the same key, and if the key cache is enabled, the test will fail for bigtable profiles because the key will be in the key cache.
…by filtering queries (#1484) Queries creating fake index contexts each create their own context, which can then race on metric registration (as the metrics have the same patterns). This can cause a query to fail. These metrics are superfluous, we can skip creating them entirely.
… index format version 'dx', 'cx', or older.
…ate.accumulatedDataSize; it only worked to fix SensorsWriteTest.testMultipleRowsMutationWithClusteringKey for SkipListMemtable and may not be necessary if the default memtable becomes TrieMemtable. Revisit SensorsWriteTest later if necessary.
Move static class TrieMemtable.Factory to TrieMemtableFactory class; Use suggested TriePartitionUpdate.unsharedHeapSize implementation; Use InMemoryTrie.shortLived in TrieToDotTest and TrieToMermaidTest; Add specific versions aa, ca, and da to RowIndexTest;
Add addMemoryUsageTo in SkipListMemtable and TrieMemtable
Add TrieMemtable.switchOut
What is the issue
Update CC 5.0 to use TrieMemtable by default in MemtableParams to align with main branch.
What does this PR fix and why was it fixed
Updates MemtableParams to use DefaultMemtableFactory, which uses TrieMemtable, instead of using SkipListMemtableFactory.
Checklist before you submit for review
Use NoSpamLogger for log lines that may appear frequently in the logs