CNDB-13689: use NodeQueue::pushMany to decrease time complexity to build heap #1693

Merged
merged 4 commits into main from cndb-13689
Apr 22, 2025

michaeljmarshall
Member

@michaeljmarshall michaeljmarshall commented Apr 11, 2025

What is the issue

Fixes: https://github.com/riptano/cndb/issues/13689

What does this PR fix and why was it fixed

This PR uses the NodeQueue::pushMany method to reduce the time complexity of building the NodeQueue from O(n log(n)) to O(n). The improvement is likely only significant for sufficiently large hybrid queries. For example, we have seen searches produce 400k rows, which means 400k insertions into these NodeQueue objects.
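For readers unfamiliar with why a bulk insert can beat repeated single inserts, here is a minimal sketch of the general technique. It is not jvector's NodeQueue: the class `HeapBuildSketch`, its `long` payload, and its method bodies are hypothetical and exist only for illustration. Pushing elements one at a time sifts each one up, costing O(log n) per element. Appending everything first and then sifting down from the last parent to the root (Floyd's bottom-up heap construction) costs O(n) in total, because most nodes sit near the leaves and move only a short distance.

```java
// Hypothetical illustration only: a min-heap of longs, NOT jvector's NodeQueue.
final class HeapBuildSketch {
    private long[] heap = new long[16];
    private int size = 0;

    /** Inserts one value and sifts it up: O(log n) per call, O(n log n) for n calls. */
    void push(long value) {
        if (size == heap.length) heap = java.util.Arrays.copyOf(heap, size * 2);
        heap[size] = value;
        siftUp(size++);
    }

    /** Appends all values, then restores heap order bottom-up: O(n) in total. */
    void pushMany(long[] values) {
        if (values.length == 0) return;
        int needed = size + values.length;
        if (needed > heap.length) {
            heap = java.util.Arrays.copyOf(heap, Math.max(needed, heap.length * 2));
        }
        System.arraycopy(values, 0, heap, size, values.length);
        size = needed;
        // Leaves are already valid heaps, so start at the last parent and walk to the root.
        for (int i = size / 2 - 1; i >= 0; i--) siftDown(i);
    }

    /** Removes and returns the smallest value. */
    long pop() {
        if (size == 0) throw new IllegalStateException("empty heap");
        long top = heap[0];
        heap[0] = heap[--size];
        if (size > 0) siftDown(0);
        return top;
    }

    int size() { return size; }

    private void siftUp(int i) {
        long v = heap[i];
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (heap[parent] <= v) break;
            heap[i] = heap[parent];
            i = parent;
        }
        heap[i] = v;
    }

    private void siftDown(int i) {
        long v = heap[i];
        int half = size / 2; // indices >= half are leaves
        while (i < half) {
            int child = 2 * i + 1;
            if (child + 1 < size && heap[child + 1] < heap[child]) child++;
            if (v <= heap[child]) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = v;
    }

    public static void main(String[] args) {
        HeapBuildSketch queue = new HeapBuildSketch();
        queue.pushMany(new long[] {5, 3, 8, 1, 9, 2, 7});
        while (queue.size() > 0) System.out.println(queue.pop()); // prints in ascending order
    }
}
```

Both construction paths yield a valid heap with the same pop order; only the build cost differs, which is why the gain should only show up at sizes like the 400k-row case mentioned above.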

@michaeljmarshall michaeljmarshall self-assigned this Apr 11, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@michaeljmarshall
Member Author

Looks like I caught a bug in the initial implementation. Fix proposed: datastax/jvector#433.

@michaeljmarshall michaeljmarshall marked this pull request as draft April 11, 2025 21:05
@michaeljmarshall michaeljmarshall changed the title from "CNDB-13689: SAI: use NodeQueue::pushAll to decrease time complexity to build heap" to "CNDB-13689: use NodeQueue::pushMany to decrease time complexity to build heap" Apr 18, 2025
@michaeljmarshall michaeljmarshall marked this pull request as ready for review April 18, 2025 15:34

@eolivelli eolivelli left a comment

LGTM

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1693 rejected by Butler


2 new test failure(s) in 2 builds
See build details here


Found 2 new test failures

| Test | Explanation | Branch history | Upstream history |
| --- | --- | --- | --- |
| ...gLegacyIndex.test_sstableloader_with_failing_2i | regression | 🔴🔴 | 🔵🔵🔵🔵🔵🔵🔵 |
| o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... | regression | 🔴🔴 | 🔵🔵🔵🔵🔵🔵🔵 |

Found 3 known test failures

@michaeljmarshall michaeljmarshall merged commit df81d9a into main Apr 22, 2025
465 of 476 checks passed
@michaeljmarshall michaeljmarshall deleted the cndb-13689 branch April 22, 2025 16:17
djatnieks pushed a commit that referenced this pull request May 29, 2025
…ild heap (#1693)

### What is the issue
Fixes: riptano/cndb#13689

### What does this PR fix and why was it fixed
This PR utilizes the NodeQueue::pushMany method to decrease the time
complexity required to build the NodeQueue from `O(n log(n))` to `O(n)`.
This is likely only significant for sufficiently large hybrid queries.
For example, we have seen cases of the search producing 400k rows, which
means that we do 400k insertions into these NodeQueue objects.
3 participants