@michaeljmarshall (Member) commented Oct 8, 2025

What is the issue

https://github.com/riptano/cndb/issues/15527

What does this PR fix and why was it fixed

Adds the ability to build vector indexes using Fused PQ. It is enabled by default and can be disabled via the cassandra.sai.vector.enable_fused config param. When enabled, fused graphs store the product quantization in the graph on disk, which removes the need to hold it in memory and lets vector indexes scale further.
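As a rough illustration of the toggle described above, the sketch below reads the flag as a JVM system property with a default of true. It assumes the config param is exposed as a plain system property; the class and method names (FusedConfigSketch, isFusedEnabled) are hypothetical and are not the ones used in this PR.

```java
/** Hypothetical helper for illustration only; not the class used in this PR. */
final class FusedConfigSketch {
    // Property name taken from the PR description.
    static final String ENABLE_FUSED = "cassandra.sai.vector.enable_fused";

    private FusedConfigSketch() {}

    /** Returns true unless the property is set to a value other than "true" (case-insensitive). */
    static boolean isFusedEnabled() {
        // The "true" default mirrors the "enabled by default" statement in the description.
        return Boolean.parseBoolean(System.getProperty(ENABLE_FUSED, "true"));
    }

    public static void main(String[] args) {
        System.out.println("fused enabled: " + isFusedEnabled());
    }
}
```

Under that assumption, starting the JVM with -Dcassandra.sai.vector.enable_fused=false would turn the feature off.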

github-actions bot commented Oct 8, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one


  VectorCompression.CompressionType compressionType = VectorCompression.CompressionType.values()[reader.readByte()];
- if (features.contains(FeatureId.FUSED_ADC))
+ if (features.contains(FeatureId.FUSED_PQ))
Member Author

@marianotepper - I just noticed that the features map already has logic that loads the ProductQuantization, meaning this branch currently keeps two identical maps in memory. I think it'd make sense to possibly expose the features map in the OnDiskGraphIndex so we can remove the duplicate cost. Any reason we can't do that?

Member Author

Looks like we actually already use the one from the header, I just didn't catch it. I think we'll be able to get rid of it with a little extra work in CC.

@michaeljmarshall michaeljmarshall marked this pull request as ready for review November 25, 2025 22:23
michaeljmarshall and others added 4 commits December 1, 2025 14:01
built from the branch in this PR: datastax/jvector#588
This is a hack because it increases memory utilization and may impact other threads in the CC host. However, this will help unblock certain test scenarios.
Improve hashCode of the Key to prevent collisions
Implement Comparable<Key> in case the ConcurrentHashMap falls back into tree mode

(cherry picked from commit 091b636)
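
The last two commit messages describe a general pattern for ConcurrentHashMap keys: spread hash codes to reduce collisions, and implement Comparable so that a bin converted to a tree can order keys instead of relying on tie-breaking. A minimal sketch of that pattern follows. The class name KeySketch and its fields (segment, offset) are hypothetical and do not reflect the actual Key in the cherry-picked commit.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical map key for illustration; field names are assumptions. */
final class KeySketch implements Comparable<KeySketch> {
    private final String segment;
    private final long offset;

    KeySketch(String segment, long offset) {
        this.segment = Objects.requireNonNull(segment);
        this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof KeySketch)) return false;
        KeySketch other = (KeySketch) o;
        return offset == other.offset && segment.equals(other.segment);
    }

    @Override
    public int hashCode() {
        // Combine both fields so keys differing in either one usually hash differently.
        return 31 * segment.hashCode() + Long.hashCode(offset);
    }

    @Override
    public int compareTo(KeySketch other) {
        // Ordering is consistent with equals: compare segment first, then offset.
        int c = segment.compareTo(other.segment);
        return c != 0 ? c : Long.compare(offset, other.offset);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<KeySketch, String> cache = new ConcurrentHashMap<>();
        cache.put(new KeySketch("segment-a", 42L), "value");
        System.out.println(cache.get(new KeySketch("segment-a", 42L)));
    }
}
```

Keeping compareTo consistent with equals matters here, because a treeified bin may use the comparison order to locate a key.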
@eolivelli

I have cherry-picked this into this feature branch: #2156

sonarqubecloud bot commented Dec 7, 2025

@michaeljmarshall changed the title from "CNDB-15527: Add config to use FusedADC" to "CNDB-15527: Add config to use FusedPQ" on Dec 15, 2025
@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2042 rejected by Butler


2 regressions found
See build details here


Found 2 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.index.sai.cql.VectorCompaction100dTest.testZeroOrOneToManyCompaction[eb false] | NEW | 🔴 | 0 / 19 |
| o.a.c.index.sai.cql.VectorSiftSmallTest.testSiftSmall[db false] | REGRESSION | 🔴 | 0 / 19 |

Found 2 known test failures

4 participants