Conversation

@AlSchlo
Contributor

@AlSchlo AlSchlo commented Oct 20, 2025

This PR introduces Panorama into HNSWFlat, following our paper. Panorama achieves up to 4× lower latency on higher-dimensional data, making it a great option for medium-sized datasets that don't benefit much from quantization.

Below are some benchmarks on SIFT-128, GIST-960, and synthetic 2048-dimensional data. I recommend checking out the paper for more results. As expected, Panorama is not a silver bullet when combined with HNSW—it’s only worthwhile for high-dimensional data.

It might be worth considering, in the future, adding a function that dynamically sets the number of levels. However, this would require reorganizing the cumulative sums.
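
To make the role of those cumulative sums concrete, here is a minimal sketch of one possible per-vector layout, assuming each entry stores the norm of the dimensions from a given level onward (the PR may organize the sums differently; `tail_norms` and its arguments are illustrative names only):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch only: per-vector "cumulative sums" for Panorama-style pruning.
// cum[l] holds the L2 norm of the tail of the vector starting at level l, so
// a Cauchy-Schwarz bound on the unseen part costs one multiply at query time.
std::vector<float> tail_norms(const float* x, int d, int num_levels) {
    std::vector<float> cum(num_levels, 0.0f);
    int dims_per_level = (d + num_levels - 1) / num_levels;
    float sum = 0.0f;
    for (int level = num_levels - 1; level >= 0; level--) {
        int start = level * dims_per_level;
        int end = std::min(d, start + dims_per_level);
        for (int j = start; j < end; j++) {
            sum += x[j] * x[j];
        }
        cum[level] = std::sqrt(sum); // norm of dimensions [start, d)
    }
    return cum;
}
```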

SIFT-128

Note: SIFT-128 performs slightly worse here than in our paper because we use 8 levels, whereas the paper explored several level configurations. Eight levels introduce quite a bit of overhead for 128-dimensional data, but I kept it consistent across all benchmarks for comparison.

[figure: bench_hnsw_flat_panorama_SIFT1M]

GIST-960

[figure: bench_hnsw_flat_panorama_GIST1M]

Synthetic-2048

[figure: bench_hnsw_flat_panorama_Synthetic2048D]

@meta-cla meta-cla bot added the CLA Signed label Oct 20, 2025
Contributor

@mdouze mdouze left a comment


Thanks for the PR.
About Panorama in general: would it be feasible to make an IndexRefine that supports FlatPanorama as a refinement index?
The reason is that it may be more efficient to do all the non-exhaustive searches in low dimension and refine the result list at the end.
This would also make it possible to apply Panorama to low-accuracy, fast indexes like FastScan and RaBitQ.
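
For context, here is a minimal sketch of that pattern with today's classes, assuming an arbitrary fast-scan base (the factory string and the `refine_pattern` helper are illustrative); the suggestion above amounts to swapping `IndexFlat` for a Panorama-accelerated flat index in the same slot:

```cpp
#include <faiss/Index.h>
#include <faiss/IndexFlat.h>
#include <faiss/IndexRefine.h>
#include <faiss/index_factory.h>
#include <memory>

// Sketch only: a fast, low-accuracy base index produces a candidate list and
// an exact flat index re-ranks it; Panorama would accelerate the re-ranking.
void refine_pattern(int d) {
    // Example fast-scan base (assumes d is divisible by 16).
    std::unique_ptr<faiss::Index> base(
            faiss::index_factory(d, "IVF256,PQ16x4fs"));
    faiss::IndexFlat refine_flat(d); // exact re-ranking stage
    faiss::IndexRefine index(base.get(), &refine_flat);
    index.k_factor = 4.0f; // fetch 4*k candidates from the base before refining
    // index.train(nb, xb); index.add(nb, xb); index.search(nq, xq, k, D, I);
}
```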

for (int j = start_idx; j < end_idx; j++) {
    sum += x[j] * x[j];
}
dst_cum_sums[level] = sqrt(sum);
Contributor


Is there a reason to do a sqrt (i.e., use L2 distance instead of the usual squared L2 distance)?

Contributor Author


From Cauchy-Schwarz, we precompute the `sqrt` so it does not happen in the hot path. A multiplication is always going to be cheaper.
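
To spell out the hot-path cost, here is a minimal sketch (assumed, not the PR's exact kernel; `ip_upper_bound` and its arguments are illustrative names): with per-level tail norms precomputed via `sqrt` at add time, the Cauchy-Schwarz bound on the unseen part of the inner product costs a single multiplication per level:

```cpp
#include <vector>

// Sketch only: upper bound on the full inner product after `level` levels.
// By Cauchy-Schwarz, |<q_tail, x_tail>| <= ||q_tail|| * ||x_tail||, and both
// tail norms were already computed with sqrt outside the query hot path.
inline float ip_upper_bound(
        float partial_ip,                       // exact IP over processed levels
        const std::vector<float>& q_tail_norms, // per-level norms of the query tail
        const std::vector<float>& x_tail_norms, // per-level norms of the stored vector tail
        int level) {
    return partial_ip + q_tail_norms[level] * x_tail_norms[level];
}
```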

* in a random order, which makes cache misses dominate the distance computation
* time.
*
* The num_levels parameter controls the granularity of progressive distance
Contributor


Would it be possible to call it something other than num_levels? HNSW has its own notion of levels, namely the number of hierarchy levels.

Contributor Author


Good idea, perhaps num_panorama_levels?

* the algorithm to prune unpromising candidates early using Cauchy-Schwarz
* bounds on partial inner products. Hence, recall is not guaranteed to be the
* same as vanilla HNSW due to the heterogeneous precision within the search
* beam (exact vs. partial distance estimates affecting traversal order).
Contributor


Does this mean that the batch size on which the distances are computed is at most the out-degree of the HNSW graph (set to 64 by default)?

Contributor Author


Precisely.

@mdouze
Contributor

mdouze commented Oct 20, 2025

Please share any performance comparison you have with this code vs. the HNSWFlat implementation.
Since the data is not contiguous, the performance profile could be different from an IVFFlat index.

@AlSchlo
Contributor Author

AlSchlo commented Oct 20, 2025

@mdouze thanks for the review

  1. Yes, we can use it in IndexRefine; this is a good idea. I would assume that IndexRefine does not keep its vectors sequential in memory by design? If not, this is OK but suboptimal, as the gains of Panorama are more modest in the presence of all those cache misses. We cover this in the paper.

  2. Performance of HNSW is benchmarked in the paper too. It is still worth it on higher-dimensional data, but the gains are much more ad hoc than for IndexIVFFlatPanorama. We will include some benchmarks with this new, cleaned-up code. Here is the graph from the paper.

[figure: graph from the paper]
  3. Panorama can work on IVFPQ (including FastScan), but the integration there is a bigger effort (to support all AVX targets, etc.), as we have to interleave the codes to keep the SIMD lanes busy. In fact, this is where we see the largest speedups.

@alexanderguzhva
Contributor

@AlSchlo is it worth making the default (UB + LB) / 2 behavior configurable by allowing, say, other options like just LB?

@AlSchlo
Contributor Author

AlSchlo commented Oct 20, 2025

@alexanderguzhva excellent suggestion! We actually used to have an epsilon knob there, but we ended up not discussing it in the paper. IMO it's a knob that just adds confusion and makes the workload more unpredictable.

We did not study it in more detail as the paper was getting too dense.
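
For reference, a minimal sketch of the knob being discussed (the epsilon parameter is an assumption and is not part of this PR): an interpolation between the lower and upper bounds, where epsilon = 0.5 reproduces the default (LB + UB) / 2 estimate and epsilon = 0 uses just LB, as suggested above:

```cpp
// Sketch only: a hypothetical epsilon in [0, 1] interpolating between the
// Cauchy-Schwarz lower and upper bounds on the distance. epsilon = 0.5 gives
// the default (LB + UB) / 2 estimate; epsilon = 0 uses just the lower bound.
inline float distance_estimate(float lb, float ub, float epsilon = 0.5f) {
    return lb + epsilon * (ub - lb);
}
```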

@AlSchlo
Contributor Author

AlSchlo commented Oct 21, 2025

Will write tests sometime this week. I also realize I need to change the write / read functions.
This slipped through the cracks for IVFFlatPanorama somehow.

@AlSchlo AlSchlo marked this pull request as ready for review October 22, 2025 08:26
@AlSchlo
Contributor Author

AlSchlo commented Oct 22, 2025

Pending benchmarks, I believe this implementation is pretty much complete.
The next line of work will be (1) fixing the nits in IVFFlat (there are currently 2 PRs open from @aknayar) and (2) implementing IndexRefinePanorama as @mdouze suggested.

@AlSchlo
Contributor Author

AlSchlo commented Oct 24, 2025

@mdouze The PR is done — could you please re-review?

Let's also try to get #4628 merged. It's a very useful metric to have, as there's a strong correlation between these stats and empirical performance.

Here's a nice excerpt from the paper that summarizes this:

[figure: excerpt from the paper]

Thanks!

@AlSchlo AlSchlo requested a review from mdouze October 24, 2025 02:57
@AlSchlo
Contributor Author

AlSchlo commented Oct 24, 2025

Once this is done, I will focus on getting the IndexRefinePanorama PR in.

@mnorris11

Hi @AlSchlo and @aknayar, many thanks for the contributions! The stats PR has been merged.

After discussing internally, it sounds like the priority is IndexRefinePanorama. For our (my) learning: after this is in place, is there still a need for the various other indexes like IndexHNSWFlatPanorama, or can it then be applied to all indexes?

@AlSchlo
Contributor Author

AlSchlo commented Oct 28, 2025

Hi @mnorris11,

IndexRefinePanorama would be a great fit when Panorama cannot be directly applied in the initial search space — for instance, when the dimensionality is small, or when integration into the main index has not yet been done. As a general rule, Panorama can be integrated almost anywhere with some engineering effort.

In our paper, we integrate Panorama into IVFPQ, and it performs well even for low-dimensional data, thanks to a SIMD optimization technique called byte-slicing, which helps keep vector lanes fully utilized. However, that implementation requires custom SIMD kernels, which makes it less portable.

This PR instead targets the common scenario where the dimensionality is large and no downstream index is used to refine results after HNSW. That setup is quite typical in practice — for instance, it would benefit us internally at Databricks.

So, my take would be to include both implementations: they address different use cases. Longer-term, the goal is to adapt Panorama into quantization-based techniques like RaBitQ, so we can accelerate both the initial search and refinement phases. This is our current research direction.

@AlSchlo
Contributor Author

AlSchlo commented Oct 28, 2025

TL;DR @mnorris11

IndexRefinePanorama integrates Panorama into the refinement phase (which might be needed if the upstream index yields poor recall).

IndexHNSWPanorama integrates Panorama into the search phase of HNSW.

@alexanderguzhva
Contributor

@AlSchlo the problem with RaBitQ that I see is that the overhead of storing the additional coefficients is going to be significant, unlike with 32-bit or even 16-bit floats for the refinement.

@AlSchlo
Contributor Author

AlSchlo commented Oct 29, 2025

@alexanderguzhva Yes, this is one issue that will need clever engineering. I was thinking of perhaps quantizing those coefficients. Also, RaBitQ theory assumes a random projection; we need to adapt the theory to make it work with Cayley & PCA.

From a computational point of view, however, even if the vector is binary-quantized, Panorama can still be applied.

@meta-codesync
Contributor

meta-codesync bot commented Oct 30, 2025

@mnorris11 has imported this pull request. If you are a Meta employee, you can view this in D85902427.

@mnorris11

Sorry for the delay on these PRs; I'm still conducting some benchmarking.

@aknayar
Contributor

aknayar commented Nov 7, 2025

@mnorris11 No worries, and thank you so much for the reviews! As an update, once #4645 is confirmed, I have a local build of IndexRefinePanorama ready to submit, with really promising results: 2x end-to-end speedups on GIST with IVF256,PQ60x4fs as the base index, compared to L2Flat as the refine index (seen below). I think speedups of 3x and above can be expected on more amenable datasets (OpenAI's DBpedia-Large, etc.).

[figure: GIST end-to-end benchmark results]
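
For context, a minimal sketch of the baseline pipeline described above, assuming the standard index_factory string for a fast-scan IVF-PQ base with a flat L2 refinement stage (IndexRefinePanorama would accelerate that refinement step):

```cpp
#include <faiss/Index.h>
#include <faiss/index_factory.h>
#include <memory>

// Sketch only: "RFlat" wraps the base index in an exact flat re-ranking stage.
int main() {
    int d = 960; // GIST dimensionality
    std::unique_ptr<faiss::Index> index(
            faiss::index_factory(d, "IVF256,PQ60x4fs,RFlat"));
    // index->train(nb, xb); index->add(nb, xb); index->search(nq, xq, k, D, I);
    return 0;
}
```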
