silver-ymz
Member

close #27

Add a new GUC, bm25_catalog.enable_prefilter, which pushes the WHERE clause down into the index scan. The index evaluates the clause for each candidate document before calculating its BM25 score.

  • bm25_catalog.enable_prefilter defaults to true.
  • See tests/sqllogictest/prefilter.slt for an example.
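
A minimal sketch of how an on-by-default setting could gate the pushed-down check. PrefilterConfig and should_score are hypothetical names, not the extension's API; the closure stands in for the pushed-down WHERE clause, and the real GUC is read through PostgreSQL rather than from a Rust struct.

```rust
/// Hypothetical stand-in for the bm25_catalog.enable_prefilter GUC.
pub struct PrefilterConfig {
    pub enable_prefilter: bool,
}

impl Default for PrefilterConfig {
    fn default() -> Self {
        // The PR states that the setting defaults to true.
        PrefilterConfig { enable_prefilter: true }
    }
}

/// Decide whether a candidate document should go on to BM25 scoring.
/// `filter` models the pushed-down WHERE clause; `None` means nothing was pushed down.
pub fn should_score(
    config: &PrefilterConfig,
    filter: Option<&dyn Fn(u32) -> bool>,
    doc_id: u32,
) -> bool {
    match (config.enable_prefilter, filter) {
        // Prefilter enabled and a clause exists: evaluate it before scoring.
        (true, Some(f)) => f(doc_id),
        // Prefilter disabled, or nothing to check: score every candidate.
        _ => true,
    }
}
```

When the setting is off, this sketch scores every candidate and leaves the WHERE clause to be applied to the returned rows afterwards, as PostgreSQL does for any qual the index itself does not evaluate.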

Signed-off-by: Mingzhuo Yin <[email protected]>
silver-ymz requested a review from Copilot on September 8, 2025 06:16

Copilot AI left a comment

Pull Request Overview

This PR adds a new prefilter feature to the BM25 catalog extension that pushes WHERE clause filters down to the index scanning process, allowing conditions to be evaluated before computing BM25 scores.

  • Adds a new GUC parameter bm25_catalog.enable_prefilter (default: true) to control prefiltering
  • Implements a PostgreSQL executor hook to capture scan state and enable filter evaluation during index scans
  • Integrates prefilter functionality into the block_wand algorithms to check conditions before scoring documents
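
The ordering in the last bullet (check the condition, then score) can be illustrated with a simplified top-k loop. This is a sketch, not the PR's block_wand code: Candidate, top_k_with_filter, and the per-candidate upper bound are assumptions, and real block-max WAND works over posting-list blocks rather than a flat slice. It only shows that a document failing the filter never reaches the score function.

```rust
/// Hypothetical candidate: a document id plus an upper bound on its score,
/// similar in spirit to the per-block maxima that block-max WAND keeps.
pub struct Candidate {
    pub doc_id: u32,
    pub upper_bound: f32,
}

/// Simplified top-k search (not the PR's block_wand): shows where a prefilter
/// check can sit between cheap pruning and the actual score computation.
pub fn top_k_with_filter(
    candidates: &[Candidate],
    k: usize,
    filter: &dyn Fn(u32) -> bool,
    score: &dyn Fn(u32) -> f32,
) -> Vec<(u32, f32)> {
    // Kept sorted by descending score and never longer than k.
    let mut results: Vec<(u32, f32)> = Vec::with_capacity(k + 1);
    if k == 0 {
        return results;
    }
    for c in candidates {
        // Lowest score in a full top-k list; a new document must beat it.
        let threshold = if results.len() == k {
            results[k - 1].1
        } else {
            f32::NEG_INFINITY
        };
        // 1. Pruning: even the upper bound cannot enter the top-k.
        if c.upper_bound <= threshold {
            continue;
        }
        // 2. Prefilter: evaluate the pushed-down condition before scoring.
        if !filter(c.doc_id) {
            continue;
        }
        // 3. Only documents that pass the filter are scored.
        let s = score(c.doc_id);
        if s > threshold {
            results.push((c.doc_id, s));
            results.sort_by(|a, b| b.1.total_cmp(&a.1));
            results.truncate(k);
        }
    }
    results
}
```

Placing the filter after the bound test keeps pruning cheap in this sketch; where the PR places the check relative to pruning is not shown on this page.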

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Summary per file:

  • tests/sqllogictest/prefilter.slt: Test file demonstrating prefilter functionality with different settings
  • src/segment/field_norm.rs: Updated lifetime annotation for memory reader method
  • src/index/scan.rs: Major changes to scanner state management and prefilter implementation
  • src/index/mod.rs: Added hook module initialization
  • src/index/hook.rs: New PostgreSQL executor hook implementation for prefilter support
  • src/guc.rs: Added ENABLE_PREFILTER GUC setting configuration
  • src/datatype/text_bm25vector.rs: Minor string formatting improvements
  • src/datatype/memory_bm25vector.rs: Updated lifetime annotation for borrow method
  • src/datatype/functions.rs: Simplified negation expression
  • src/algorithm/block_wand.rs: Added filter parameter to block_wand algorithms

(*node).ss.ss_currentRelation,
(*node).iss_RelationDesc,
(*(*node).ss.ps.state).es_snapshot,
// #[cfg(feature = "pg18")]

Copilot AI Sep 8, 2025

Commented-out code should be removed unless it serves a specific purpose. If this is needed for future PostgreSQL version compatibility, add a clear comment explaining the purpose and expected timeline for usage.

Suggested change
- // #[cfg(feature = "pg18")]
+ // #[cfg(feature = "pg18")]
+ // The following line is required for PostgreSQL 18+ compatibility.
+ // Uncomment when upgrading to PostgreSQL 18 or enabling the "pg18" feature.

Copilot uses AI. Check for mistakes.
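
One alternative to a commented-out line, in the spirit of this review, is to keep both variants compiled behind a Cargo feature so the compiler checks whichever one is active. The names below (begin_scan_args, the integer handles, the extra trailing argument) are hypothetical placeholders; the page does not show what the commented-out pg18 argument actually is.

```rust
/// Hypothetical helper: collects the arguments for a version-dependent call.
/// Integer handles stand in for the relation, index, and snapshot pointers.
#[cfg(feature = "pg18")]
pub fn begin_scan_args(relation: u64, index: u64, snapshot: u64) -> Vec<u64> {
    // Builds with the "pg18" feature pass one extra (placeholder) argument.
    vec![relation, index, snapshot, 0]
}

/// Same helper for builds without the "pg18" feature.
#[cfg(not(feature = "pg18"))]
pub fn begin_scan_args(relation: u64, index: u64, snapshot: u64) -> Vec<u64> {
    vec![relation, index, snapshot]
}
```

Unlike a comment, both branches stay syntax-checked, and enabling the feature switches variants without editing the source.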

state: *mut pgrx::pg_sys::ExprState,
econtext: *mut pgrx::pg_sys::ExprContext,
) -> bool {
use pgrx::PgMemoryContexts;

Copilot AI Sep 8, 2025

The logic for returning true when the state is null should be documented. It's not immediately clear why a null state should be considered as passing the boolean qualification check.

Suggested change
- use pgrx::PgMemoryContexts;
+ use pgrx::PgMemoryContexts;
+ // If the qualification state is null, there are no conditions to check,
+ // so the qualification passes by default. This follows PostgreSQL convention.

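The convention mentioned here matches PostgreSQL's ExecQual, which returns true when the qual ExprState is NULL, treats the clauses as an implicit AND, and counts a NULL result as failure. A small Rust model of those semantics, using a hypothetical Row type and plain function pointers instead of pgrx's ExprState and ExprContext:

```rust
/// Hypothetical row with one nullable integer column.
pub struct Row {
    pub category: Option<i64>,
}

/// One qual clause evaluated with SQL three-valued logic:
/// Some(true), Some(false), or None for SQL NULL.
pub type Clause = fn(&Row) -> Option<bool>;

/// Models ExecQual: no qual passes; otherwise clauses are ANDed and NULL counts as false.
pub fn qual_passes(qual: Option<&[Clause]>, row: &Row) -> bool {
    match qual {
        // Null ExprState: empty restriction list, nothing to check.
        None => true,
        // Implicit AND; a NULL clause result rejects the row just like false.
        Some(clauses) => clauses.iter().all(|clause| clause(row) == Some(true)),
    }
}
```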

false
}

unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool {

Copilot AI Sep 8, 2025

As with the previous comment, returning true for a null node should be documented, explaining why a null node counts as a passing condition for the check function.

Suggested change
- unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool {
+ unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool {
+ // If the scan state node is null, there are no conditions to check,
+ // so we return true to indicate a passing/valid condition.

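The check function's contract can be modelled the same way: a missing scan state means nothing was pushed down, so every candidate passes; otherwise the row behind the item pointer is looked up and the filter is evaluated. ScanState, Tid, and the HashMap heap are hypothetical stand-ins for IndexScanState, ItemPointer, and a heap fetch; rejecting a TID that cannot be fetched is this sketch's choice, not something the review confirms.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a heap TID (ItemPointer): (block number, offset number).
pub type Tid = (u32, u16);

/// Hypothetical stand-in for the scan state: heap rows plus the pushed-down filter.
pub struct ScanState {
    pub heap: HashMap<Tid, i64>,
    pub filter: fn(i64) -> bool,
}

/// Sketch of a per-candidate check run before a document is scored.
pub fn check(node: Option<&ScanState>, tid: Tid) -> bool {
    let Some(state) = node else {
        // No scan state: no conditions to check, so the candidate passes.
        return true;
    };
    match state.heap.get(&tid) {
        // Evaluate the pushed-down filter on the fetched row.
        Some(&value) => (state.filter)(value),
        // Sketch's choice: a tuple that cannot be fetched is rejected.
        None => false,
    }
}
```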

silver-ymz requested a review from VoVAllen on September 8, 2025 06:37
VoVAllen merged commit 89e00a1 into tensorchord:main on Sep 26, 2025
12 checks passed

Development

Successfully merging this pull request may close these issues.

feat: Support prefilter

2 participants