Conversation
Signed-off-by: Mingzhuo Yin <[email protected]>
Signed-off-by: Mingzhuo Yin <[email protected]>
There was a problem hiding this comment.
Pull Request Overview
This PR adds a new prefilter feature to the BM25 catalog extension that pushes WHERE clause filters down to the index scanning process, allowing conditions to be evaluated before computing BM25 scores.
- Adds a new GUC parameter
bm25_catalog.enable_prefilter(default: true) to control prefiltering - Implements a PostgreSQL executor hook to capture scan state and enable filter evaluation during index scans
- Integrates prefilter functionality into the block_wand algorithms to check conditions before scoring documents
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/sqllogictest/prefilter.slt | Test file demonstrating prefilter functionality with different settings |
| src/segment/field_norm.rs | Updated lifetime annotation for memory reader method |
| src/index/scan.rs | Major changes to scanner state management and prefilter implementation |
| src/index/mod.rs | Added hook module initialization |
| src/index/hook.rs | New PostgreSQL executor hook implementation for prefilter support |
| src/guc.rs | Added ENABLE_PREFILTER GUC setting configuration |
| src/datatype/text_bm25vector.rs | Minor string formatting improvements |
| src/datatype/memory_bm25vector.rs | Updated lifetime annotation for borrow method |
| src/datatype/functions.rs | Simplified negation expression |
| src/algorithm/block_wand.rs | Added filter parameter to block_wand algorithms |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| (*node).ss.ss_currentRelation, | ||
| (*node).iss_RelationDesc, | ||
| (*(*node).ss.ps.state).es_snapshot, | ||
| // #[cfg(feature = "pg18")] |
There was a problem hiding this comment.
Commented-out code should be removed unless it serves a specific purpose. If this is needed for future PostgreSQL version compatibility, add a clear comment explaining the purpose and expected timeline for usage.
| // #[cfg(feature = "pg18")] | |
| // #[cfg(feature = "pg18")] | |
| // The following line is required for PostgreSQL 18+ compatibility. | |
| // Uncomment when upgrading to PostgreSQL 18 or enabling the "pg18" feature. |
| state: *mut pgrx::pg_sys::ExprState, | ||
| econtext: *mut pgrx::pg_sys::ExprContext, | ||
| ) -> bool { | ||
| use pgrx::PgMemoryContexts; |
There was a problem hiding this comment.
The logic for returning true when the state is null should be documented. It's not immediately clear why a null state should be considered as passing the boolean qualification check.
| use pgrx::PgMemoryContexts; | |
| use pgrx::PgMemoryContexts; | |
| // If the qualification state is null, there are no conditions to check, | |
| // so the qualification passes by default. This follows PostgreSQL convention. |
| false | ||
| } | ||
|
|
||
| unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool { |
There was a problem hiding this comment.
Similar to the previous comment, returning true for a null node should be documented to explain why this represents a valid/passing condition for the check function.
| unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool { | |
| unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool { | |
| // If the scan state node is null, there are no conditions to check, | |
| // so we return true to indicate a passing/valid condition. |
close #27
Add a new guc
bm25_catalog.enable_prefilter. It will pushdown the where clause to index process. And index will check it before calculating bm25 score.bm25_catalog.enable_prefilteris default true.tests/sqllogictest/prefilter.sltas an example.