-
Notifications
You must be signed in to change notification settings - Fork 11
feat: support prefilter #83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Mingzhuo Yin <[email protected]>
Signed-off-by: Mingzhuo Yin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new prefilter feature to the BM25 catalog extension that pushes WHERE clause filters down to the index scanning process, allowing conditions to be evaluated before computing BM25 scores.
- Adds a new GUC parameter
bm25_catalog.enable_prefilter
(default: true) to control prefiltering - Implements a PostgreSQL executor hook to capture scan state and enable filter evaluation during index scans
- Integrates prefilter functionality into the block_wand algorithms to check conditions before scoring documents
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
tests/sqllogictest/prefilter.slt | Test file demonstrating prefilter functionality with different settings |
src/segment/field_norm.rs | Updated lifetime annotation for memory reader method |
src/index/scan.rs | Major changes to scanner state management and prefilter implementation |
src/index/mod.rs | Added hook module initialization |
src/index/hook.rs | New PostgreSQL executor hook implementation for prefilter support |
src/guc.rs | Added ENABLE_PREFILTER GUC setting configuration |
src/datatype/text_bm25vector.rs | Minor string formatting improvements |
src/datatype/memory_bm25vector.rs | Updated lifetime annotation for borrow method |
src/datatype/functions.rs | Simplified negation expression |
src/algorithm/block_wand.rs | Added filter parameter to block_wand algorithms |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
(*node).ss.ss_currentRelation, | ||
(*node).iss_RelationDesc, | ||
(*(*node).ss.ps.state).es_snapshot, | ||
// #[cfg(feature = "pg18")] |
Copilot
AI
Sep 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented-out code should be removed unless it serves a specific purpose. If this is needed for future PostgreSQL version compatibility, add a clear comment explaining the purpose and expected timeline for usage.
// #[cfg(feature = "pg18")] | |
// #[cfg(feature = "pg18")] | |
// The following line is required for PostgreSQL 18+ compatibility. | |
// Uncomment when upgrading to PostgreSQL 18 or enabling the "pg18" feature. |
Copilot uses AI. Check for mistakes.
state: *mut pgrx::pg_sys::ExprState, | ||
econtext: *mut pgrx::pg_sys::ExprContext, | ||
) -> bool { | ||
use pgrx::PgMemoryContexts; |
Copilot
AI
Sep 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for returning true
when the state is null should be documented. It's not immediately clear why a null state should be considered as passing the boolean qualification check.
use pgrx::PgMemoryContexts; | |
use pgrx::PgMemoryContexts; | |
// If the qualification state is null, there are no conditions to check, | |
// so the qualification passes by default. This follows PostgreSQL convention. |
Copilot uses AI. Check for mistakes.
false | ||
} | ||
|
||
unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool { |
Copilot
AI
Sep 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous comment, returning true
for a null node should be documented to explain why this represents a valid/passing condition for the check function.
unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool { | |
unsafe fn check(node: *mut pgrx::pg_sys::IndexScanState, p: pgrx::pg_sys::ItemPointer) -> bool { | |
// If the scan state node is null, there are no conditions to check, | |
// so we return true to indicate a passing/valid condition. |
Copilot uses AI. Check for mistakes.
close #27
Add a new guc
bm25_catalog.enable_prefilter
. It will pushdown the where clause to index process. And index will check it before calculating bm25 score.bm25_catalog.enable_prefilter
is default true.tests/sqllogictest/prefilter.slt
as an example.