Description
This issue describes the design for supporting subquery raw scores in hybrid search. This feature was requested in GitHub issues #1294 and #1180.
Problem
Currently, each SearchHit in the hybrid search response includes only the final score, produced after normalization and combination. However, several use cases (such as reranking, explainability, or custom post-processing) require visibility into the original (pre-normalized, or raw) scores from each subquery.
The lack of access to individual subquery scores limits users to working only with the final hybrid score, which is insufficient for advanced use cases.
Requirements
Functional Requirements
- Each SearchHit in the hybrid search response should include the original (pre-normalized) scores of its subqueries.
- Maintain consistent subquery score ordering
- Support for Multiple Shards and Single Shard
Non Functional Requirements
- Including subquery scores must not significantly impact query response time or introduce regressions in performance.
- Support Backward Compatibility
Solution Overview
We propose extending the hybrid search response to include a new metadata field: hybridization_sub_query_scores. This field will contain a list of scores corresponding to each subquery executed as part of the hybrid query.
"hybridization_sub_query_scores": [
0.34567, ---> raw score of sub query 1
0.49510515, ---> raw score of sub query 2
0.234556 ---> raw score of sub query 3
]
Each element in the hybridization_sub_query_scores list corresponds to the score from one of the subqueries. The ordering of scores will follow the internal ordering of subqueries in the hybrid query definition.
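As a minimal sketch of how a client might use this field for custom post-processing, the following re-combines the raw subquery scores of one hit with caller-chosen weights. The class and method names are hypothetical and not part of the proposal, and the sketch assumes the response has already been parsed into a list of scores.

```java
import java.util.List;

/** Hypothetical client-side helper; not part of the proposed API. */
final class SubQueryScoreReweighter {

    /**
     * Combines raw subquery scores with caller-supplied weights.
     * Index i in both lists refers to the i-th subquery of the hybrid query,
     * which relies on the ordering guarantee described above.
     */
    static double weightedSum(List<Double> rawScores, List<Double> weights) {
        if (rawScores.size() != weights.size()) {
            throw new IllegalArgumentException("expected one weight per subquery score");
        }
        double total = 0.0;
        for (int i = 0; i < rawScores.size(); i++) {
            total += rawScores.get(i) * weights.get(i);
        }
        return total;
    }
}
```

Because the weights are matched to scores by position, this kind of post-processing only works if the score ordering is stable across hits and shards.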
The subquery scores can be enabled by setting the sub-query-scores flag when defining a normalization pipeline for hybrid search, as shown below:
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "sub-query-scores": true,
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.7
            ]
          }
        }
      }
    }
  ]
}
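To keep existing pipelines backward compatible, one reasonable reading is that the flag defaults to false when it is absent. The sketch below illustrates that parsing under the assumption that the processor configuration is available as a Map<String, Object>; the helper class is hypothetical and does not reflect the plugin's actual factory code.

```java
import java.util.Map;

/** Hypothetical helper showing a backward-compatible default for the flag. */
final class SubQueryScoresFlag {
    static final String KEY = "sub-query-scores";

    /** Returns the flag value, treating a missing key as false. */
    static boolean isEnabled(Map<String, Object> processorConfig) {
        Object value = processorConfig.get(KEY);
        if (value == null) {
            // Pipelines defined before this feature keep their current response shape.
            return false;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        throw new IllegalArgumentException(KEY + " must be a boolean, got: " + value);
    }
}
```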
The solution can be achieved with different options:
Option 1: Using getFetchSubPhase extension point [Recommended]
This option uses OpenSearch’s existing FetchSubPhase extension mechanism to inject subquery scores into the SearchHit during the fetch phase. A new HybridizationFetchSubPhase class would be implemented to read subquery scores from a shared registry (e.g., HybridScoreRegistry) populated during the query execution phase.
How It Works
- During the normalization phase, individual subquery scores are collected and stored in a registry (HybridScoreRegistry).
- During the fetch phase, the HybridizationFetchSubPhase retrieves the per-document subquery scores from the registry and inserts them into each SearchHit under a new field (e.g., _hybridization).
- This approach avoids altering the core scoring or asking users to add a new processor, and it integrates neatly into the OpenSearch plugin architecture.
Note: With a single shard, there can be a flow where the fetch phase runs before the query phase. In that case, the SearchHit itself needs to be updated with the subquery scores.
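The registry described above could be sketched roughly as follows. This is an illustration under assumptions, not the proposed HybridScoreRegistry: the type parameter K stands in for the search context, the class is instance-based rather than static, and the explicit remove step is an assumption about how entries would be cleaned up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a per-request score registry.
 * K stands in for the search context type used as the key;
 * the stored value maps docId -> raw scores, one per subquery.
 */
final class ScoreRegistrySketch<K> {
    private final Map<K, Map<Integer, float[]>> scoresByContext = new ConcurrentHashMap<>();

    /** Called from the normalization step once raw scores are known. */
    void store(K context, Map<Integer, float[]> scoresByDocId) {
        scoresByContext.put(context, scoresByDocId);
    }

    /** Called from the fetch sub-phase; returns null when nothing was stored. */
    Map<Integer, float[]> get(K context) {
        return scoresByContext.get(context);
    }

    /** Assumed cleanup hook: without it, entries would accumulate across requests. */
    void remove(K context) {
        scoresByContext.remove(context);
    }
}
```

A concurrent map is used here because normalization and fetch may run on different threads; the memory overhead noted under Cons is bounded by how long entries live before removal.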
Pros
- Aligns with existing extensibility patterns in OpenSearch.
- Decouples logic from core query processing — safe, modular, and maintainable.
- Allows clear separation of concerns between scoring and rendering the response.
Cons
- Slight memory overhead for storing intermediate scores (should be acceptable for typical query sizes).
Option 2: Using SearchResponse processor
In this approach, subquery scores would be injected into the search response after the fetch phase but before the response is serialized. This could be implemented using a new response processor.
How It Works
- Create a new SubQueryScoresResponseProcessor in neural search to alter the response.
- For each SearchHit, it adds the corresponding subquery scores from a shared map or context.
- This logic happens after all fetch phases have completed.
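The per-hit step of this option can be sketched as below. The Hit class is a simplified stand-in for SearchHit, and the map is keyed by doc id alone for brevity; a real implementation would presumably need a shard-aware key, since doc ids are not unique across shards. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of attaching scores to hits after the fetch phase. */
final class ResponseScoreAttacher {

    /** Simplified stand-in for a search hit: a doc id plus response fields. */
    static final class Hit {
        final int docId;
        final Map<String, Object> fields = new LinkedHashMap<>();

        Hit(int docId) {
            this.docId = docId;
        }
    }

    /** Adds the raw subquery scores to every hit that has an entry in the map. */
    static void attach(List<Hit> hits, Map<Integer, float[]> scoresByDocId) {
        for (Hit hit : hits) {
            float[] raw = scoresByDocId.get(hit.docId);
            if (raw == null) {
                continue; // no scores recorded for this hit
            }
            List<Float> boxed = new ArrayList<>(raw.length);
            for (float score : raw) {
                boxed.add(score);
            }
            hit.fields.put("hybridization_sub_query_scores", boxed);
        }
    }
}
```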
Pros
- Does not require changes to query execution or fetch phases.
- May simplify logic for cross-phase coordination, as everything happens post-query.
Cons
- Introduces a new response processor. Less modular and less transparent than using a FetchSubPhase.
- Increases complexity of response handling logic.
- Tight coupling to internal response format may create upgrade and compatibility issues.
Option 3: Creating a query parameter for hybrid scores in core
This approach proposes adding a new query-level parameter (?hybridScores=true) to the search request itself. When this flag is set, the query engine internally stores and returns the subquery scores as part of the standard SearchHit. This is very similar to how verbose pipeline currently works in Search Pipeline.
How It Works
- Modify the SearchSourceBuilder in core to support sub query scores in the source field.
- Pass the sub query scores from neural search plugin to core.
Pros
- Clean and visible user interface through query parameters.
Cons
- Tightly couples response formatting to query logic — violates separation of concerns.
- Increases the complexity and size of the core hybrid query code.
- Higher risk of introducing performance regressions or bugs.
- Harder to maintain and test compared to using fetch extensions.
Low Level Design
We need the following changes:
- The normalize() method would return a map of docIds to their associated subqueryScores.
- A new class, HybridizationFetchSubPhase, to inject subquery scores into the SearchHit during the fetch phase.
- A new class, HybridScoreRegistry, to store the subQueryScores with the associated search context.
The HybridizationFetchSubPhase would look like the following, adding a _hybridization field with the subqueryScores:
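import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.LeafReaderContext;
import org.opensearch.common.document.DocumentField;
import org.opensearch.search.fetch.FetchContext;
import org.opensearch.search.fetch.FetchSubPhase;
import org.opensearch.search.fetch.FetchSubPhaseProcessor;
import org.opensearch.search.internal.SearchContext;

// ScoreNormalizer and HybridScoreRegistry are assumed to be visible from this class's package.
public class HybridizationFetchSubPhase implements FetchSubPhase {
    private static final String FIELD_NAME = "_hybridization";

    @Override
    public FetchSubPhaseProcessor getProcessor(FetchContext fetchContext) throws IOException {
        // Search context under which the normalization step registered the scores
        SearchContext context = ScoreNormalizer.getSearchContext();
        return new FetchSubPhaseProcessor() {
            @Override
            public void setNextReader(LeafReaderContext leafReaderContext) throws IOException {
                // No per-segment state is needed for this sketch
            }

            @Override
            public void process(HitContext hitContext) {
                Map<Integer, float[]> scoreMap = HybridScoreRegistry.get(context);
                if (scoreMap == null) {
                    return; // nothing registered for this request
                }
                float[] subqueryScores = scoreMap.get(hitContext.docId());
                if (subqueryScores == null) {
                    return;
                }
                // Box each score so the field holds one value per subquery
                // instead of a single float[] element
                List<Object> values = new ArrayList<>(subqueryScores.length);
                for (float score : subqueryScores) {
                    values.add(score);
                }
                hitContext.hit().setDocumentField(FIELD_NAME, new DocumentField(FIELD_NAME, values));
            }
        };
    }
}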
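The first change in the list, a normalize() that returns the raw scores, might look roughly like the sketch below. It is a simplified illustration and not the plugin's ScoreNormalizer: it assumes every document has a score for every subquery, uses a plain map instead of the plugin's internal result types, and maps a constant score column to 1.0, which is an arbitrary choice made here.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified sketch: min-max normalization that also hands back the raw scores. */
final class MinMaxWithRawScores {

    /**
     * Min-max normalizes each subquery's scores in place and returns a copy of
     * the raw scores keyed by doc id. scoresByDocId maps docId -> one score per subquery.
     */
    static Map<Integer, float[]> normalize(Map<Integer, float[]> scoresByDocId, int subQueryCount) {
        Map<Integer, float[]> raw = new HashMap<>();
        for (Map.Entry<Integer, float[]> entry : scoresByDocId.entrySet()) {
            raw.put(entry.getKey(), entry.getValue().clone()); // snapshot before mutation
        }
        for (int q = 0; q < subQueryCount; q++) {
            float min = Float.POSITIVE_INFINITY;
            float max = Float.NEGATIVE_INFINITY;
            for (float[] scores : scoresByDocId.values()) {
                min = Math.min(min, scores[q]);
                max = Math.max(max, scores[q]);
            }
            float range = max - min;
            for (float[] scores : scoresByDocId.values()) {
                // Assumption of this sketch: a constant column normalizes to 1.0
                scores[q] = range == 0f ? 1f : (scores[q] - min) / range;
            }
        }
        return raw;
    }
}
```

The returned map is what would be handed to the registry, so the fetch sub-phase sees the scores as they were before normalization.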
Benchmarks
Benchmarks were run on an OpenSearch cluster consisting of a single r6g.8xlarge instance as the coordinator node and three r6g.8xlarge instances as data nodes, with multiple shards.
Min max normalization
dataset | p50 (3.1.0) | p50 (Sub Query Scores) | p50 diff | p90 (3.1.0) | p90 (Sub Query Scores) | p90 diff | p99 (3.1.0) | p99 (Sub Query Scores) | p99 diff
---|---|---|---|---|---|---|---|---|---
scidocs | 66.5 | 66.5 | 0 | 70.5 | 70.5 | 0 | 76.005 | 75.005 | -1.31%
fiqa | 70 | 68 | -2.86% | 74 | 71.65 | -3.18% | 77.5 | 75.5 | -2.58%
quora | 70 | 70 | 0 | 75 | 74 | -1.33% | 83 | 82 | -1.20%
arguana | 118 | 117 | -0.85% | 125.5 | 124 | -1.20% | 134.5 | 132 | -1.80%
Sub Query Scores yield modest performance gains in p90 and p99, especially for fiqa and arguana, with no regressions.
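The diff columns appear to be the relative change of the Sub Query Scores run against the 3.1.0 baseline, so a negative value means a lower number with the feature enabled. This formula is inferred from the table values rather than stated explicitly; the fiqa p50 row under min-max normalization serves as a worked example:

```latex
\text{diff} = \frac{\text{Sub Query Scores} - \text{3.1.0}}{\text{3.1.0}} \times 100\%
\qquad \text{e.g.} \qquad
\frac{68 - 70}{70} \times 100\% \approx -2.86\%
```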
RRF normalization
dataset | p50 (3.1.0) | p50 (Sub Query Scores) | p50 diff | p90 (3.1.0) | p90 (Sub Query Scores) | p90 diff | p99 (3.1.0) | p99 (Sub Query Scores) | p99 diff
---|---|---|---|---|---|---|---|---|---
scidocs | 67.5 | 66 | -2.22% | 71 | 69.5 | -2.11% | 75.505 | 74.5 | -1.33%
fiqa | 69.5 | 67 | -3.60% | 74 | 71.5 | -3.38% | 78.764 | 74.5 | 0.67%
quora | 72 | 70 | -2.78% | 77 | 74 | -3.90% | 84 | 81 | -3.57%
arguana | 117 | 117 | 0 | 124 | 124 | 0 | 132 | 131.475 | -0.40%
RRF normalization combined with Sub Query Scores shows consistent, deeper improvements, particularly for fiqa and quora, improving tail latencies (p99).
Another round of benchmarks will be performed.
Feedback Required
We greatly value feedback from the community to ensure that this proposal addresses real-world use cases effectively.