
[RFC] Support Sub Query Raw Scores in Hybrid Search #1419

Open
@owaiskazi19

Description

This issue describes the design for supporting sub query raw scores in hybrid search. This feature has been requested through GitHub issues #1294 and #1180.

Problem

Currently, the hybrid search response only includes the final (normalized) score in each SearchHit, after normalization and combination. However, in several use cases—such as reranking, explainability, or custom post-processing—users require visibility into the original (pre-normalized or raw) scores from each subquery.
The lack of access to individual subquery scores limits users to working only with the final hybrid score, which is insufficient for advanced use cases.

Requirements

Functional Requirements

  • Each SearchHit in the hybrid search response should include the original (pre-normalized) scores of its subqueries.
  • Maintain consistent sub query score ordering across hits.
  • Support both single-shard and multi-shard setups.

Non Functional Requirements

  • Including subquery scores must not significantly impact query response time or introduce regressions in performance.
  • Support Backward Compatibility

Solution Overview

We propose extending the hybrid search response to include a new metadata field: hybridization_sub_query_scores. This field will contain a list of scores corresponding to each subquery executed as part of the hybrid query.

 "hybridization_sub_query_scores": [
                       0.34567,    ---> raw score of sub query 1
                        0.49510515, ---> raw score of sub query 2
                        0.234556    ---> raw score of sub query 3                   
                    ]

Each element in the hybridization_sub_query_scores list corresponds to the score from one of the subqueries. The ordering of scores will follow the internal ordering of subqueries in the hybrid query definition.
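For illustration, a hit in the hybrid search response might then look like the following (index name and document contents are hypothetical; whether the new field lives under fields or as top-level hit metadata is an implementation detail of the options below):

```json
{
  "_index": "my-nlp-index",
  "_id": "1",
  "_score": 0.7,
  "_source": { "text": "..." },
  "fields": {
    "hybridization_sub_query_scores": [0.34567, 0.49510515, 0.234556]
  }
}
```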
The sub query scores can be enabled through a flag sub-query-scores when defining a normalization pipeline for hybrid search, as shown below:

{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "sub-query-scores": true,
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.7
            ]
          }
        }
      }
    }
  ]
}
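For reference, this is how a user would create such a pipeline and apply it at query time (pipeline and index names are illustrative; the search_pipeline request parameter is the existing mechanism for attaching a search pipeline):

```json
PUT /_search/pipeline/hybrid-raw-scores-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "sub-query-scores": true,
        "combination": { "technique": "arithmetic_mean" }
      }
    }
  ]
}

GET /my-index/_search?search_pipeline=hybrid-raw-scores-pipeline
{
  "query": {
    "hybrid": {
      "queries": [ ... ]
    }
  }
}
```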

The solution can be achieved with different options:

Option 1: Using getFetchSubPhase extension point [Recommended]

This option uses OpenSearch’s existing FetchSubPhase extension mechanism to inject subquery scores into the SearchHit during the fetch phase. A new HybridizationFetchSubPhase class would be implemented to read subquery scores from a shared registry (e.g., HybridScoreRegistry) populated during the query execution phase.

How It Works


  • During the normalization phase, individual subquery scores are collected and stored in a registry (HybridScoreRegistry).
  • During the fetch phase, the HybridizationFetchSubPhase retrieves the per-document subquery scores from the registry and inserts them into each SearchHit under a new field (e.g., _hybridization).
  • This approach avoids altering core scoring or asking the user to add a new processor, and it integrates neatly into the OpenSearch plugin architecture.

Note: In the single-shard case there is a flow where the fetch phase can run before the query phase; in that flow we need to update the SearchHit with the subquery scores at that point.
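A minimal sketch of the proposed HybridScoreRegistry is below. This is hypothetical: the real class would be keyed by SearchContext, modeled here as an opaque Object so the sketch is self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: maps a search context to per-document raw subquery scores.
class HybridScoreRegistry {

    // context -> (shard-local doc id -> one raw score per subquery, in subquery order)
    private static final Map<Object, Map<Integer, float[]>> SCORES_BY_CONTEXT = new ConcurrentHashMap<>();

    static void store(Object searchContext, Map<Integer, float[]> docIdToSubQueryScores) {
        SCORES_BY_CONTEXT.put(searchContext, docIdToSubQueryScores);
    }

    static Map<Integer, float[]> get(Object searchContext) {
        return SCORES_BY_CONTEXT.get(searchContext);
    }

    // Must be called when the search context is released to avoid leaking entries
    static void remove(Object searchContext) {
        SCORES_BY_CONTEXT.remove(searchContext);
    }
}
```

Cleanup on context release is the important design point: without remove(), the registry would grow with every hybrid query.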

Pros

  • Aligns with existing extensibility patterns in OpenSearch.
  • Decouples logic from core query processing — safe, modular, and maintainable.
  • Allows clear separation of concerns between scoring and rendering the response.

Cons

  • Slight memory overhead for storing intermediate scores (should be acceptable for typical query sizes).

Option 2: Using SearchResponse processor

In this approach, subquery scores would be injected into the search response after the fetch phase but before the response is serialized. This could be implemented using a new response processor.

How It Works

  • Create a new SubQueryScoresResponseProcessor in neural search to alter the response.
  • For each SearchHit, it adds the corresponding subquery scores from a shared map or context.
  • This logic runs after all fetch phases have completed.
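The core of such a processor can be sketched as follows. This is a simplified, hypothetical model: hits are represented as plain maps rather than the actual SearchResponse/SearchHit classes, and the injector method stands in for the processor's response-rewriting step.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of SubQueryScoresResponseProcessor's core logic:
// walk the hits, look up each hit's raw subquery scores, and attach them.
class SubQueryScoreInjector {

    static void inject(List<Map<String, Object>> hits, Map<Integer, float[]> scoresByDocId) {
        for (Map<String, Object> hit : hits) {
            Integer docId = (Integer) hit.get("docId");
            float[] scores = scoresByDocId.get(docId);
            if (scores != null) {
                // In the real processor this would be a field on the SearchHit
                hit.put("hybridization_sub_query_scores", scores);
            }
        }
    }
}
```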

Pros

  • Does not require changes to query execution or fetch phases.
  • May simplify logic for cross-phase coordination, as everything happens post-query.

Cons

  • Introduces a new response processor. Less modular and less transparent than using a FetchSubPhase.
  • Increases complexity of response handling logic.
  • Tight coupling to internal response format may create upgrade and compatibility issues.

Option 3: Creating a query parameter for hybrid scores in core

This approach proposes adding a new query-level parameter (?hybridScores=true) to the search request itself. When this flag is set, the query engine internally stores and returns the subquery scores as part of the standard SearchHit. This is very similar to how the verbose pipeline parameter works in Search Pipelines today.

How It Works

  • Modify the SearchSourceBuilder in core to support sub query scores in the source field.
  • Pass the sub query scores from neural search plugin to core.
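Under this option a request might look like the following (the parameter name hybridScores is as proposed; the exact request surface would be decided in core):

```json
GET /my-index/_search?hybridScores=true
{
  "query": {
    "hybrid": {
      "queries": [ ... ]
    }
  }
}
```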

Pros

  • Clean and visible user interface through query parameters.

Cons

  • Tightly couples response formatting to query logic — violates separation of concerns.
  • Increases the complexity and size of the core hybrid query code.
  • Higher risk of introducing performance regressions or bugs.
  • Harder to maintain and test compared to using fetch extensions.

Low Level Design

We need the following changes:

  • The normalize() method would return a map of doc ids and their associated subquery scores.
  • A new class HybridizationFetchSubPhase to inject subquery scores into the SearchHit during the fetch phase.
  • A new class HybridScoreRegistry to store the subquery scores keyed by the associated search context.
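To illustrate the first bullet, here is a simplified, hypothetical normalize() that applies min-max normalization per subquery while also returning a copy of the raw per-document scores for later storage in the registry. The real implementation operates on CompoundTopDocs, not plain maps; NaN is used here to mark subqueries a document did not match.

```java
import java.util.HashMap;
import java.util.Map;

class RawScoreCollectingNormalizer {

    // scores: doc id -> one score per subquery (NaN where the doc did not match).
    // Normalizes each subquery's scores in place with min-max and returns a copy
    // of the raw scores, which the caller can hand to the score registry.
    static Map<Integer, float[]> normalize(Map<Integer, float[]> scores, int numSubQueries) {
        Map<Integer, float[]> rawCopy = new HashMap<>();
        for (Map.Entry<Integer, float[]> e : scores.entrySet()) {
            rawCopy.put(e.getKey(), e.getValue().clone());
        }
        for (int q = 0; q < numSubQueries; q++) {
            // Find per-subquery min and max over all matched docs
            float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
            for (float[] s : scores.values()) {
                if (!Float.isNaN(s[q])) {
                    min = Math.min(min, s[q]);
                    max = Math.max(max, s[q]);
                }
            }
            float range = max - min;
            for (float[] s : scores.values()) {
                if (!Float.isNaN(s[q])) {
                    s[q] = range == 0 ? 1.0f : (s[q] - min) / range;
                }
            }
        }
        return rawCopy;
    }
}
```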


The HybridizationFetchSubPhase would look like the below, adding a _hybridization field with the subquery scores.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.LeafReaderContext;
import org.opensearch.common.document.DocumentField;
import org.opensearch.search.fetch.FetchContext;
import org.opensearch.search.fetch.FetchSubPhase;
import org.opensearch.search.fetch.FetchSubPhaseProcessor;
import org.opensearch.search.internal.SearchContext;

public class HybridizationFetchSubPhase implements FetchSubPhase {

    public HybridizationFetchSubPhase() {}

    @Override
    public FetchSubPhaseProcessor getProcessor(FetchContext fetchContext) throws IOException {
        SearchContext context = ScoreNormalizer.getSearchContext();

        return new FetchSubPhaseProcessor() {
            LeafReaderContext ctx;

            @Override
            public void setNextReader(LeafReaderContext leafReaderContext) throws IOException {
                this.ctx = leafReaderContext;
            }

            @Override
            public void process(HitContext hitContext) {
                // Scores were stored per search context during the normalization phase
                Map<Integer, float[]> scoreMap = HybridScoreRegistry.get(context);
                if (scoreMap == null) {
                    return;
                }
                // Assumes the registry is keyed by the same doc id that hitContext exposes
                int docId = hitContext.docId();
                float[] subqueryScores = scoreMap.get(docId);

                if (subqueryScores != null) {
                    // Box the raw scores so the field serializes as a plain list of numbers
                    List<Object> values = new ArrayList<>(subqueryScores.length);
                    for (float score : subqueryScores) {
                        values.add(score);
                    }
                    hitContext.hit().setDocumentField("_hybridization", new DocumentField("_hybridization", values));
                }
            }
        };
    }
}

Benchmarks

The benchmarks ran on an OpenSearch cluster consisting of a single r6g.8xlarge instance as the coordinator node and three r6g.8xlarge instances as data nodes, with multiple shards.

Min max normalization

| dataset | 3.1.0 p50 | Sub Query Scores p50 | p50 diff | 3.1.0 p90 | Sub Query Scores p90 | p90 diff | 3.1.0 p99 | Sub Query Scores p99 | p99 diff |
|---|---|---|---|---|---|---|---|---|---|
| scidocs | 66.5 | 66.5 | 0 | 70.5 | 70.5 | 0 | 76.005 | 75.005 | -1.31% |
| fiqa | 70 | 68 | -2.86% | 74 | 71.65 | -3.18% | 77.5 | 75.5 | -2.58% |
| quora | 70 | 70 | 0 | 75 | 74 | -1.33% | 83 | 82 | -1.20% |
| arguana | 118 | 117 | -0.85% | 125.5 | 124 | -1.20% | 134.5 | 132 | -1.80% |

Sub Query Scores yields latency on par with 3.1.0, with modest p90 and p99 improvements, especially for fiqa and arguana, and no regressions.

RRF normalization

| dataset | 3.1.0 p50 | Sub Query Scores p50 | p50 diff | 3.1.0 p90 | Sub Query Scores p90 | p90 diff | 3.1.0 p99 | Sub Query Scores p99 | p99 diff |
|---|---|---|---|---|---|---|---|---|---|
| scidocs | 67.5 | 66 | -2.22% | 71 | 69.5 | -2.11% | 75.505 | 74.5 | -1.33% |
| fiqa | 69.5 | 67 | -3.60% | 74 | 71.5 | -3.38% | 78.764 | 74.5 | 0.67% |
| quora | 72 | 70 | -2.78% | 77 | 74 | -3.90% | 84 | 81 | -3.57% |
| arguana | 117 | 117 | 0 | 124 | 124 | 0 | 132 | 131.475 | -0.40% |

RRF normalization combined with Sub Query Scores shows consistent, deeper improvements, particularly for fiqa and quora, improving tail latencies (p99).

We will perform another round of benchmarks.

Feedback Required

We greatly value feedback from the community to ensure that this proposal addresses real-world use cases effectively.
