Introduction
This document discusses the design of the upper bound feature for min-max score normalization technique in OpenSearch's hybrid search capability, complementing the existing lower bound feature.
Problem Statement
The current min-max normalization can produce misleading relevancy scores when the theoretical maximum score is known but differs from the actual maximum score in the result set. In neural/k-NN search scenarios where scores have known theoretical bounds (e.g., [0.75, 1.0]), the current normalization can overstate document relevance by normalizing to the actual maximum score rather than the theoretical maximum. Users who need more precise control over score normalization can use the upper bound feature to improve the relevance of their results.
Requirements
Functional Requirements
- Support configurable upper bounds at sub-query level
- Provide a way to define an upper bound score, with the option to ignore it when needed
- Allow independent upper bound configuration for each sub-query
- Ensure proper interaction with the lower bound feature while maintaining its existing behavior
Non-Functional Requirements
- Minimal performance impact on score normalization
Current State
The min-max normalization technique currently:
- Uses actual retrieved scores to find minimum and maximum scores for normalization
- Has a lower bound feature implemented through LowerBound class with an inner Mode enum (APPLY, CLIP, IGNORE)
- Contains bound-related logic directly within the normalization class
Current Score Calculation Formula
normalized_score = (score - min_score) / (max_score - min_score)
Note: min_score changes depending on the LowerBound.Mode being used
Example

Consider a scenario where scores theoretically range from 0.0 to 1.0. When a query returns scores [0.75, 0.76, 0.77], the current normalization process treats:
- 0.75 as the minimum, normalizing it to 0.0
- 0.77 as the maximum, normalizing it to 1.0
- 0.76 as the midpoint, normalizing it to 0.5
While the existing lower bound feature can address score distortion at the lower end by setting a minimum threshold, there is no equivalent mechanism for the upper end. This creates a significant distortion in relevancy representation. Despite all scores being clustered between 0.75-0.77, the normalization spreads them across the entire range from 0.0 to 1.0, suggesting much larger relevancy differences than actually exist. The current implementation lacks the ability to fully contextualize these scores within their theoretical range, where they all represent highly relevant documents with scores close to the maximum possible value of 1.0.
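The distortion described above can be reproduced with a small Python sketch of the default formula (illustrative only; the actual implementation lives in the Java MinMaxScoreNormalizationTechnique class):

```python
def min_max_normalize(scores):
    """Default min-max normalization over the actual retrieved scores."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Clustered, highly relevant scores get spread across the full [0, 1] range
print(min_max_normalize([0.75, 0.76, 0.77]))  # [0.0, 0.5, 1.0]
```

Even though all three documents score within 0.02 of each other, the normalized output suggests one is maximally relevant and another not relevant at all.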
Solution HLD
Proposed Solution

The proposed solution introduces an upper bound feature to complement the existing lower bound functionality in the min-max score normalization technique. This will be achieved through the following architectural changes:
- Abstract Base Class: Create a new ScoreBound abstract class to encapsulate common behavior for both upper and lower bounds.
- Bound Mode Enum: Extract the existing LowerBound.Mode into a standalone BoundMode enum to be used by both bound types.
- Upper Bound Implementation: Introduce a new UpperBound class extending ScoreBound to handle upper bound logic.
- Refactor Existing Lower Bound: Modify the LowerBound class to extend ScoreBound and use the new BoundMode enum.
- Enhanced Normalization Technique: Update MinMaxScoreNormalizationTechnique to support both upper and lower bounds using a common interface.
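The relationship between these classes can be sketched as follows. This is a Python pseudostructure mirroring the proposed Java design; the class and enum names come from this document, while the constructor and method signatures are illustrative assumptions:

```python
from abc import ABC, abstractmethod
from enum import Enum

class BoundMode(Enum):
    # Extracted from the existing LowerBound.Mode so both bound types share it
    APPLY = "apply"
    CLIP = "clip"
    IGNORE = "ignore"

class ScoreBound(ABC):
    """Common behavior shared by lower and upper bounds (sketch)."""
    def __init__(self, mode: BoundMode, bound_score: float):
        self.mode = mode
        self.bound_score = bound_score

    @abstractmethod
    def determine_effective_score(self, score: float, actual_score: float) -> float:
        """Return the effective min/max to use in the normalization formula."""

class LowerBound(ScoreBound):
    # Refactored to extend ScoreBound; existing behavior is preserved
    def determine_effective_score(self, score, actual_min):
        ...

class UpperBound(ScoreBound):
    # New class handling upper-bound logic symmetrically to LowerBound
    def determine_effective_score(self, score, actual_max):
        ...
```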
API Configuration
{
  "normalization": {
    "technique": "min_max",
    "parameters": {
      "lower_bounds": [
        {
          "mode": "apply",
          "min_score": 0.0
        },
        {
          "mode": "clip",
          "min_score": 0.0
        },
        {
          "mode": "ignore"
        }
      ],
      "upper_bounds": [
        {
          "mode": "apply",
          "max_score": 1.0
        },
        {
          "mode": "clip",
          "max_score": 1.0
        },
        {
          "mode": "ignore"
        }
      ]
    }
  }
}
Key Design Decisions
Standalone Bound Mode Enum
- Decision: Extract Mode from LowerBound into a separate BoundMode enum
- Rationale: Allows shared use between upper and lower bounds, improving consistency and maintainability
Symmetrical Upper Bound Implementation
- Decision: Implement UpperBound similarly to LowerBound
- Rationale: Provides a consistent API and behavior for users, simplifying understanding and usage
Minimal Changes to Existing API
- Decision: Extend the current configuration structure by adding upper_bounds alongside lower_bounds, without modifying the existing lower_bounds structure or behavior
- Rationale: Addresses the functional requirement to maintain current functionality for lower bounds. Ensures proper interaction between upper and lower bounds while preserving existing lower bound behavior, allowing users to adopt the new feature without impacting their current queries
Bound Processing in Normalization Technique
- Decision: Process both bounds within the normalizeSingleScore method
- Rationale: Centralizes bound logic, ensuring correct interaction between upper and lower bounds
Solution LLD

New Score Calculation Formula
normalized_score = (score - effective_min_score) / (effective_max_score - effective_min_score)
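In this formula, effective_min_score and effective_max_score default to the actual retrieved minimum and maximum, with configured bounds replacing them according to their mode. The following Python sketch illustrates the intended effect for clip-mode bounds (the exact per-mode semantics here are assumptions modeled on the existing lower-bound behavior, not the final Java implementation):

```python
def normalize_with_bounds(scores, lower=None, upper=None):
    """Sketch: min-max normalization with optional clip-mode bounds.

    `lower`/`upper` are assumed clip-mode bound scores: out-of-range scores
    are clamped, and the bounds become the effective min/max.
    """
    effective_min = lower if lower is not None else min(scores)
    effective_max = upper if upper is not None else max(scores)
    clamped = [min(max(s, effective_min), effective_max) for s in scores]
    return [(s - effective_min) / (effective_max - effective_min) for s in clamped]

# Default behavior spreads clustered scores across [0, 1] ...
print(normalize_with_bounds([0.75, 0.76, 0.77]))  # [0.0, 0.5, 1.0]
# ... while theoretical bounds [0.0, 1.0] keep them near the top of the range
print(normalize_with_bounds([0.75, 0.76, 0.77], lower=0.0, upper=1.0))  # [0.75, 0.76, 0.77]
```

With the theoretical bounds in place, the clustered scores retain their position near the maximum, matching their actual relevance.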
Preliminary Benchmarking
Initial benchmarking shows improvements in relevance metrics when using bounds in some scenarios. Here are two examples:
Example 1: Upper Bounds (nfcorpus dataset)
Metric | Default | With Upper Bound | Improvement |
---|---|---|---|
NDCG@5 | 0.3343 | 0.3379 | 1.10% |
NDCG@10 | 0.303 | 0.3017 | -0.40% |
NDCG@100 | 0.2671 | 0.2691 | 0.70% |
Example 2: Combined Lower/Upper Bounds (TREC-COVID dataset)
Metric | Default | With Bounds | Improvement |
---|---|---|---|
NDCG@5 | 0.6025 | 0.6707 | 11.30% |
NDCG@10 | 0.5518 | 0.6218 | 12.70% |
NDCG@100 | 0.3859 | 0.4318 | 11.90% |
Note: These results are from specific test configurations. Results may vary depending on the nature of queries, index settings, and the characteristics of the dataset.
Testing
Unit Tests:
- Upper bound configuration parsing
- Score normalization with different modes
- Integration with lower bounds
- Edge cases and error conditions
Integration Tests:
- All three upper bound modes
- Integration with lower bounds
Community Feedback
We appreciate all feedback from the community on this RFC. In addition, we are particularly interested in your thoughts on the following questions:
- Would you prefer additional configuration options beyond what's proposed?
- How should the system behave when both upper and lower bounds are specified in potentially conflicting ways?
- How would you combine this with other scoring techniques in your current implementations?
- What types of examples would help you understand when and how to use upper bounds effectively?
Please share your feedback through comments on this RFC, GitHub issues, or pull requests with proposed changes.
References
- Feature Request: [FEATURE] Add upper_bound in min-max normalization #1210
- Lower Bound RFC: [RFC] Lower bound for min-max normalization technique in Hybrid query #1189