-
Notifications
You must be signed in to change notification settings - Fork 94
Support sub query raw scores in Hybrid Search Response #1369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
) { | ||
for (ScoreDoc scoreDoc : topDocs.scoreDocs) { | ||
float normalizedScore = calculateNormalizedScore(Arrays.asList(topDocs.scoreDocs).indexOf(scoreDoc)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was causing O(N2), changed it to index based reference
07b39e7
to
2eb0d1f
Compare
bc183c6
to
70ec7ec
Compare
src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java
Outdated
Show resolved
Hide resolved
63a54ff
to
09664ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job @owaiskazi19 on creating the logic for capturing subquery scores.
@@ -58,6 +59,10 @@ public static boolean isClusterOnOrAfterMinReqVersionForStatCategoryFiltering() | |||
return NeuralSearchClusterUtil.instance().getClusterMinVersion().onOrAfter(MINIMAL_SUPPORTED_VERSION_STATS_CATEGORY_FILTERING); | |||
} | |||
|
|||
public static boolean isClusterOnOrAfterMinReqVersionForSubQuerySupport() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need one BWC test to validate this condition does not break anything.
import java.util.Map; | ||
|
||
/** | ||
* Registry to store hybrid score for each search context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my understanding, there is only one searchcontext per search. Therefore, can we reframe this javadoc in a specific way like "Registry to store hybrid score for each document" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the later part, I see we are using searchPhaseContext. SearchPhaseContext can be multiple like after the QP , then after the FP as well.
@Override | ||
public FetchSubPhaseProcessor getProcessor(FetchContext fetchContext) throws IOException { | ||
// Check if inner hits are present | ||
boolean hasInnerHits = fetchContext.innerHits() != null && !fetchContext.innerHits().getInnerHits().isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we really need to check !fetchContext.innerHits().getInnerHits().isEmpty()
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming if fetchContext.innerHits is true then it should work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fetchContext.innerHits() returns an InnerHitsContext object — which may be non-null, but contain no inner hits. Then hasInnerHits will be true even if there are no actual inner hits to process
|
||
if (subqueryScores != null) { | ||
// Add it as a field rather than modifying _source | ||
List<Object> hybridScores = new ArrayList<>(subqueryScores.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
// Check for all the conditions | ||
boolean shouldAddHybridScores = subqueryScores != null | ||
&& documentFields.containsKey(SUB_QUERY_SCORES_NAME) == false | ||
&& isClusterOnOrAfterMinReqVersionForSubQuerySupport() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this whole check should be wrapped under
if (isClusterOnOrAfterMinReqVersionForSubQuerySupport() && hasInnerHits ==false )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it the same thing checking condtion1 && condition2 and then condition3 && condition4 or all together like condtion1 && condition2 && condtion3 && condtion4 ?
|
||
// Check for all the conditions | ||
boolean shouldAddHybridScores = subqueryScores != null | ||
&& documentFields.containsKey(SUB_QUERY_SCORES_NAME) == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a BWC test when out of 3 nodes 1 node is running on newer version and other 2 are running older version. Then request for hybridization scores.
.../java/org/opensearch/neuralsearch/processor/normalization/L2ScoreNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/normalization/ScoreNormalizer.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few comments after recently added commits
src/main/java/org/opensearch/neuralsearch/processor/HybridScoreRegistry.java
Outdated
Show resolved
Hide resolved
.../java/org/opensearch/neuralsearch/processor/normalization/L2ScoreNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/HybridizationFetchSubPhase.java
Show resolved
Hide resolved
.../java/org/opensearch/neuralsearch/processor/normalization/L2ScoreNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/HybridScoreRegistry.java
Show resolved
Hide resolved
d7173ec
to
2ca0411
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1369 +/- ##
============================================
- Coverage 79.95% 0 -79.96%
============================================
Files 160 0 -160
Lines 8584 0 -8584
Branches 1396 0 -1396
============================================
- Hits 6863 0 -6863
+ Misses 1186 0 -1186
+ Partials 535 0 -535 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
2ca0411
to
1e303f1
Compare
#1416 is failing most of the times |
I can help with retrying only failed tests. |
Will run benchmarking again after #1457 is merged |
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
…hits are enabled Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
Signed-off-by: Owais <[email protected]>
1e303f1
to
101c629
Compare
Signed-off-by: Owais <[email protected]>
Description
Support sub query raw scores in Hybrid Search Response through
getFetchSubPhases
extension point.Also, added a small improvement for RRF processors while working on this.
Flag to enable sub query scores while creating the pipeline
Response
Related Issues
Resolves #1294, #1180 & #1419
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.