-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reciprocal Rank Fusion (RRF) normalization technique in hybrid query #874
Reciprocal Rank Fusion (RRF) normalization technique in hybrid query #874
Conversation
Signed-off-by: Isaac Johnson <isaacnj@amazon.com>
Signed-off-by: Isaac Johnson <isaacnj@amazon.com>
Signed-off-by: Isaac Johnson <isaacnj@amazon.com>
Signed-off-by: Isaac Johnson <isaacnj@amazon.com>
Signed-off-by: Isaac Johnson <114550967+Johnsonisaacn@users.noreply.github.com>
we should be merging to feature branch https://github.com/opensearch-project/neural-search/tree/feature/rrf-score-normalization, not main. |
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NormalizationExecuteDTO.java
Outdated
Show resolved
Hide resolved
// Not currently using weights for RRF, no need to modify or verify these params | ||
public RRFScoreCombinationTechnique(final Map<String, Object> params, final ScoreCombinationUtil combinationUtil) { | ||
; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class not completed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we planned to have very simple implementation for this one, I'll be finishing this PR and address all misses if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finished the class, please take a look @yuye-aws
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
* Collection of utility methods for score combination technique classes | ||
*/ | ||
@Log4j2 | ||
class ScoreNormalizationUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we shift the code in this class to HybridQueryUtil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it belongs there, my view is - anything related to the query itself should go to that class, like parsing score collection into multiple sub query results.
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Show resolved
Hide resolved
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
|
||
/** | ||
* DTO object to hold data required for score normalization passed to execute() function | ||
* in NormalizationProcessorWorkflow. Field rankConstant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* in NormalizationProcessorWorkflow. Field rankConstant | |
* in NormalizationProcessorWorkflow. |
src/main/java/org/opensearch/neuralsearch/processor/NormalizationExecuteDTO.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/combination/ScoreCombinationUtil.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/factory/RRFProcessorFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/factory/RRFProcessorFactory.java
Outdated
Show resolved
Hide resolved
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Show resolved
Hide resolved
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Show resolved
Hide resolved
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Show resolved
Hide resolved
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor comment.
LGTM.
src/main/java/org/opensearch/neuralsearch/processor/NormalizationExecuteDTO.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
*/ | ||
@Log4j2 | ||
@AllArgsConstructor | ||
public class RRFProcessor implements SearchPhaseResultsProcessor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there somewhere to validate that RRFNormalizationTechnique
is used together with RRFScoreCombinationTechnique
? The execute
method in NormalizationProcessorWorkflow
class doing normalization and them combination.
...ava/org/opensearch/neuralsearch/processor/combination/RRFScoreCombinationTechniqueTests.java
Show resolved
Hide resolved
private float RRF(List<Float> scores, List<Double> weights) { | ||
float sumScores = 0.0f; | ||
for (float score : scores) { | ||
sumScores += score; | ||
} | ||
return sumScores; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you adding these method in this testing? I think you can simply with a few examples like 1 plus 1 is 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's added to be compatible with https://github.com/opensearch-project/neural-search/blob/main/src/test/java/org/opensearch/neuralsearch/processor/combination/BaseScoreCombinationTechniqueTests.java and be able to use all test cases it provides. We need to ensure in better possible test coverage if it's a low hanging fruit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not get your point. The private randomScore
outputs non-deterministic results.
assertEquals( | ||
RescoreContext.getDefault().getOversampleFactor(), | ||
neuralQueryBuilder.rescoreContext().getOversampleFactor(), | ||
DELTA_FOR_FLOATS_ASSERTION | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are already using big decimal, please remove the delta here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what do you mean, assert requires third parameter in case we're comparing floats, and both arguments are float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean you are using big decimal in the test rrfNorm
method. You can be more strict, and the delta can be set to 0.
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Show resolved
Hide resolved
We do not retrieve normalization technique from user input, it's hardcoded and passed to processor class by the factory, check out code snippet I want to keep NormalizationProcessorWorkflow generic, maybe later refactor it to more abstract class not specific to normalization. |
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
I've addressed all comments, and most of them were minor in recent reviews. I'll be merging this one to feature branch and we'll start one more related to RRF soon, with focus on testing |
245cd14
into
opensearch-project:feature/rrf-score-normalization-v2
Nice work @martin-gaievski |
…pensearch-project#874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com> Signed-off-by: Ryan Bogan <rbogan@amazon.com>
…874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…pensearch-project#874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…pensearch-project#874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…pensearch-project#874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…874) * initial commit of RRF Signed-off-by: Isaac Johnson <isaacnj@amazon.com> Co-authored-by: Varun Jain <varunudr@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Description
Adding ability to process and combine scores from multiple subqueries in neural search using the reciprocal rank fusion (RRF) technique. Built with a new processor and processor factory class apart from NormalizationProcessor. Changes to API included in RFC. Does not currently support weights when combining processed subquery scores, based on lack of examples in existing literature.
Example of usage for RRF processor:
create index
create pipeline with rrf processor and all defaults
ingest 4 documents
run search request
you'll get following response
if you change rank to something smaller, like '1' your scores all will be scalled up
update rank contant
and search response is
for comparison this is the response for same query if we use normalization processor with default techniques.
Important difference is that delta between document scores with RRF is much smaller, this is because it's based on document rank that are typically close in value comparing to scores where delta can be huge.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
#865
#659
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.