Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reciprocal Rank Fusion (RRF) normalization technique in hybrid query #874
Reciprocal Rank Fusion (RRF) normalization technique in hybrid query #874
Changes from 38 commits
b3fe6c1
55917e3
7590532
93e4778
632f2e0
1b7d150
cc47084
5aeb509
274109f
92946e7
139c132
5ea082c
0fa9fcb
3862af9
677485f
6cdaa5d
896c1ba
e26e785
300e425
43493b9
4342681
aadcd35
b260942
e392328
17d4812
6e9cb61
2cc92e2
b92a2bd
c9e5fa4
08a862f
cfd0113
0647ad6
84a627c
4fdfcd2
a07b379
2734488
c93b946
2b80c81
92a10c7
f07b713
44cdc6f
a6237aa
f6d5148
08c969a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a very high level question: Instead of creating a new processor, can't we reuse the same normalization processor where in place of normalization technique user can sent "rrf" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can we don't want to. There will be some limitations if we go that route: factory logic will became overloaded and not generic anymore, mainly that because for normalization processor today we allow any combination of techniques, but that would not be the case with rrf techniques. There are some rank based techniques that we can add later, that will follow the same route and make sense only in scope of ranking.
Basically with such approach we'll be leaning towards the single multifunctional processor, while with current design it's more like smaller specialized processors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC has the single processor approach as an alternative, it has been reviewed and team agreed on a new processor approach as preferred one #865
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also having the same question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both of you asking different questions: @vibrantvarun is referring to a single processor for RRF and score normalization, and @yuye-aws mentioning Alternative 2, which is about adding a new processor for RRF, but exposing both normalization and combination technique as params to end-user.
I can answer both in a similar fashion:
Fundamentally score normalization and rank based combination are different, so combining them in existing normalization processor isn't intuitive. Besides that it will require additional validation logic and at the code level will ruin existing abstractions, mainly because for normalization processor today we allow pairing of any normalization technique with any combination techniques. With addition of RRF we have to break this.
RRF is leaning towards the combination method as per offline discussion with our PM, exposing normalization function doesn't make sense/not adding value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there somewhere to validate that
RRFNormalizationTechnique
is used together withRRFScoreCombinationTechnique
? Theexecute
method inNormalizationProcessorWorkflow
class doing normalization and them combination.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK that weights are not supported in the first release. This class does nothing but adding all the scores together. I'm afraid it's too over designed to introduce a new class for such a single sum operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're reusing NormalizationProcessorWorkflow that is quite a big class, and it accepts both normalization and combination techniques classes as input arguments. Plus it's a single responsibility principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. For this PR, both params and combinationUtil are unused. You'd better delete them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, will do in next PR, please remind me if I forget