Enable using hybrid retrieval at deploy. #107
Merged
The original `hybrid_rrf` and `hybrid_cc` functions take `ids` and `scores` as their input, which made it hard to use these hybrid modules at deploy.

To understand why, you need to understand how the optimization run and the deploy run work. Module functions (including their decorators, of course) are treated as independent functions in both the optimization and the deployment process.
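To make the `ids` and `scores` inputs concrete, here is a minimal sketch of reciprocal rank fusion over the results of several retrieval modules. It is an illustration only, not the project's actual `hybrid_rrf`: the function name `rrf_fuse`, the `rrf_k` parameter, and its default of 60 are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(
    ids: Sequence[List[str]],
    scores: Sequence[List[float]],
    top_k: int,
    rrf_k: int = 60,  # assumed smoothing constant, not taken from the PR
) -> Tuple[List[str], List[float]]:
    """Fuse ranked lists with reciprocal rank fusion (hypothetical helper).

    ids[i] and scores[i] hold the results of the i-th retrieval module
    for a single query.
    """
    fused: Dict[str, float] = defaultdict(float)
    for id_list, score_list in zip(ids, scores):
        # Rank documents inside each module by that module's own score.
        ranked = sorted(zip(id_list, score_list), key=lambda p: p[1], reverse=True)
        for rank, (doc_id, _) in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (rrf_k + rank)
    best = sorted(fused.items(), key=lambda p: p[1], reverse=True)[:top_k]
    return [doc_id for doc_id, _ in best], [score for _, score in best]


fused_ids, fused_scores = rrf_fuse(
    ids=(["a", "b", "c"], ["b", "c", "d"]),
    scores=([0.9, 0.8, 0.7], [5.0, 4.0, 3.0]),
    top_k=2,
)
```

The point is that a hybrid function like this cannot run on its own: something upstream has to execute the other retrieval modules first and hand over their `ids` and `scores`.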
In the optimization process, we rely on the `run.py` functions to execute each module function. In most cases, `run.py` does nothing special to run modules, but hybrid was the exception. To run the hybrid functions, the retrieval `run.py` has to select the best modules among the user's `target_modules` input and extract their ids and scores. Only then can the hybrid modules be used properly.

However, in deployment, `Runner.run` does not use the `run.py` functions, because they contain only the optimization process and it would be very inefficient to use `run.py` for the deployment feature.

Instead of using `run.py`, `Runner` extracts the best module name and module parameters from `summary.csv`, which is produced by `run.py`, and constructs a new config dictionary. With that dictionary, `Runner` can run all the modules one by one with the selected parameters.

And here was the problem. Hybrid modules must be passed `ids` and `scores`, and those parameters are made in `run.py`. But `summary.csv` had no `ids` and `scores` parameters, only `target_modules`, because `summary.csv` saves the user's input parameters by default. Hence, it was impossible to run hybrid modules in the deployment `Runner` class.

So, what was the solution?
I swapped the module parameters of the hybrid modules in `summary.csv`, in the retrieval `run.py`. That is, I delete `ids` and `scores` from the module params and add `target_modules` and `target_module_params` as the new hybrid module params.

And in the retrieval decorator, if there are no `ids` and `scores` parameters, I run the other retrieval modules with the `target_modules` and `target_module_params` inputs. This way, the retrieval node decorator can run other retrieval modules and obtain the `ids` and `scores` that the hybrid modules take as input.

Since the module params in `summary.csv` are now saved with `target_modules` and `target_module_params`, we can use hybrid modules at deployment!!

close #91
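The decorator fallback described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `hybrid_node`, `RETRIEVAL_MODULES`, the toy retrievers, and `toy_hybrid` are all hypothetical names, and the fusion here is a plain score sum used as a placeholder.

```python
import functools
from typing import Callable, Dict, List, Tuple

Result = Tuple[List[str], List[float]]


def toy_bm25(query: str, top_k: int) -> Result:
    # Stand-in retriever with fixed output, for illustration only.
    return ["doc1", "doc2"][:top_k], [2.0, 1.0][:top_k]


def toy_vector(query: str, top_k: int) -> Result:
    # Second stand-in retriever with fixed output.
    return ["doc2", "doc3"][:top_k], [0.9, 0.8][:top_k]


# Hypothetical registry that maps module names to retrieval functions.
RETRIEVAL_MODULES: Dict[str, Callable[..., Result]] = {
    "bm25": toy_bm25,
    "vectordb": toy_vector,
}


def hybrid_node(func):
    """If ids/scores are missing, run the target modules to produce them."""
    @functools.wraps(func)
    def wrapper(query: str, top_k: int, **params):
        if "ids" not in params or "scores" not in params:
            # Deployment path: only the params saved in summary.csv exist.
            target_modules = params.pop("target_modules")
            target_module_params = params.pop("target_module_params")
            results = [
                RETRIEVAL_MODULES[name](query, **module_params)
                for name, module_params in zip(target_modules, target_module_params)
            ]
            params["ids"] = tuple(r[0] for r in results)
            params["scores"] = tuple(r[1] for r in results)
        return func(top_k=top_k, **params)
    return wrapper


@hybrid_node
def toy_hybrid(ids, scores, top_k: int) -> Result:
    # Placeholder fusion: sum each document's raw scores across modules.
    totals: Dict[str, float] = {}
    for id_list, score_list in zip(ids, scores):
        for doc_id, score in zip(id_list, score_list):
            totals[doc_id] = totals.get(doc_id, 0.0) + score
    ranked = sorted(totals, key=totals.get, reverse=True)[:top_k]
    return ranked, [totals[d] for d in ranked]


deploy_ids, deploy_scores = toy_hybrid(
    "some query",
    top_k=2,
    target_modules=("bm25", "vectordb"),
    target_module_params=({"top_k": 2}, {"top_k": 2}),
)
```

With this shape, the optimization path can keep passing `ids` and `scores` directly, while the deployment path passes only what is stored in `summary.csv`, and the same hybrid function serves both.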
p.s. I thought this would be a big challenge for me, but it was resolved pretty simply. I think the isolation of the three parts (optimization, deployment runner, and modules) is really great and flexible. Thanks to this isolated structure, maybe we can find a way to clean up some other awkward methods, too.