
Enable using hybrid retrieval at deploy. #107

Merged: 8 commits merged on Feb 3, 2024

@vkehfdl1 (Contributor) commented Feb 3, 2024

The original hybrid_rrf and hybrid_cc functions take ids and scores as their input, which made these hybrid modules hard to use at deployment.
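
To make the "ids and scores as input" point concrete, here is a minimal sketch of what the two fusion styles do for a single query. It assumes `ids[i]` and `scores[i]` are the results of the i-th retrieval module, uses the common RRF constant of 60, and min-max normalizes scores for the convex combination. `rrf_fuse` and `cc_fuse` are hypothetical names for illustration, not the actual hybrid_rrf / hybrid_cc signatures in this repo.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (fused passage ids, fused scores) for one query
Fused = Tuple[List[str], List[float]]


def _top_k(fused: Dict[str, float], top_k: int) -> Fused:
    """Sort fused scores descending and keep the top_k passages."""
    best = sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]
    return [doc_id for doc_id, _ in best], [score for _, score in best]


def rrf_fuse(ids: Sequence[List[str]], scores: Sequence[List[float]],
             top_k: int, rrf_k: int = 60) -> Fused:
    """Reciprocal rank fusion: each module adds 1 / (rrf_k + rank) per passage."""
    fused: Dict[str, float] = defaultdict(float)
    for module_ids, module_scores in zip(ids, scores):
        # Derive this module's ranking from its own scores (highest first).
        ranked = sorted(zip(module_ids, module_scores), key=lambda p: p[1], reverse=True)
        for rank, (doc_id, _) in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (rrf_k + rank)
    return _top_k(fused, top_k)


def cc_fuse(ids: Sequence[List[str]], scores: Sequence[List[float]],
            weights: Sequence[float], top_k: int) -> Fused:
    """Convex combination of min-max normalized scores (weights should sum to 1)."""
    fused: Dict[str, float] = defaultdict(float)
    for module_ids, module_scores, weight in zip(ids, scores, weights):
        lo, hi = min(module_scores), max(module_scores)
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, score in zip(module_ids, module_scores):
            fused[doc_id] += weight * (score - lo) / span
    return _top_k(fused, top_k)


ids = (["a", "b", "c"], ["b", "c", "d"])
scores = ([3.0, 2.0, 1.0], [0.9, 0.5, 0.1])
rrf_ids, _ = rrf_fuse(ids, scores, top_k=2)                  # → ["b", "c"]
cc_ids, _ = cc_fuse(ids, scores, weights=(0.5, 0.5), top_k=2)  # → ["b", "a"]
```

Either way, the hybrid function cannot do anything until some other retrieval modules have already produced `ids` and `scores`. That dependency is the root of the deployment problem.
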
To see why, you need to understand how the optimization run and the deployment run work.
As you know, the module functions (including their decorators, of course) are treated as independent functions in both the optimization and the deployment process.
In the optimization process, we rely on the run.py functions to execute each module function. In most cases, run.py does nothing special to run a module, but hybrid retrieval is the exception. To run the hybrid functions, the retrieval run.py must select the best modules among the user's target_modules input and extract their ids and scores. Only then can the hybrid modules be used properly.
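
A minimal sketch of that selection step, assuming each optimization result is a dict with `module_name`, `module_params`, `ids`, `scores`, and a `metric` value where higher is better. `pick_hybrid_inputs` and this result layout are hypothetical, not the actual code in the retrieval run.py.

```python
from typing import Any, Dict, List, Sequence, Tuple


def pick_hybrid_inputs(results: List[Dict[str, Any]],
                       target_modules: Sequence[str]) -> Tuple[tuple, tuple, tuple]:
    """For each target module, keep the result with the best metric.

    Returns (ids, scores, module_params), one entry per target module, in the
    same order as target_modules.
    """
    ids, scores, params = [], [], []
    for target in target_modules:
        candidates = [r for r in results if r["module_name"] == target]
        if not candidates:
            raise ValueError(f"no result for target module {target!r}")
        best = max(candidates, key=lambda r: r["metric"])
        ids.append(best["ids"])
        scores.append(best["scores"])
        params.append(best["module_params"])
    return tuple(ids), tuple(scores), tuple(params)


results = [
    {"module_name": "bm25", "module_params": {"top_k": 3},
     "ids": [["a"]], "scores": [[1.0]], "metric": 0.4},
    {"module_name": "bm25", "module_params": {"top_k": 5},
     "ids": [["b"]], "scores": [[2.0]], "metric": 0.6},
    {"module_name": "vectordb", "module_params": {"top_k": 3},
     "ids": [["c"]], "scores": [[0.9]], "metric": 0.5},
]
ids, scores, params = pick_hybrid_inputs(results, ("bm25", "vectordb"))
# ids == ([["b"]], [["c"]]); params == ({"top_k": 5}, {"top_k": 3})
```

Note that the winning params are returned alongside ids and scores; that is exactly the information the deployment side is missing later.
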
However, the deployment Runner.run does not use the run.py functions, because run.py only contains optimization logic, and it would be very inefficient to use it for deployment.
Instead of using run.py, the Runner extracts the best module name and module parameters from summary.csv, which is written by run.py, and constructs a new config dictionary. With that dictionary, the Runner runs all the modules one by one with the selected parameters.
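
Roughly, the Runner side looks like the sketch below. It assumes summary.csv has `module_name`, `module_params` (a Python-literal dict string), and `is_best` columns, and that the config uses a `module_type` key; `best_module_config` is a hypothetical helper, not the real Runner code.

```python
import ast
import csv
import io
from typing import Any, Dict


def best_module_config(summary_csv_text: str) -> Dict[str, Any]:
    """Build a module config dict from the row marked is_best in summary.csv."""
    reader = csv.DictReader(io.StringIO(summary_csv_text))
    for row in reader:
        if row["is_best"].strip().lower() == "true":
            # module_params is assumed to be stored as a dict literal string.
            params = ast.literal_eval(row["module_params"])
            return {"module_type": row["module_name"], **params}
    raise ValueError("no row marked is_best in summary.csv")


SUMMARY = """module_name,module_params,is_best
bm25,"{'top_k': 3}",True
vectordb,"{'top_k': 3, 'embedding_model': 'openai'}",False
"""
config = best_module_config(SUMMARY)  # → {'module_type': 'bm25', 'top_k': 3}
```

The Runner only ever sees what summary.csv recorded, so anything computed on the fly in run.py (like ids and scores) is invisible to it.
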

And here was the problem. The hybrid modules must receive ids and scores, and those parameters are created in run.py. But summary.csv had no ids and scores parameters, only target_modules, because summary.csv saves the user's input parameters by default. Hence, it was impossible to run the hybrid modules in the deployment Runner class.

So, what was the solution?
I swapped the module parameters of the hybrid modules that the retrieval run.py writes to summary.csv. That is, I deleted ids and scores from the module params and added target_modules and target_module_params as new hybrid module params.
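
A minimal sketch of that swap, assuming module params are plain dicts; `swap_hybrid_params` is a hypothetical helper name, not the function used in the PR.

```python
from typing import Any, Dict, Sequence


def swap_hybrid_params(module_params: Dict[str, Any],
                       target_modules: Sequence[str],
                       target_module_params: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the params to record in summary.csv for a hybrid module.

    ids and scores are retrieval results, not reusable settings, so they are
    dropped; the names and params of the chosen target modules are stored
    instead so those modules can be re-run later.
    """
    swapped = {k: v for k, v in module_params.items() if k not in ("ids", "scores")}
    swapped["target_modules"] = tuple(target_modules)
    swapped["target_module_params"] = tuple(target_module_params)
    return swapped


run_params = {"ids": ([["b"]], [["c"]]), "scores": ([[2.0]], [[0.9]]), "top_k": 3}
summary_params = swap_hybrid_params(
    run_params, ("bm25", "vectordb"), ({"top_k": 5}, {"top_k": 3})
)
# summary_params == {"top_k": 3, "target_modules": ("bm25", "vectordb"),
#                    "target_module_params": ({"top_k": 5}, {"top_k": 3})}
```
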
Then, in the retrieval decorator, if the ids and scores parameters are missing, I run the other retrieval modules given by target_modules and target_module_params.
This way, another retrieval module runs inside the retrieval node decorator and produces the ids and scores that the hybrid modules take as input.
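
Here is a minimal sketch of that decorator behavior. `retrieval_node`, the `RETRIEVAL_MODULES` registry, and the toy retrievers are all hypothetical stand-ins; the real decorator in this repo does more (corpus loading, result formatting, etc.).

```python
import functools
from typing import Any, Callable, Dict, List, Tuple

Retrieved = Tuple[List[List[str]], List[List[float]]]  # (ids, scores) per query


def fake_bm25(queries: List[str], top_k: int) -> Retrieved:
    """Stand-in lexical retriever: always returns passages 'a', 'b'."""
    return [["a", "b"][:top_k] for _ in queries], [[2.0, 1.0][:top_k] for _ in queries]


def fake_vectordb(queries: List[str], top_k: int) -> Retrieved:
    """Stand-in dense retriever: always returns passages 'c', 'a'."""
    return [["c", "a"][:top_k] for _ in queries], [[0.9, 0.4][:top_k] for _ in queries]


# Hypothetical name -> function registry used to look up target modules.
RETRIEVAL_MODULES: Dict[str, Callable[..., Retrieved]] = {
    "bm25": fake_bm25,
    "vectordb": fake_vectordb,
}


def retrieval_node(func: Callable[..., Any]) -> Callable[..., Any]:
    """If a hybrid module is called without ids/scores, compute them first."""
    @functools.wraps(func)
    def wrapper(queries: List[str], **kwargs: Any) -> Any:
        if "ids" not in kwargs and "scores" not in kwargs and "target_modules" in kwargs:
            target_modules = kwargs.pop("target_modules")
            target_module_params = kwargs.pop("target_module_params")
            ids, scores = [], []
            # Run each target retrieval module with its saved params.
            for name, params in zip(target_modules, target_module_params):
                module_ids, module_scores = RETRIEVAL_MODULES[name](queries, **params)
                ids.append(module_ids)
                scores.append(module_scores)
            kwargs["ids"], kwargs["scores"] = tuple(ids), tuple(scores)
        return func(queries, **kwargs)
    return wrapper


@retrieval_node
def hybrid_passthrough(queries: List[str], ids: tuple, scores: tuple,
                       top_k: int) -> Tuple[tuple, tuple]:
    """Toy hybrid module that returns its inputs, so the wiring is visible."""
    return ids, scores


ids, scores = hybrid_passthrough(
    ["what is RAG?"],
    target_modules=("bm25", "vectordb"),
    target_module_params=({"top_k": 1}, {"top_k": 1}),
    top_k=1,
)
# ids == ([["a"]], [["c"]])
```

Because the check lives in the decorator, the same call works in both optimization (ids/scores passed in) and deployment (only target_modules and target_module_params available from summary.csv).
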
Since the summary.csv module params are now saved with target_modules and target_module_params, we can now use the hybrid modules at deployment!!

close #91

p.s. I thought this would be a big challenge for me, but it was resolved pretty simply. I think the isolation of the three parts (optimization, deployment runner, and modules) is really great and flexible. Thanks to this isolation, maybe we can find ways to fix some of our other awkward workarounds too.

@vkehfdl1 enabled auto-merge (squash) February 3, 2024 19:36

@Eastsidegunn (Contributor) left a comment


LGTM

@bwook00 (Contributor) left a comment


LGTM

@vkehfdl1 merged commit 4c3c356 into main Feb 3, 2024
2 checks passed
@vkehfdl1 deleted the Feature/#91 branch February 3, 2024 20:08
Linked issue (Development): Add Hybrid Retrieval "Deploy" Function.
3 participants