Splade Encoding and Evaluation not working #150
Arguments have changed; updated to stay in sync with the latest changes in the library.
Hi @srikanthmalla, thank you again for helping us improve the codebase. I think the hard part is the indexing step for sparse representations: we need pyserini to index them properly. For search, we could write a more consistent Python script that uses pyserini internally. I'm not sure about this, though. Do you think it would make usage easier?
Hi @MXueguang, for now it might be fine to use pyserini, or even the evaluation scripts from beir. But the current pyserini instructions in the README give an error at the last evaluation step.
I also tried converting the tsv output to trec format using this command, and then evaluating with the pyserini trec_eval command. This gives nDCG close to 0 at almost all cutoffs. It would probably be a good idea to fix the README instructions. Whether we use beir or pyserini for evaluation matters less than which one gives replicable results.

Finally, having a self-contained repo would be amazing! For example, if we adapt some evaluation functionality from beir or pyserini, we could either copy the specific version into a folder or use a git submodule. The only problem with a git submodule or a plain pip dependency is that the dependency repo could be removed in the future. Keeping a fixed version this way prevents the current repo from breaking when a dependency is updated, both in the instructions and in the code.

Please let me know your thoughts: should I add a splade evaluation script using beir in the examples/splade folder, or would you rather take a look at the pyserini instructions in the README of the examples/splade subfolder?

Thank you,
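For reference, the tsv-to-trec conversion mentioned above can be sketched in a few lines, which may help rule out a formatting mismatch as the cause of the near-zero scores. This is only a minimal sketch, not the repo's conversion script: it assumes each tsv line is `qid<TAB>docid<TAB>score`, and the function name `tsv_to_trec` and the run tag `splade` are hypothetical. The output follows the standard TREC run format, `qid Q0 docid rank score tag`.

```python
def tsv_to_trec(tsv_lines, run_tag="splade"):
    """Convert `qid<TAB>docid<TAB>score` lines into TREC run lines.

    Assumption: the input has exactly three tab-separated columns.
    `run_tag` is an arbitrary label, not something the repo defines.
    """
    by_query = {}
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        qid, docid, score = line.split("\t")
        by_query.setdefault(qid, []).append((docid, float(score)))

    out = []
    for qid, hits in by_query.items():
        # Rank each query's hits by descending score, starting at rank 1.
        hits.sort(key=lambda h: h[1], reverse=True)
        for rank, (docid, score) in enumerate(hits, start=1):
            out.append(f"{qid} Q0 {docid} {rank} {score} {run_tag}")
    return out


if __name__ == "__main__":
    example = ["q1\td3\t1.5", "q1\td7\t2.5", "q2\td1\t0.1"]
    for row in tsv_to_trec(example):
        print(row)
```

If the converted file looks right, the next thing to compare is whether the query and document ids in the run file match the ids in the qrels file exactly, as strings.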
Hi @MXueguang,
Based on your suggestion in #149, I used the splade example.
It has a few issues:
I have already fixed the first two in the current pull request. Before working on the last one, I want to check whether we should create a function in the searcher Python file for sparse retriever output (which might be more consistent with the repo), or keep the original index->retrieve->evaluate flow using pyserini from the README instructions. Please share your thoughts.
Thanks,
Srikanth
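To make the first option concrete, a sparse search function could look roughly like the sketch below. This is not Tevatron's or pyserini's implementation. It is a pure-Python illustration that assumes each document and query is already encoded as a `{token: weight}` dictionary, and the names `build_inverted_index` and `search_sparse` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(doc_vectors):
    """Map each token to the (docid, weight) pairs of documents containing it."""
    index = defaultdict(list)
    for docid, vec in doc_vectors.items():
        for token, weight in vec.items():
            index[token].append((docid, weight))
    return index


def search_sparse(index, query_vector, k=10):
    """Score documents by the dot product of sparse query and document vectors."""
    scores = defaultdict(float)
    for token, q_weight in query_vector.items():
        # Only documents sharing at least one token with the query get a score.
        for docid, d_weight in index.get(token, []):
            scores[docid] += q_weight * d_weight
    # Return the top-k (docid, score) pairs, highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    docs = {
        "d1": {"cat": 2.0, "dog": 1.0},
        "d2": {"cat": 1.0},
        "d3": {"fish": 3.0},
    }
    index = build_inverted_index(docs)
    print(search_sparse(index, {"cat": 1.0, "dog": 2.0}, k=2))
```

A function like this keeps retrieval inside the repo, at the cost of reimplementing what a Lucene index already does efficiently, so it is probably only practical for small collections or for sanity-checking results.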