
Splade Encoding and Evaluation not working #150

Open
wants to merge 3 commits into main
Conversation

srikanthmalla
Contributor

Hi @MXueguang,
I followed your suggestion from #149 to use the splade example.

It has a few issues:

1. encode_splade.py is outdated: the argument names and module paths changed in recent library updates.
2. The README instructions for encoding are outdated and do not pass the right argument names.
3. We still need to verify that the evaluation script works and replicates the reported results for at least one SPLADE version.

I have already fixed the first two in this pull request. To work on the last one, I want to check whether we should add a function to the searcher Python file for sparse retrieval output (it might be more consistent with the rest of the repo?), or keep the original index -> retrieve -> evaluate flow with pyserini from the README instructions; see the sketch after this paragraph. Please share your thoughts.
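A minimal sketch of what such a helper could look like, assuming pyserini's LuceneImpactSearcher; the function name, arguments, and SPLADE checkpoint are placeholders, not existing repo code:

```python
# Hypothetical helper for the searcher file: batch sparse retrieval over
# a pyserini impact index. All names here are illustrative.
from pyserini.search.lucene import LuceneImpactSearcher

def sparse_search(index_dir: str, queries: dict, k: int = 1000) -> dict:
    """Return {query_id: [(doc_id, score), ...]} for each query.

    queries maps query_id -> raw query text; the encoder expands the
    text into a sparse term-weight vector at search time.
    """
    # 'naver/splade-cocondenser-ensembledistil' is one public SPLADE
    # checkpoint; swap in whichever model matches the index.
    searcher = LuceneImpactSearcher(
        index_dir, 'naver/splade-cocondenser-ensembledistil')
    results = {}
    for qid, text in queries.items():
        hits = searcher.search(text, k=k)
        results[qid] = [(hit.docid, hit.score) for hit in hits]
    return results
```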

Thanks,
Srikanth

@MXueguang
Contributor

Hi @srikanthmalla, thank you again for helping us improve the codebase. I think the hard part is the indexing step for the sparse representations; we need to use pyserini to index them properly (see the sketch below). For search, we could create a more consistent Python script that internally uses pyserini. I'm not very sure here; do you think that would make usage easier?
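For reference, a rough sketch of that indexing step, assuming the encoder has written JSONL files of the form {"id": ..., "vector": {token: weight}} (pyserini's JsonVectorCollection format); the paths and thread count are illustrative:

```python
# Invoke pyserini's Lucene indexer on SPLADE-encoded documents.
import subprocess

subprocess.run([
    'python', '-m', 'pyserini.index.lucene',
    '--collection', 'JsonVectorCollection',  # docs are token->weight maps
    '--input', 'encoded_corpus',             # directory of encoded JSONL files
    '--index', 'splade_index',               # output index directory
    '--generator', 'DefaultLuceneDocumentGenerator',
    '--threads', '12',
    '--impact',        # store quantized impact scores instead of term frequencies
    '--pretokenized',  # terms are already SPLADE vocabulary tokens
], check=True)
```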

@srikanthmalla
Contributor Author

srikanthmalla commented Sep 9, 2024

Hi @MXueguang,
I am getting the results below using the evaluation script from the beir repo on the arguana dataset:
ndcg@10: 0.525 (close to the number reported in the paper), plus metrics the paper does not report (map@10: 0.435 and recall@10: 0.813).
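For reference, this is roughly how beir's evaluator is invoked; qrels and results are the usual beir dicts, shown here with toy values:

```python
from beir.retrieval.evaluation import EvaluateRetrieval

# Toy inputs in beir's format; in practice they come from the dataset
# loader and the retriever.
qrels = {'q1': {'d1': 1}}                 # qid -> {docid: relevance}
results = {'q1': {'d1': 0.9, 'd2': 0.1}}  # qid -> {docid: score}

ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(
    qrels, results, k_values=[10])
print(ndcg['NDCG@10'], _map['MAP@10'], recall['Recall@10'])
```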

For now, it might be fine to use pyserini, or even the evaluation scripts from beir. However, the current README instructions using pyserini fail on the last evaluation step, `python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset splade_results.tsv` or `python -m pyserini.eval.msmarco_passage_eval beir-v1.0.0-arguana-test splade_results.tsv`; the script assumes integer query IDs, while BEIR datasets use string IDs:

```
Running command: ['python', '/home/user/.cache/pyserini/eval/msmarco_passage_eval.py', '/home/user/.cache/anserini/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt', 'splade_results.tsv']
Traceback (most recent call last):
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 27, in load_reference_from_stream
    qid = int(l[0])
ValueError: invalid literal for int() with base 10: 'test-environment-aeghhgwpe-pro02a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 184, in <module>
    main()
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 173, in main
    metrics = compute_metrics_from_files(path_to_reference, path_to_candidate)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 157, in compute_metrics_from_files
    qids_to_relevant_passageids = load_reference(path_to_reference)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 43, in load_reference
    qids_to_relevant_passageids = load_reference_from_stream(f)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 34, in load_reference_from_stream
    raise IOError('\"%s\" is not valid format' % l)
OSError: "['test-environment-aeghhgwpe-pro02a', '0', 'test-environment-aeghhgwpe-pro02b', '1']" is not valid format
```

I also tried converting the TSV run to TREC format using this command, and evaluating with pyserini's trec_eval command; this gives nDCG close to 0 at almost all cutoffs (a sketch of that conversion follows).
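For illustration, a hypothetical TSV-to-TREC converter along these lines; the file names and the rank-to-score mapping are placeholders, and a score column derived incorrectly from the rank is one possible source of the near-zero nDCG:

```python
# Convert an MS MARCO-style run (qid \t docid \t rank) to TREC format:
# "qid Q0 docid rank score tag". Placeholder file names throughout.
with open('splade_results.tsv') as fin, open('splade_results.trec', 'w') as fout:
    for line in fin:
        qid, docid, rank = line.strip().split('\t')
        score = 1000 - int(rank)  # descending pseudo-score so rank 1 scores highest
        fout.write(f'{qid} Q0 {docid} {rank} {score} splade\n')
```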

It would probably be a good idea to fix the README instructions either way; whether we use beir or pyserini for evaluation matters less than which one gives replicable results.

Finally, having a self-contained repo would be amazing! For example, if we adapt some evaluation functionality from beir or pyserini, we could either vendor a specific version in a folder or use a git submodule. The one problem with a git submodule (or a plain pip dependency) is that the dependency repo could be removed in the future. Either approach keeps the current repo from breaking when dependencies change, both in the instructions and in the code. Please let me know your thoughts: should I add a SPLADE evaluation script using beir in the examples/splade folder, or would you rather take a look at the pyserini instructions in the README of the examples/splade subfolder?

Thank you,
Srikanth
