
Splade Encoding and Evaluation not working #150

Open
wants to merge 3 commits into main
Conversation

srikanthmalla
Contributor

Hi @MXueguang,
I followed your suggestion from #149 to use the splade example.

It has a few issues:

1. encode_splade.py is outdated: the argument names and module paths changed in recent library updates.
2. The README instructions for encoding are outdated and do not pass the right argument names.
3. We still need to verify that the evaluation script works and replicates the reported results for at least one SPLADE version.

I have already fixed the first two in this pull request. To work on the last one, I want to check whether we should add a function to the searcher Python file for sparse retrieval output (it might be more consistent with the rest of the repo?), or keep the original index -> retrieve -> evaluate flow with pyserini from the README instructions; see the sketch after this paragraph. Please share your thoughts.
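A minimal sketch of what such a helper could look like, assuming pyserini's LuceneImpactSearcher; the function name, arguments, and SPLADE checkpoint are placeholders, not existing repo code:

```python
# Hypothetical helper for the searcher file: batch sparse retrieval over
# a pyserini impact index. All names here are illustrative.
from pyserini.search.lucene import LuceneImpactSearcher

def sparse_search(index_dir: str, queries: dict, k: int = 1000) -> dict:
    """Return {query_id: [(doc_id, score), ...]} for each query.

    queries maps query_id -> raw query text; the encoder expands the
    text into a sparse term-weight vector at search time.
    """
    # 'naver/splade-cocondenser-ensembledistil' is one public SPLADE
    # checkpoint; swap in whichever model matches the index.
    searcher = LuceneImpactSearcher(
        index_dir, 'naver/splade-cocondenser-ensembledistil')
    results = {}
    for qid, text in queries.items():
        hits = searcher.search(text, k=k)
        results[qid] = [(hit.docid, hit.score) for hit in hits]
    return results
```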

Thanks,
Srikanth

@MXueguang
Contributor

Hi @srikanthmalla, thank you again for helping us improve the codebase. I think the hard part is the indexing step for the sparse representations; we need to use pyserini to index them properly (see the sketch below). For search, we could create a more consistent Python script that internally uses pyserini. I'm not very sure here; do you think that would make usage easier?
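For reference, a rough sketch of that indexing step, assuming the encoder has written JSONL files of the form {"id": ..., "vector": {token: weight}} (pyserini's JsonVectorCollection format); the paths and thread count are illustrative:

```python
# Invoke pyserini's Lucene indexer on SPLADE-encoded documents.
import subprocess

subprocess.run([
    'python', '-m', 'pyserini.index.lucene',
    '--collection', 'JsonVectorCollection',  # docs are token->weight maps
    '--input', 'encoded_corpus',             # directory of encoded JSONL files
    '--index', 'splade_index',               # output index directory
    '--generator', 'DefaultLuceneDocumentGenerator',
    '--threads', '12',
    '--impact',        # store quantized impact scores instead of term frequencies
    '--pretokenized',  # terms are already SPLADE vocabulary tokens
], check=True)
```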

@srikanthmalla
Contributor Author

srikanthmalla commented Sep 9, 2024

Hi @MXueguang,
I am getting the results below using the evaluation script from the beir repo on the arguana dataset:
ndcg@10: 0.525 (close to the number reported in the paper), plus metrics the paper does not report (map@10: 0.435 and recall@10: 0.813).
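For reference, this is roughly how beir's evaluator is invoked; qrels and results are the usual beir dicts, shown here with toy values:

```python
from beir.retrieval.evaluation import EvaluateRetrieval

# Toy inputs in beir's format; in practice they come from the dataset
# loader and the retriever.
qrels = {'q1': {'d1': 1}}                 # qid -> {docid: relevance}
results = {'q1': {'d1': 0.9, 'd2': 0.1}}  # qid -> {docid: score}

ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(
    qrels, results, k_values=[10])
print(ndcg['NDCG@10'], _map['MAP@10'], recall['Recall@10'])
```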

For now, it might be fine to use pyserini, or even the evaluation scripts from beir. However, the current README instructions using pyserini fail on the last evaluation step, `python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset splade_results.tsv` or `python -m pyserini.eval.msmarco_passage_eval beir-v1.0.0-arguana-test splade_results.tsv`; the script assumes integer query IDs, while BEIR datasets use string IDs:

```
Running command: ['python', '/home/user/.cache/pyserini/eval/msmarco_passage_eval.py', '/home/user/.cache/anserini/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt', 'splade_results.tsv']
Traceback (most recent call last):
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 27, in load_reference_from_stream
    qid = int(l[0])
ValueError: invalid literal for int() with base 10: 'test-environment-aeghhgwpe-pro02a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 184, in <module>
    main()
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 173, in main
    metrics = compute_metrics_from_files(path_to_reference, path_to_candidate)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 157, in compute_metrics_from_files
    qids_to_relevant_passageids = load_reference(path_to_reference)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 43, in load_reference
    qids_to_relevant_passageids = load_reference_from_stream(f)
  File "/home/user/.cache/pyserini/eval/msmarco_passage_eval.py", line 34, in load_reference_from_stream
    raise IOError('\"%s\" is not valid format' % l)
OSError: "['test-environment-aeghhgwpe-pro02a', '0', 'test-environment-aeghhgwpe-pro02b', '1']" is not valid format
```

I also tried converting the TSV run to TREC format using this command, and evaluating with pyserini's trec_eval command; this gives nDCG close to 0 at almost all cutoffs (a sketch of that conversion follows).
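For illustration, a hypothetical TSV-to-TREC converter along these lines; the file names and the rank-to-score mapping are placeholders, and a score column derived incorrectly from the rank is one possible source of the near-zero nDCG:

```python
# Convert an MS MARCO-style run (qid \t docid \t rank) to TREC format:
# "qid Q0 docid rank score tag". Placeholder file names throughout.
with open('splade_results.tsv') as fin, open('splade_results.trec', 'w') as fout:
    for line in fin:
        qid, docid, rank = line.strip().split('\t')
        score = 1000 - int(rank)  # descending pseudo-score so rank 1 scores highest
        fout.write(f'{qid} Q0 {docid} {rank} {score} splade\n')
```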

It would probably be a good idea to fix the README instructions either way; whether we use beir or pyserini for evaluation matters less than which one gives replicable results.

Finally, having a self-contained repo would be amazing! For example, if we adapt some evaluation functionality from beir or pyserini, we could either vendor a specific version in a folder or use a git submodule. The one problem with a git submodule (or a plain pip dependency) is that the dependency repo could be removed in the future. Either approach keeps the current repo from breaking when dependencies change, both in the instructions and in the code. Please let me know your thoughts: should I add a SPLADE evaluation script using beir in the examples/splade folder, or would you rather take a look at the pyserini instructions in the README of the examples/splade subfolder?

Thank you,
Srikanth
