Simplify retrieval #233

Merged
merged 13 commits into from
Mar 6, 2024
@Muennighoff Muennighoff commented Feb 26, 2024

  • Remove the beir dependency - this makes installation easier because there are fewer dependencies
  • Remove the outdated Retrieval evaluator - as far as I know this evaluator was never used, so it is just confusing duplicate code
  • Remove the beir multinode code - as outlined here, this is much easier to do on the user side, and the current implementation is too hacky
  • Add and adapt some classes from beir, mainly the HF dataset loader and the evaluator
  • Add all beir retrieval datasets to mteb on the hub and load them from there - this ensures consistency and makes the datasets much easier to inspect (previously they were not loaded from HF)

I verified with komninos that the results for all changed English Retrieval datasets remain the same. It would be great if all of you could take a look at the PR, as this is the biggest change to MTEB since release.

I also checked that the BeIR-PL and Korean Retrieval tasks still work (cc @kwojtasi @taeminlee)
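
For readers unfamiliar with the evaluator mentioned in the list above, here is a minimal sketch of the kind of metric a retrieval evaluator computes. It assumes BEIR-style dictionaries (`qrels` mapping query id to document relevance, `results` mapping query id to document scores). The function name `ndcg_at_k` and the toy data are hypothetical and not taken from the mteb or beir code, and the linear-gain nDCG used here is an illustrative choice, not necessarily what the adapted evaluator does.

```python
import math
from typing import Dict


def ndcg_at_k(
    qrels: Dict[str, Dict[str, int]],
    results: Dict[str, Dict[str, float]],
    k: int = 10,
) -> float:
    """Mean nDCG@k over all queries in qrels (linear gain, log2 discount)."""
    scores = []
    for qid, rels in qrels.items():
        # Rank the retrieved documents by descending score and keep the top k.
        ranked = sorted(
            results.get(qid, {}).items(), key=lambda kv: kv[1], reverse=True
        )[:k]
        # Discounted cumulative gain of the system ranking.
        dcg = sum(
            rels.get(doc_id, 0) / math.log2(rank + 2)
            for rank, (doc_id, _) in enumerate(ranked)
        )
        # Ideal DCG: the best ordering of the judged documents.
        ideal = sorted(rels.values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): one query with one relevant document.
qrels = {"q1": {"d1": 1}}
perfect = {"q1": {"d1": 0.9, "d2": 0.1}}   # relevant document ranked first
swapped = {"q1": {"d1": 0.1, "d2": 0.9}}   # relevant document ranked second

print(ndcg_at_k(qrels, perfect))
print(ndcg_at_k(qrels, swapped))
```

Comparing a score like this before and after the change, per dataset, is one way to check that results remain the same.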

@imenelydiaker imenelydiaker left a comment

Overall, everything looks good. I just left a comment about a test and a function call.
I've also added some dataset revision IDs; I guess you'll change them if you update the datasets.

mteb/tasks/Retrieval/HotpotQAPLRetrieval.py
mteb/tasks/Retrieval/ArguAnaPLRetrieval.py
mteb/tasks/Retrieval/ArguAnaRetrieval.py
mteb/tasks/Retrieval/CQADupstackEnglishRetrieval.py
mteb/tasks/Retrieval/CQADupstackGamingRetrieval.py
mteb/tasks/Retrieval/SciFactRetrieval.py
mteb/tasks/Retrieval/TRECCOVIDPLRetrieval.py
mteb/tasks/Retrieval/Touche2020Retrieval.py
tests/test_RetrievalEvaluator.py
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
mteb/tasks/Retrieval/ArguAnaRetrieval.py
mteb/tasks/Retrieval/NQPLRetrieval.py
mteb/tasks/Retrieval/NQPLRetrieval.py
mteb/tasks/Retrieval/QuoraRetrieval.py
mteb/tasks/Retrieval/SciFactRetrieval.py
tests/test_RetrievalEvaluator.py
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
@Muennighoff

@NouamaneTazi @loicmagne @tomaarsen Any thoughts? 😊

@loicmagne loicmagne left a comment

Looks good, great work!

mteb/evaluation/evaluators/RetrievalEvaluator.py
@KennethEnevoldsen KennethEnevoldsen deleted the chore/simplifyretrieval branch July 19, 2024 08:36
3 participants