fix: Add namaa MrTydi reranking dataset #1573

omarelshehy · 2024-12-09T17:38:43Z

Why this dataset:

1 - Add to the reranking tasks available exclusively for Arabic
2 - Utilize the test dataset for MrTydi with generated and human-evaluated negatives.

Checklist

Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- cross-encoder/ms-marco-MiniLM-L-12-v2
- cross-encoder/stsb-TinyBERT-L-4
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
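
Editorial sketch for the "neither trivial nor random" check above: a reported score is easier to judge against a known chance level. The snippet below is a minimal illustration, not part of mteb; the function name is hypothetical, and it assumes exactly one positive per query with 4 or 5 negatives (the figure the author gives later in this thread) and a uniformly random ranking. Under that assumption, reciprocal rank and average precision coincide for a single positive.

```python
from fractions import Fraction


def expected_random_reciprocal_rank(num_candidates: int) -> Fraction:
    """Expected reciprocal rank of a single positive under a uniformly random ranking.

    The positive is equally likely to land on each rank 1..n, so the
    expectation is (1/n) * (1/1 + 1/2 + ... + 1/n).
    """
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    harmonic = sum(Fraction(1, rank) for rank in range(1, num_candidates + 1))
    return harmonic / num_candidates


# Assumption: one positive plus 4 or 5 negatives per query.
for negatives in (4, 5):
    baseline = expected_random_reciprocal_rank(1 + negatives)
    print(f"{negatives} negatives: random baseline ~ {float(baseline):.3f}")
```

A model whose score sits near these values is likely ranking at chance; a score near 1.0 on both models would suggest the negatives are too easy.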

Samoed · 2024-12-09T18:04:43Z

Did your dataset add new data to the original MrTydi? The original MrTydi is already included in MTEB. Also, could you provide results for these tasks to ensure it's working correctly? It seems like the data is being loaded in a different format than expected.

omarelshehy · 2024-12-09T18:16:21Z

I might be mistaken, but the MrTydi dataset was included there for retrieval, not reranking. We took the MrTydi test set and added 4-5 negatives to each query-positive pair (which the original doesn't have). For the formatting, I relied on the structure of similar reranking datasets. Here are also the results of the two models listed in the PR description:
NamaaMrTydiReranking_ms-marco-MiniLM.json
NamaaMrTydiReranking_stsb_TinyBERT.json
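
Editorial sketch of the structure described above (one query, its positive, and a handful of negatives) and of how a reranking score can be derived from it. The class, the field names, and the scores are illustrative assumptions, not the actual dataset schema or real model output.

```python
from dataclasses import dataclass


@dataclass
class RerankingSample:
    # Hypothetical container; the field names are assumptions for illustration.
    query: str
    positive: list[str]
    negative: list[str]


def rank_labels(sample: RerankingSample, scores: list[float]) -> list[int]:
    """Return relevance labels (1 or 0) ordered from best to worst score.

    `scores` must align with `sample.positive + sample.negative`.
    """
    labels = [1] * len(sample.positive) + [0] * len(sample.negative)
    if len(scores) != len(labels):
        raise ValueError("one score per candidate is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked_labels: list[int]) -> float:
    """Average precision for one query, given labels in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


sample = RerankingSample(
    query="placeholder query",
    positive=["relevant passage"],
    negative=["negative 1", "negative 2", "negative 3", "negative 4"],
)
# Made-up scores standing in for a cross-encoder: one negative outranks the positive.
scores = [0.62, 0.80, 0.10, 0.35, 0.05]
ranked = rank_labels(sample, scores)
ap = average_precision(ranked)
```

Here the positive lands at rank 2, so the average precision for this query is 0.5; averaging this value over all queries gives MAP.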

Samoed · 2024-12-09T18:28:34Z

Ah, yes. You are right

KennethEnevoldsen

The metadata seems to be lacking a bit; I have suggested some updates.

mteb/tasks/Reranking/ara/NamaaMrTydiReranking.py

* feat: add new arctic v2.0 models (#1574) * feat: add new arctic v2.0 models * chore: make lint * 1.24.0 Automatically generated by python-semantic-release * fix: Add namaa MrTydi reranking dataset (#1573) * Add dataset class and file requirements * pass tests * make lint changes * adjust meta data and remove load_data --------- Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local> * Update tasks table * 1.24.1 Automatically generated by python-semantic-release * fix: Eval langs not correctly passed to monolingual tasks (#1587) * fix SouthAfricanLangClassification.py * add check for langs * lint * 1.24.2 Automatically generated by python-semantic-release * feat: Add ColBert (#1563) * feat: add max_sim operator for IR tasks to support multi-vector models * docs: add doc for Model2VecWrapper.__init__(...) * feat: add ColBERTWrapper to models & add ColBERTv2 * fix: resolve issues * fix: resolve issues * Update README.md Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/evaluation/evaluators/RetrievalEvaluator.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * README.md: rm subset * doc: update example for Late Interaction * get colbert running without errors * fix: pass is_query to pylate * fix: max_sim add pad_sequence * feat: integrate Jinja templates for ColBERTv2 and add model prompt handling * feat: add revision & prompt_name * doc: pad_sequence * rm TODO jina colbert v2 * doc: warning: higher resource usage for MaxSim --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.25.0 Automatically generated by python-semantic-release * doc: colbert add score_function 
& doc section (#1592) * doc: colbert add score_function & doc section * doc: Update README.md Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * doc: Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Feat: add support for scoring function (#1594) * add support for scoring function * lint * move similarity to wrapper * remove score function * lint * remove from InstructionRetrievalEvaluator * Update mteb/evaluation/evaluators/RetrievalEvaluator.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * remove score function from README.md --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Add new models nvidia, gte, linq (#1436) * Add new models nvidia, gte, linq * add warning for gte-Qwen and nvidia models re: instruction used in docs as well --------- Co-authored-by: isaac-chung <chungisaac1217@gmail.com> * Leaderboard: Refined plots (#1601) * Added embedding size guide to performance-size plot, removed shading on radar chart * Changed plot names to something more descriptive * Made plots failsafe * fix: Leaderboard refinements (#1603) * Added explanation of aggregate measures * Added download button to result tables * Task info gets sorted by task name * Added custom, shareable links for each benchmark * Moved explanation of aggregate metrics to the summary tab * 1.25.1 Automatically generated by python-semantic-release * Feat: Use similarity scores if available (#1602) * Use similarity scores if available * lint * Add NanoBEIR Datasets (#1588) * add NanoClimateFeverRetrieval task, still requires some debugging * move task to correct place in init file * add all Nano datasets and results * format code * Update mteb/tasks/Retrieval/eng/tempCodeRunnerFile.py Co-authored-by: Roman 
Solomatin <samoed.roman@gmail.com> * pin revision to commit and add datasets to benchmark.py * create new benchmark for NanoBEIR * add revision when loading datasets * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: isaac-chung <chungisaac1217@gmail.com> * Update tasks table * Feat: Evaluate missing languages (#1584) * init * fix tests * update mock retrieval * update tests * use subsets instead of langs * Apply suggestions from code review Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * fix tests * add to readme * rename subset in readme --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Add IBM Granite Embedding Models (#1613) * add IBM granite embedding models * lint formatting * add adapted_from and superseded_by to ModelMeta * fix: disable co2_tracker for API models (#1614) * 1.25.2 Automatically generated by python-semantic-release * fix: set `use_instructions` to True in models using prompts (#1616) feat: set `use_instructions` to True in models using prompts * 1.25.3 Automatically generated by python-semantic-release * update RetrievalEvaluator.py * update imports * update imports and metadata * fix tests * fix tests * fix output path for retrieval * fix similarity function --------- Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Omar Elshehy <41394057+omarelshehy@users.noreply.github.com> Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com> Co-authored-by: Márton Kardos <power.up1163@gmail.com> 
Co-authored-by: KGupta10 <92774828+KGupta10@users.noreply.github.com> Co-authored-by: Aashka Trivedi <aashka.trivedi@gmail.com>

Omar Elshehy added 3 commits December 9, 2024 15:20

Add dataset class and file requirements

6455488

pass tests

df3d18d

make lint changes

6ebbcf2

omarelshehy marked this pull request as ready for review December 9, 2024 17:39

KennethEnevoldsen requested changes Dec 10, 2024

adjust meta data and remove load_data

936f4f7

omarelshehy requested a review from Samoed December 11, 2024 21:49

KennethEnevoldsen approved these changes Dec 11, 2024

KennethEnevoldsen changed the title ~~Add namaa MrTydi reranking dataset~~ fix: Add namaa MrTydi reranking dataset Dec 11, 2024

KennethEnevoldsen enabled auto-merge (squash) December 11, 2024 23:09

KennethEnevoldsen merged commit 7b9b3c9 into embeddings-benchmark:main Dec 11, 2024
10 checks passed
