Feat: Evaluate missing languages #1584

Samoed · 2024-12-12T19:04:24Z

Checklist

Run tests locally using `make test` to make sure nothing is broken.
Run the formatter using `make lint` to format the code.

Continue ideas from #1525

If a requested split is already present in the existing results, return its stored results instead of running the evaluation again.
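A minimal sketch of the idea, assuming a hypothetical `find_missing_evaluations` helper and a results layout where each split maps to a list of per-subset score dicts keyed by `"hf_subset"` (an assumption, not necessarily mteb's exact schema). Only subsets absent from the stored results are scheduled, and an empty plan means the existing results can be returned as-is.

```python
from typing import Dict, List


def find_missing_evaluations(
    existing_scores: Dict[str, List[dict]],
    eval_splits: List[str],
    eval_subsets: List[str],
) -> Dict[str, List[str]]:
    """Map each requested split to the subsets that still need evaluating.

    Hypothetical helper: `existing_scores` is assumed to map a split name to a
    list of per-subset score dicts, each carrying an "hf_subset" key.
    """
    missing: Dict[str, List[str]] = {}
    for split in eval_splits:
        # Subsets that already have scores stored for this split.
        done = {entry["hf_subset"] for entry in existing_scores.get(split, [])}
        todo = [subset for subset in eval_subsets if subset not in done]
        if todo:
            missing[split] = todo
    return missing


# Toy stored results: the "test" split already has "en" and "de".
existing = {"test": [{"hf_subset": "en"}, {"hf_subset": "de"}]}

plan = find_missing_evaluations(existing, ["validation", "test"], ["en", "de", "fr"])
print(plan)  # {'validation': ['en', 'de', 'fr'], 'test': ['fr']}

# Nothing missing: reuse the stored results instead of re-running.
print(find_missing_evaluations(existing, ["test"], ["en", "de"]))  # {}
```

Returning a per-split list (rather than a single boolean) lets the caller run only the missing subsets and then merge the new scores into the stored ones.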

KennethEnevoldsen

Looks good here, only a few minor things.

mteb/evaluation/MTEB.py

Samoed · 2024-12-13T12:10:13Z

I have a small issue. When I run `mteb run -m intfloat/multilingual-e5-small -t Banking77Classification --overwrite` with existing results, it doesn't do anything, and I don't understand why.

INFO:mteb.cli:Running with parameters: Namespace(model='intfloat/multilingual-e5-small', task_types=None, categories=None, tasks=['Banking77Classification'], languages=None, benchmarks=None, device=None, output_folder='results', verbosity=2, co2_tracker=False, eval_splits=None, model_revision=None, batch_size=None, overwrite=True, save_predictions=False, func=<function run at 0x7fc7bbf42440>)
INFO:mteb.models.sentence_transformer_wrapper:Model prompts will be overwritten with {'query': 'query: ', 'passage': 'passage: '}
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
─────────────────────────────────────────────────────────────────────── Selected tasks  ────────────────────────────────────────────────────────────────────────
Classification
    - Banking77Classification, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating Banking77Classification **********************
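For reference, the behaviour one would expect from `--overwrite` is that stored results never short-circuit evaluation. A minimal sketch of that precedence; `plan_evaluation` and its arguments are hypothetical names for illustration, not mteb's implementation.

```python
from typing import List, Optional, Set


def plan_evaluation(
    existing_splits: Optional[Set[str]],
    eval_splits: List[str],
    overwrite: bool,
) -> List[str]:
    """Return the splits to evaluate (hypothetical sketch, not mteb's code).

    `existing_splits` is None when no results file exists yet.
    """
    # With overwrite requested (or nothing stored), evaluate every split.
    if overwrite or existing_splits is None:
        return list(eval_splits)
    # Otherwise skip splits that already have stored results.
    return [split for split in eval_splits if split not in existing_splits]


stored = {"test"}
print(plan_evaluation(stored, ["test"], overwrite=False))  # []
print(plan_evaluation(stored, ["test"], overwrite=True))   # ['test']
```

Checking `overwrite` before consulting the stored results keeps the "skip what already exists" logic from silently swallowing an explicit re-run request.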

isaac-chung · 2024-12-15T09:53:19Z

I have a small issue. When I run `mteb run -m intfloat/multilingual-e5-small -t Banking77Classification --overwrite` with existing results, it doesn't do anything, and I don't understand why.

This runs fine for me in your branch.

Samoed · 2024-12-15T10:13:58Z

Hm, interesting.

isaac-chung

Looks good. Just a few suggestions :)

tests/test_benchmark/mock_tasks.py

tests/test_evaluation/test_split_evaluation.py

mteb/evaluation/MTEB.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

README.md

* feat: add new arctic v2.0 models (#1574)
* feat: add new arctic v2.0 models
* chore: make lint
* 1.24.0 Automatically generated by python-semantic-release
* fix: Add namaa MrTydi reranking dataset (#1573)
* Add dataset class and file requirements
* pass tests
* make lint changes
* adjust meta data and remove load_data

---------

Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local>

* Update tasks table
* 1.24.1 Automatically generated by python-semantic-release
* fix: Eval langs not correctly passed to monolingual tasks (#1587)
* fix SouthAfricanLangClassification.py
* add check for langs
* lint
* 1.24.2 Automatically generated by python-semantic-release
* feat: Add ColBert (#1563)
* feat: add max_sim operator for IR tasks to support multi-vector models
* docs: add doc for Model2VecWrapper.__init__(...)
* feat: add ColBERTWrapper to models & add ColBERTv2
* fix: resolve issues
* fix: resolve issues
* Update README.md
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Update README.md
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update README.md
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update mteb/evaluation/evaluators/RetrievalEvaluator.py
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update README.md
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* README.md: rm subset
* doc: update example for Late Interaction
* get colbert running without errors
* fix: pass is_query to pylate
* fix: max_sim add pad_sequence
* feat: integrate Jinja templates for ColBERTv2 and add model prompt handling
* feat: add revision & prompt_name
* doc: pad_sequence
* rm TODO jina colbert v2
* doc: warning: higher resource usage for MaxSim

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.25.0 Automatically generated by python-semantic-release
* doc: colbert add score_function & doc section (#1592)
* doc: colbert add score_function & doc section
* doc: Update README.md
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* doc: Update README.md
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Feat: add support for scoring function (#1594)
* add support for scoring function
* lint
* move similarity to wrapper
* remove score function
* lint
* remove from InstructionRetrievalEvaluator
* Update mteb/evaluation/evaluators/RetrievalEvaluator.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* remove score function from README.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Add new models nvidia, gte, linq (#1436)
* Add new models nvidia, gte, linq
* add warning for gte-Qwen and nvidia models re: instruction used in docs as well

---------

Co-authored-by: isaac-chung <chungisaac1217@gmail.com>

* Leaderboard: Refined plots (#1601)
* Added embedding size guide to performance-size plot, removed shading on radar chart
* Changed plot names to something more descriptive
* Made plots failsafe
* fix: Leaderboard refinements (#1603)
* Added explanation of aggregate measures
* Added download button to result tables
* Task info gets sorted by task name
* Added custom, shareable links for each benchmark
* Moved explanation of aggregate metrics to the summary tab
* 1.25.1 Automatically generated by python-semantic-release
* Feat: Use similarity scores if available (#1602)
* Use similarity scores if available
* lint
* Add NanoBEIR Datasets (#1588)
* add NanoClimateFeverRetrieval task, still requires some debugging
* move task to correct place in init file
* add all Nano datasets and results
* format code
* Update mteb/tasks/Retrieval/eng/tempCodeRunnerFile.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* pin revision to commit and add datasets to benchmark.py
* create new benchmark for NanoBEIR
* add revision when loading datasets
* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: isaac-chung <chungisaac1217@gmail.com>

* Update tasks table
* Feat: Evaluate missing languages (#1584)
* init
* fix tests
* update mock retrieval
* update tests
* use subsets instead of langs
* Apply suggestions from code review
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* fix tests
* add to readme
* rename subset in readme

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Add IBM Granite Embedding Models (#1613)
* add IBM granite embedding models
* lint formatting
* add adapted_from and superseded_by to ModelMeta
* fix: disable co2_tracker for API models (#1614)
* 1.25.2 Automatically generated by python-semantic-release
* fix: set `use_instructions` to True in models using prompts (#1616)
  feat: set `use_instructions` to True in models using prompts
* 1.25.3 Automatically generated by python-semantic-release
* update RetrievalEvaluator.py
* update imports
* update imports and metadata
* fix tests
* fix tests
* fix output path for retrieval
* fix similarity function

---------

Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Omar Elshehy <41394057+omarelshehy@users.noreply.github.com>
Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: KGupta10 <92774828+KGupta10@users.noreply.github.com>
Co-authored-by: Aashka Trivedi <aashka.trivedi@gmail.com>

Samoed added 2 commits December 11, 2024 22:51

init (0c0c587)

fix tests (3e51c61)

Samoed requested review from KennethEnevoldsen and isaac-chung December 12, 2024 19:04

update mock retrieval (d643976)

KennethEnevoldsen approved these changes Dec 12, 2024


mteb/evaluation/MTEB.py (outdated, resolved)

mteb/evaluation/MTEB.py (outdated, resolved)

mteb/evaluation/MTEB.py (outdated, resolved)

update tests (fb2a4c0)

use subsets instead of langs (eabb7f8)

isaac-chung reviewed Dec 15, 2024


Samoed and others added 3 commits December 15, 2024 17:42

Apply suggestions from code review (911d951)

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

fix tests (b49f252)

add to readme (86c7b51)

isaac-chung reviewed Dec 15, 2024


README.md (resolved)

rename subset in readme (f8f89b7)

KennethEnevoldsen reviewed Dec 15, 2024


README.md (resolved)

isaac-chung approved these changes Dec 18, 2024


isaac-chung merged commit 48cb97d into main Dec 18, 2024
10 checks passed

isaac-chung deleted the check_lang_in_results branch December 18, 2024 18:28

Samoed mentioned this pull request Dec 20, 2024

fix: override existing results #1617

Merged

2 tasks
