BREAKING: v2.0.0 #1433
Draft
KennethEnevoldsen wants to merge 28 commits into main from v2.0.0
Conversation
* update * merged retrieval; working * update tasks; working multilingual * everything working except instructions * working instructions; just need cleanup * add metadata for all but MindSmall * faster evaluation; mindsmall can compute in reasonable time * fix bad merge of docs * lint * fix test * qa * updated mindsmall * lint * fix debug * Update mteb/abstasks/dataloaders.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* fix: Count unique texts, data leaks in calculate metrics (#1438) * add more stat * add more stat * update statistics * fix: update task metadata to allow for null (#1448) * Update tasks table * 1.19.5 Automatically generated by python-semantic-release * base * sync with main --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com>
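The "count unique texts" statistic referenced in the commits above amounts to collapsing duplicates before counting, which is also how data leaks between splits become visible. A minimal sketch of the idea (the helper name is mine, not mteb's):

```python
# Minimal sketch, not the exact mteb code: count distinct texts in a split
# so that duplicates (and potential train/test leakage) stand out.
def count_unique_texts(texts: list[str]) -> int:
    """Number of distinct texts in a split."""
    return len(set(texts))
```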
* enable codecarbon by default * lint * update flag * add allow_multiple_runs param * make lint * add warning * lint * negate the flag --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
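For the CO2-tracking change above, a hedged usage sketch; the `co2_tracker` flag reflects my reading of the public mteb API and may differ between versions:

```python
# Hedged usage sketch: running an evaluation with CO2 tracking enabled.
# The co2_tracker flag is an assumption about the public API, not a confirmed name.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
results = mteb.MTEB(tasks=tasks).run(model, co2_tracker=True)
```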
* run tasks * remove test script * lint * remove cache * fix sickbrsts * fix tests * add datasets
* fix test * skip mock * add message to assert * fix test * lint * fix tests * upd tests * update descriptive stats files * add stat to speed
* multilingual loader * lint
* add citations * fix typo
* add code for computing number of qrels * add stats fever hotpotqa msmarco topiocqa * miracl mrtidy * multilongdoc miracl reranking * add multi eurlex * fix tests for descriptive stats * fix tests --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* add code for computing number of qrels * BibleNLPBitextMining descriptive stats added * SwissJudgementClassification descriptive stats added * VoyageMMarcoReranking descriptive stats added * WebLINXCandidatesReranking descriptive stats added * MultiEURLEXMultilabelClassification descriptive stats added * MIRACLReranking descriptive stats added * MindSmallReranking descriptive stats added * updated test_TaskMetadata * fix test --------- Co-authored-by: Imene Kerboua <imenelydia.kr@gmail.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
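The "number of qrels" statistic added above is, in essence, the total count of relevance judgements across queries. An illustrative sketch, assuming the usual qrels structure of query id mapped to judged documents:

```python
# Illustrative only: count relevance judgements (qrels) for a retrieval split,
# assuming qrels maps query_id -> {doc_id: relevance} (an assumption, not mteb's exact type).
def num_qrels(qrels: dict[str, dict[str, int]]) -> int:
    return sum(len(judged_docs) for judged_docs in qrels.values())
```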
* fix bright loader * lint * fix comment
* fix: Count unique texts, data leaks in calculate metrics (#1438) * add more stat * add more stat * update statistics * fix: update task metadata to allow for null (#1448) * Update tasks table * 1.19.5 Automatically generated by python-semantic-release * Fix: Made data parsing in the leaderboard figure more robust (#1450) Bugfixes with data parsing in main figure * Fixed task loading (#1451) * Fixed task result loading from disk * Fixed task result loading from disk * fix: publish (#1452) * 1.19.6 Automatically generated by python-semantic-release * fix: Fix load external results with `None` mteb_version (#1453) * fix * lint * 1.19.7 Automatically generated by python-semantic-release * WIP: Polishing up leaderboard UI (#1461) * fix: Removed column wrapping on the table, so that it remains readable * Added disclaimer to figure * fix: Added links to task info table, switched out license with metric * fix: loading pre 1.11.0 (#1460) * small fix * fix: fix * 1.19.8 Automatically generated by python-semantic-release * fix: swap touche2020 to maintain compatibility (#1469) swap touche2020 for parity * 1.19.9 Automatically generated by python-semantic-release * docs: Add sum per language for task counts (#1468) * add sum per lang * add sort by sum option * make lint * fix: pinned datasets to <3.0.0 (#1470) * 1.19.10 Automatically generated by python-semantic-release * feat: add CUREv1 retrieval dataset (#1459) * feat: add CUREv1 dataset --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> Co-authored-by: Daniel Buades Marcos <daniel@buad.es> * feat: add missing domains to medical tasks * feat: modify benchmark tasks * chore: benchmark naming --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> * Update tasks table * 1.20.0 Automatically generated by python-semantic-release * fix: check if `model` attr of model exists (#1499) * check if model attr of model exists * lint * Fix retrieval evaluator * 1.20.1 Automatically generated by python-semantic-release * add cure statistics --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Napuh <55241721+Napuh@users.noreply.github.com> Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com>
* fix bright loader * lint * fix comment * fix stats * fix retrieval stats * update stats * add rest of the stat * move bach code * fix docs * lint
* fix FilipinoHateSpeechClassification * update tests
* init * find all weird repos * move to mteb WikipediaRetrievalMultilingual * add base upload utils * retrieval, classification, bitextmining * test retrieval * test retrieval * test task uploaded * update tasks * working version * remove comments * lint * move upload * fix tests * fix test * move upload to task * Update mteb/tasks/Retrieval/multilingual/WikipediaRetrievalMultilingual.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * fix: hatespeech filipino (#1522) * fix FilipinoHateSpeechClassification * update tests * lint --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
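A hedged sketch of exercising one of the tasks touched above; `mteb.get_task` and `load_data()` are my understanding of the public API and should be checked against the installed version:

```python
# Hedged sketch: fetch a single task by name and load its data locally.
import mteb

task = mteb.get_task("WikipediaRetrievalMultilingual")  # task name taken from the commits above
task.load_data()  # assumption: AbsTask exposes load_data() for local inspection
```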
* fix: Count unique texts, data leaks in calculate metrics (#1438) * add more stat * add more stat * update statistics * fix: update task metadata to allow for null (#1448) * Update tasks table * 1.19.5 Automatically generated by python-semantic-release * Fix: Made data parsing in the leaderboard figure more robust (#1450) Bugfixes with data parsing in main figure * Fixed task loading (#1451) * Fixed task result loading from disk * Fixed task result loading from disk * fix: publish (#1452) * 1.19.6 Automatically generated by python-semantic-release * fix: Fix load external results with `None` mteb_version (#1453) * fix * lint * 1.19.7 Automatically generated by python-semantic-release * WIP: Polishing up leaderboard UI (#1461) * fix: Removed column wrapping on the table, so that it remains readable * Added disclaimer to figure * fix: Added links to task info table, switched out license with metric * fix: loading pre 1.11.0 (#1460) * small fix * fix: fix * 1.19.8 Automatically generated by python-semantic-release * fix: swap touche2020 to maintain compatibility (#1469) swap touche2020 for parity * 1.19.9 Automatically generated by python-semantic-release * docs: Add sum per language for task counts (#1468) * add sum per lang * add sort by sum option * make lint * fix: pinned datasets to <3.0.0 (#1470) * 1.19.10 Automatically generated by python-semantic-release * feat: add CUREv1 retrieval dataset (#1459) * feat: add CUREv1 dataset --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> Co-authored-by: Daniel Buades Marcos <daniel@buad.es> * feat: add missing domains to medical tasks * feat: modify benchmark tasks * chore: benchmark naming --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> * Update tasks table * 1.20.0 Automatically generated by python-semantic-release * fix: check if `model` attr of model exists (#1499) * check if model attr of model exists * lint * Fix retrieval evaluator * 1.20.1 Automatically generated by python-semantic-release * fix: Leaderboard demo data loading (#1507) * Made get_scores error tolerant * Added join_revisions, made get_scores failsafe * Fetching metadata fixed fr HF models * Added failsafe metadata fetching to leaderboard code * Added revision joining to leaderboard app * fix * Only show models that have metadata, when filter_models is called * Ran linting * 1.20.2 Automatically generated by python-semantic-release * fix: leaderboard only shows models that have ModelMeta (#1508) Filtering for models that have metadata * 1.20.3 Automatically generated by python-semantic-release * fix: align readme with current mteb (#1493) * align readme with current mteb * align with mieb branch * fix test * 1.20.4 Automatically generated by python-semantic-release * docs: Add lang family mapping and map to task table (#1486) * add lang family mapping and map to task table * make lint * add back some unclassified lang codes * Update tasks table * fix: Ensure that models match the names on embedding-benchmarks/results (#1519) * 1.20.5 Automatically generated by python-semantic-release * fix: Adding missing metadata on models and mathcing names up with the results repo (#1528) * Added Voyage 3 models * Added correct metadata to Cohere models and matched names with the results repo * 1.20.6 Automatically generated by python-semantic-release * feat: Evaluate missing splits (#1525) * fix: evaluate missing splits (#1268) * implement partial evaluation 
for missing splits * lint * requested changes done from scratch * test for missing split evaluation added * uncomment test * lint * avoid circular import * use TaskResult * skip tests for now --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * got test_all_splits_evaluated passing * tests passing * address review comments * make lint * handle None cases for kg_co2_emissions * use new results info --------- Co-authored-by: Thivyanth <thivyanth2004@gmail.com> * 1.21.0 Automatically generated by python-semantic-release * fix: Correct typos superseeded -> superseded (#1532) fix typo -> superseded * 1.21.1 Automatically generated by python-semantic-release * fix: Task load data error for SICK-BR-STS and XStance (#1534) * fix task load data for two tasks * correct dataset keys * 1.21.2 Automatically generated by python-semantic-release * fix: Proprietary models now get correctly shown in leaderboard (#1530) * Fixed showing proprietary models in leaderboard * Added links to all OpenAI models * Fixed table formatting issues * Bumped Gradio version * 1.21.3 Automatically generated by python-semantic-release * docs: Add Model Meta parameters and metadata (#1536) * add multi_qa_MiniLM_L6_cos_v1 model meta * add all_mpnet_base_v2 * add parameters to model meta * make lint * add extra params to meta * fix: add more model meta (jina, e5) (#1537) * add e5 model meta * address review comments * 1.21.4 Automatically generated by python-semantic-release * Add cohere models (#1538) * fix: bug cohere names * format * fix: add nomic models (#1543) #1515 * fix: Added all-minilm-l12-v2 (#1542) #1515 * fix: Added arctic models (#1541) #1515 * fix: add sentence trimming to OpenAIWrapper (#1526) * fix: add sentence trimming to OpenAIWrapper * fix: import tiktoken library inside encode function * fix: check tokenizer library installed and update ModelMeta to pass tokenizer_name * fix: pass tokenizer_name, max_tokens to loader * fix: make tokenizer_name None for default * fix: delete changes for ModelMeta * fix: fix revision to 2 for OpenAI models * fix: add docstring for OpenAIWrapper * fix: lint * feat: add openai optional dependency set * fix: add sleep for too many requests * fix: add lint * fix: delete evaluate file * 1.21.5 Automatically generated by python-semantic-release * fix: Fixed metadata errors (#1547) * 1.21.6 Automatically generated by python-semantic-release * fix: remove curev1 from multlingual (#1552) Seems like it was added here: 1cc6c9e * 1.21.7 Automatically generated by python-semantic-release * fix: Add Model2vec (#1546) * Added Model2Vec wrapper * Added Model2vec models * Added model2vec models to registry * Added model2vec as a dependency * Ran linting * Update mteb/models/model2vec_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Update mteb/models/model2vec_models.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Added adapted_from and superseeded_by to model2vec models. 
* Added missing import * Moved pyproject.toml to optional dependencies * Fixed typos * Added import error and changed model to model_name * Added Numpy to frameworks * Added Numpy to frameworks * Corrected false info on model2vec models * Replaced np.inf with maxint * Update mteb/models/model2vec_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Added option to have infinite max tokens, added it to Model2vec --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Made result loading more permissive, changed eval splits for HotPotQA and DBPedia (#1554) * Removed train and dev from eval splits on HotpotQA * Removed dev from eval splits on DBPedia * Made task_results validation more permissive * Readded exception in get_score * Ran linting * 1.21.8 Automatically generated by python-semantic-release * docs: Correction of SICK-R metadata (#1558) * Correction of SICK-R metadata * Correction of SICK-R metadata --------- Co-authored-by: rposwiata <rposwiata@opi.org.pl> * feat(google_models): fix issues and add support for `text-embedding-005` and `text-multilingual-embedding-002` (#1562) * fix: google_models batching and prompt * feat: add text-embedding-005 and text-multilingual-embedding-002 * chore: `make lint` errors * fix: address PR comments * 1.22.0 Automatically generated by python-semantic-release * fix(bm25s): search implementation (#1566) fix: bm25s implementation * 1.22.1 Automatically generated by python-semantic-release * docs: Fix dependency library name for bm25s (#1568) * fix: bm25s implementation * correct library name --------- Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> * fix: Add training dataset to model meta (#1561) * fix: Add training dataset to model meta Adresses #1556 * Added docs * format * feat: (cohere_models) cohere_task_type issue, batch requests and tqdm for visualization (#1564) * feat: batch requests to cohere models * fix: use correct task_type * feat: use tqdm with openai * fix: explicitely set `show_progress_bar` to False * fix(publichealth-qa): ignore rows with `None` values in `question` or `answer` (#1565) * 1.23.0 Automatically generated by python-semantic-release * fix wongnai * update inits * fix tests * lint * update imports * fix tests * lint --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Napuh <55241721+Napuh@users.noreply.github.com> Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> Co-authored-by: Thivyanth <thivyanth2004@gmail.com> Co-authored-by: Youngjoon Jang <82500463+yjoonjang@users.noreply.github.com> Co-authored-by: Rafał Poświata <rafalposwiata@gmail.com>
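One of the changes folded in above adds sentence trimming to the OpenAI wrapper via tiktoken. The sketch below illustrates the general approach only, not the exact MTEB implementation; the encoding name and token budget are assumptions:

```python
# Illustration of trimming input text to a token budget before calling an
# embedding API, using tiktoken. Defaults below are assumptions for the example.
import tiktoken

def trim_to_token_budget(text: str, max_tokens: int = 8191,
                         encoding_name: str = "cl100k_base") -> str:
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    # Decode only the first max_tokens tokens when the text is too long.
    return enc.decode(tokens[:max_tokens]) if len(tokens) > max_tokens else text
```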
# Conflicts:
#   docs/tasks.md
#   mteb/abstasks/AbsTaskClassification.py
#   mteb/abstasks/AbsTaskClusteringFast.py
#   mteb/abstasks/AbsTaskInstructionRetrieval.py
#   mteb/abstasks/AbsTaskMultilabelClassification.py
#   mteb/abstasks/AbsTaskPairClassification.py
#   mteb/abstasks/AbsTaskReranking.py
#   mteb/abstasks/AbsTaskRetrieval.py
#   mteb/abstasks/AbsTaskSTS.py
#   mteb/descriptive_stats/InstructionRetrieval/Core17InstructionRetrieval.json
#   mteb/descriptive_stats/MultilabelClassification/MultiEURLEXMultilabelClassification.json
#   mteb/descriptive_stats/Reranking/AskUbuntuDupQuestions.json
#   mteb/descriptive_stats/Reranking/ESCIReranking.json
#   mteb/descriptive_stats/Reranking/WikipediaRerankingMultilingual.json
#   mteb/descriptive_stats/Retrieval/AppsRetrieval.json
#   mteb/descriptive_stats/Retrieval/BelebeleRetrieval.json
#   mteb/descriptive_stats/Retrieval/COIRCodeSearchNetRetrieval.json
#   mteb/descriptive_stats/Retrieval/CodeEditSearchRetrieval.json
#   mteb/descriptive_stats/Retrieval/CodeFeedbackMT.json
#   mteb/descriptive_stats/Retrieval/CodeFeedbackST.json
#   mteb/descriptive_stats/Retrieval/CodeSearchNetCCRetrieval.json
#   mteb/descriptive_stats/Retrieval/CodeSearchNetRetrieval.json
#   mteb/descriptive_stats/Retrieval/CodeTransOceanContest.json
#   mteb/descriptive_stats/Retrieval/CodeTransOceanDL.json
#   mteb/descriptive_stats/Retrieval/CosQA.json
#   mteb/descriptive_stats/Retrieval/JaqketRetrieval.json
#   mteb/descriptive_stats/Retrieval/NFCorpus.json
#   mteb/descriptive_stats/Retrieval/StackOverflowQA.json
#   mteb/descriptive_stats/Retrieval/SyntheticText2SQL.json
#   mteb/descriptive_stats/Retrieval/Touche2020.json
#   mteb/descriptive_stats/Retrieval/Touche2020Retrieval.v3.json
#   mteb/descriptive_stats/Retrieval/mFollowIRCrossLingualInstructionRetrieval.json
#   mteb/descriptive_stats/Retrieval/mFollowIRInstructionRetrieval.json
#   mteb/evaluation/MTEB.py
#   mteb/evaluation/evaluators/RetrievalEvaluator.py
#   mteb/leaderboard/app.py
#   mteb/leaderboard/figures.py
#   mteb/leaderboard/table.py
#   mteb/model_meta.py
#   mteb/models/arctic_models.py
#   mteb/models/e5_models.py
#   mteb/models/nomic_models.py
#   mteb/models/overview.py
#   mteb/models/sentence_transformers_models.py
#   mteb/tasks/Reranking/zho/CMTEBReranking.py
#   mteb/tasks/Retrieval/__init__.py
#   mteb/tasks/STS/por/SickBrSTS.py
#   pyproject.toml
#   tests/test_benchmark/mock_tasks.py
* sort logos, add mkdocs outline, add index page * Added tons of documentation * Added some more docs to abstask * reduced docs to only include API docs for now * fixed import hell * Fixed more nasty import to get docs to work * API docs work! * fixed link * Apply suggestions from code review Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * format --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* fix: reorder argument for mteb.get_tasks This should make the function more intuitive to use * typo --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
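After the argument reorder in `mteb.get_tasks`, a minimal usage sketch; keyword arguments are used so the call is robust to the exact positional order:

```python
# Minimal usage sketch of mteb.get_tasks with keyword arguments.
import mteb

tasks = mteb.get_tasks(tasks=["Banking77Classification"], languages=["eng"])
```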
* fix: Make deduplication in PairClassificationEvaluator stable * remove prompt type * remove prompt type missed one --------- Co-authored-by: isaac-chung <chungisaac1217@gmail.com>
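The stable deduplication mentioned above comes down to removing duplicates while preserving first-seen order, so repeated runs see identical inputs. A generic illustration, not necessarily the evaluator's exact code:

```python
# Order-stable deduplication: dict.fromkeys keeps first occurrences in insertion
# order, so the result is deterministic across runs.
def dedup_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    return list(dict.fromkeys(pairs))
```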
* feat: add new arctic v2.0 models (#1574) * feat: add new arctic v2.0 models * chore: make lint * 1.24.0 Automatically generated by python-semantic-release * fix: Add namaa MrTydi reranking dataset (#1573) * Add dataset class and file requirements * pass tests * make lint changes * adjust meta data and remove load_data --------- Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local> * Update tasks table * 1.24.1 Automatically generated by python-semantic-release * fix: Eval langs not correctly passed to monolingual tasks (#1587) * fix SouthAfricanLangClassification.py * add check for langs * lint * 1.24.2 Automatically generated by python-semantic-release * feat: Add ColBert (#1563) * feat: add max_sim operator for IR tasks to support multi-vector models * docs: add doc for Model2VecWrapper.__init__(...) * feat: add ColBERTWrapper to models & add ColBERTv2 * fix: resolve issues * fix: resolve issues * Update README.md Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/evaluation/evaluators/RetrievalEvaluator.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * README.md: rm subset * doc: update example for Late Interaction * get colbert running without errors * fix: pass is_query to pylate * fix: max_sim add pad_sequence * feat: integrate Jinja templates for ColBERTv2 and add model prompt handling * feat: add revision & prompt_name * doc: pad_sequence * rm TODO jina colbert v2 * doc: warning: higher resource usage for MaxSim --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * 1.25.0 Automatically generated by python-semantic-release * doc: colbert add score_function & doc section (#1592) * doc: colbert add score_function & doc section * doc: Update README.md Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * doc: Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> --------- Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Feat: add support for scoring function (#1594) * add support for scoring function * lint * move similarity to wrapper * remove score function * lint * remove from InstructionRetrievalEvaluator * Update mteb/evaluation/evaluators/RetrievalEvaluator.py Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * remove score function from README.md --------- Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Add new models nvidia, gte, linq (#1436) * Add new models nvidia, gte, linq * add warning for gte-Qwen and nvidia models re: instruction used in docs as well --------- Co-authored-by: isaac-chung <chungisaac1217@gmail.com> * Leaderboard: Refined plots (#1601) * Added embedding size guide to performance-size plot, removed shading on radar chart * Changed plot names to something more descriptive * Made plots failsafe * fix: Leaderboard refinements (#1603) * Added explanation of aggregate measures * Added download button to result tables * Task info gets sorted by task name * Added custom, shareable links for each benchmark * Moved explanation of aggregate metrics to 
the summary tab * 1.25.1 Automatically generated by python-semantic-release * Feat: Use similarity scores if available (#1602) * Use similarity scores if available * lint * Add NanoBEIR Datasets (#1588) * add NanoClimateFeverRetrieval task, still requires some debugging * move task to correct place in init file * add all Nano datasets and results * format code * Update mteb/tasks/Retrieval/eng/tempCodeRunnerFile.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * pin revision to commit and add datasets to benchmark.py * create new benchmark for NanoBEIR * add revision when loading datasets * lint --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: isaac-chung <chungisaac1217@gmail.com> * Update tasks table * Feat: Evaluate missing languages (#1584) * init * fix tests * update mock retrieval * update tests * use subsets instead of langs * Apply suggestions from code review Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * fix tests * add to readme * rename subset in readme --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Add IBM Granite Embedding Models (#1613) * add IBM granite embedding models * lint formatting * add adapted_from and superseded_by to ModelMeta * fix: disable co2_tracker for API models (#1614) * 1.25.2 Automatically generated by python-semantic-release * fix: set `use_instructions` to True in models using prompts (#1616) feat: set `use_instructions` to True in models using prompts * 1.25.3 Automatically generated by python-semantic-release * update RetrievalEvaluator.py * update imports * update imports and metadata * fix tests * fix tests * fix output path for retrieval * fix similarity function --------- Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com> Co-authored-by: github-actions <github-actions@github.com> Co-authored-by: Omar Elshehy <41394057+omarelshehy@users.noreply.github.com> Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com> Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com> Co-authored-by: Márton Kardos <power.up1163@gmail.com> Co-authored-by: KGupta10 <92774828+KGupta10@users.noreply.github.com> Co-authored-by: Aashka Trivedi <aashka.trivedi@gmail.com>
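The ColBERT and scoring-function work folded in above centers on the MaxSim late-interaction score. A sketch in plain PyTorch for illustration, not the evaluator's exact code:

```python
# MaxSim late-interaction score for ColBERT-style multi-vector models.
import torch

def max_sim(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (n_query_tokens, dim); doc_emb: (n_doc_tokens, dim)."""
    sim = query_emb @ doc_emb.T            # token-to-token similarity matrix
    return sim.max(dim=1).values.sum()     # best doc token per query token, summed
```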
# Conflicts:
#   mteb/abstasks/AbsTaskInstructionRetrieval.py
#   mteb/evaluation/MTEB.py
#   mteb/evaluation/evaluators/InstructionRetrievalEvaluator.py
#   mteb/evaluation/evaluators/RerankingEvaluator.py
#   mteb/evaluation/evaluators/RetrievalEvaluator.py
#   mteb/models/arctic_models.py
#   mteb/models/bge_models.py
#   mteb/models/gte_models.py
#   mteb/models/ru_sentence_models.py
#   mteb/models/uae_models.py
#   mteb/tasks/Reranking/__init__.py
#   mteb/tasks/Retrieval/__init__.py
#   tests/test_TaskMetadata.py
#   tests/test_benchmark/mock_tasks.py
…1619) * reupload datasets * fix loader * remove commented code * lint * update pyproject dependencies
This is a work-in-progress branch that will become the MTEB v2.0.0 release!
Features:
@x-tabdeveloping, @orionw, @isaac-chung, @Samoed, @gowitheflow-1998 etc., please make PRs against this branch when relevant (MIEB still goes in its own branch, but we will try to merge it in here).