[V2] Update v2 #1618

Samoed · 2024-12-21T10:16:55Z

Checklist

* Run tests locally to make sure nothing is broken using `make test`.
* Run the formatter to format the code using `make lint`.
* Updated tasks and their metadata
* Updated the score function in `AbsTaskRetrieval`

* feat: add new arctic v2.0 models
* chore: make lint

Automatically generated by python-semantic-release

* Add dataset class and file requirements
* pass tests
* make lint changes
* adjust meta data and remove load_data

---------

Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local>

Automatically generated by python-semantic-release

* fix SouthAfricanLangClassification.py
* add check for langs
* lint

Automatically generated by python-semantic-release

* feat: add max_sim operator for IR tasks to support multi-vector models
* docs: add doc for Model2VecWrapper.__init__(...)
* feat: add ColBERTWrapper to models & add ColBERTv2
* fix: resolve issues
* fix: resolve issues
* Update README.md
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* Update README.md
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update README.md
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update mteb/evaluation/evaluators/RetrievalEvaluator.py
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Update README.md
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* README.md: rm subset
* doc: update example for Late Interaction
* get colbert running without errors
* fix: pass is_query to pylate
* fix: max_sim add pad_sequence
* feat: integrate Jinja templates for ColBERTv2 and add model prompt handling
* feat: add revision & prompt_name
* doc: pad_sequence
* rm TODO jina colbert v2
* doc: warning: higher resource usage for MaxSim

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
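For readers unfamiliar with the `max_sim` operator named in the commits above, here is a minimal sketch of MaxSim scoring as used by ColBERT-style late-interaction models. It is not the code from this PR: `dot`, `max_sim`, and `score_matrix` are hypothetical helper names, and plain Python lists stand in for tensors. Because it scores one query/document pair at a time, it needs no padding, whereas a batched tensor version has to pad variable-length token sequences (presumably why the commits mention `pad_sequence`).

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(u, v))


def max_sim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """MaxSim: for each query token, take its best-matching document token
    (by dot product), then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def score_matrix(
    queries: Sequence[Sequence[Vector]], docs: Sequence[Sequence[Vector]]
) -> List[List[float]]:
    """Score every query against every document (hypothetical helper)."""
    return [[max_sim(q, d) for d in docs] for q in queries]


# Toy multi-vector inputs: one query with two token embeddings,
# two documents with different numbers of token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0]]
scores = score_matrix([query], [doc_a, doc_b])
```

Each query token is compared against every document token, so the cost grows with the product of the two token counts. That is consistent with the higher resource usage the last commit warns about.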

Automatically generated by python-semantic-release

* doc: colbert add score_function & doc section
* doc: Update README.md
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* doc: Update README.md
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* add support for scoring function
* lint
* move similarity to wrapper
* remove score function
* lint
* remove from InstructionRetrievalEvaluator
* Update mteb/evaluation/evaluators/RetrievalEvaluator.py
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* remove score function from README.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Add new models nvidia, gte, linq
* add warning for gte-Qwen and nvidia models re: instruction used in docs as well

---------

Co-authored-by: isaac-chung <chungisaac1217@gmail.com>

* Added embedding size guide to performance-size plot, removed shading on radar chart
* Changed plot names to something more descriptive
* Made plots failsafe

* Added explanation of aggregate measures
* Added download button to result tables
* Task info gets sorted by task name
* Added custom, shareable links for each benchmark
* Moved explanation of aggregate metrics to the summary tab

Automatically generated by python-semantic-release

* Use similarity scores if available
* lint

* add NanoClimateFeverRetrieval task, still requires some debugging
* move task to correct place in init file
* add all Nano datasets and results
* format code
* Update mteb/tasks/Retrieval/eng/tempCodeRunnerFile.py
  Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* pin revision to commit and add datasets to benchmark.py
* create new benchmark for NanoBEIR
* add revision when loading datasets
* lint

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: isaac-chung <chungisaac1217@gmail.com>

* init
* fix tests
* update mock retrieval
* update tests
* use subsets instead of langs
* Apply suggestions from code review
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* fix tests
* add to readme
* rename subset in readme

---------

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* add IBM granite embedding models
* lint formatting
* add adapted_from and superseded_by to ModelMeta

Automatically generated by python-semantic-release

feat: set `use_instructions` to True in models using prompts

Automatically generated by python-semantic-release

# Conflicts:
#   docs/tasks.md
#   mteb/abstasks/AbsTaskInstructionRetrieval.py
#   mteb/evaluation/MTEB.py
#   mteb/evaluation/evaluators/InstructionRetrievalEvaluator.py
#   mteb/evaluation/evaluators/RerankingEvaluator.py
#   mteb/evaluation/evaluators/RetrievalEvaluator.py
#   mteb/model_meta.py
#   mteb/models/arctic_models.py
#   mteb/models/bge_models.py
#   mteb/models/ru_sentence_models.py
#   mteb/models/uae_models.py
#   mteb/tasks/Reranking/__init__.py
#   mteb/tasks/Retrieval/__init__.py
#   tests/test_TaskMetadata.py

orionw · 2024-12-21T18:44:23Z

Thanks @Samoed! Score function looks good. Is the ColBERT stuff new here? I am a bit confused by the commit threads.

Samoed · 2024-12-21T18:55:05Z

Yes, this is a new model that was integrated into the main branch, but GitHub is displaying it a bit oddly. I've added you to review the `score_function` changes in `RetrievalEvaluator`. If the `RetrievalEvaluator` changes look good, I’ll proceed with merging into v2.

orionw · 2024-12-21T19:11:05Z

Yes, the refactor looks good! Only one small nit: `model.score` in the model classes as a replacement is a bit ambiguous. I would prefer something like `score_fn` or `distance_fn`, or something more verbose.

Otherwise LGTM, feel free to merge.

orionw · 2024-12-21T19:14:47Z

mteb/evaluation/evaluators/model_classes.py

@@ -338,17 +320,20 @@ def _full_corpus_search(
            logging.info("Computing Similarities...")
            query_embeddings = torch.as_tensor(query_embeddings).to(device)
            sub_corpus_embeddings = torch.as_tensor(sub_corpus_embeddings).to(device)
+
+            score_function = (
+                self.model.score if hasattr(self.model, "score") else cos_sim


This was the line that confused me, but I am only on mobile today, so perhaps I’m misreading. Feel free to ignore, since I see `score_fn` later in the code.

Samoed · Dec 21, 2024

You are right. This should be a `similarity` function, as in sentence_transformers.
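The fallback discussed in this thread (use the model's own scoring method when it has one, otherwise cosine similarity) can be sketched roughly as follows. This is an illustration only, not the code that was merged: `resolve_similarity_fn`, `DotProductModel`, and `PlainModel` are hypothetical names, the attribute is assumed to be called `similarity` per the reply above, and plain Python lists stand in for the torch tensors used in the actual diff.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Vector]


def cos_sim(a: Matrix, b: Matrix) -> List[List[float]]:
    """Pairwise cosine similarity between the rows of a and the rows of b."""

    def norm(v: Vector) -> float:
        # Fall back to 1.0 for zero vectors to avoid division by zero.
        return math.sqrt(sum(x * x for x in v)) or 1.0

    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
        for u in a
    ]


def resolve_similarity_fn(model: object) -> Callable[[Matrix, Matrix], List[List[float]]]:
    """Prefer the model's own `similarity` method; otherwise use cosine similarity."""
    fn = getattr(model, "similarity", None)
    return fn if callable(fn) else cos_sim


class DotProductModel:
    """Hypothetical model that defines its own similarity (plain dot product)."""

    def similarity(self, a: Matrix, b: Matrix) -> List[List[float]]:
        return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


class PlainModel:
    """Hypothetical model without a `similarity` method."""


custom_fn = resolve_similarity_fn(DotProductModel())
default_fn = resolve_similarity_fn(PlainModel())
```

Compared with a `hasattr(self.model, "score")` check, a `similarity` name states what the callable computes, which addresses the ambiguity raised in the review.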

dbuades and others added 28 commits December 10, 2024 11:32

feat: add new arctic v2.0 models (#1574)

53756ad

1.24.0

27f7d8c

Automatically generated by python-semantic-release

fix: Add namaa MrTydi reranking dataset (#1573)

7b9b3c9

Update tasks table

1101db7

1.24.1

9c0b208

Automatically generated by python-semantic-release

fix: Eval langs not correctly passed to monolingual tasks (#1587)

373db74

1.24.2

eecc9f1

Automatically generated by python-semantic-release

1.25.0

b466051

Automatically generated by python-semantic-release

Add new models nvidia, gte, linq (#1436)

95d5ae5

Leaderboard: Refined plots (#1601)

0c9e046

fix: Leaderboard refinements (#1603)

6ecc86f

1.25.1

5e9c468

Automatically generated by python-semantic-release

Feat: Use similarity scores if available (#1602)

b81b584

Update tasks table

9de7f20

Add IBM Granite Embedding Models (#1613)

ad05983

fix: disable co2_tracker for API models (#1614)

7c8e094

1.25.2

d8c015f

Automatically generated by python-semantic-release

fix: set use_instructions to True in models using prompts (#1616)

0c44482

1.25.3

2024338

Automatically generated by python-semantic-release

update RetrievalEvaluator.py

eb29eb3

update imports

107dd4a

update imports and metadata

92dba39

Samoed requested a review from orionw December 21, 2024 10:16

fix tests

7b4ae88

Samoed added 2 commits December 21, 2024 14:55

fix tests

788f54e

fix output path for retrieval

06017ef

Samoed changed the title ~~Update v2~~ [V2] Update v2 Dec 21, 2024

orionw reviewed Dec 21, 2024

fix similarity function

7144fca

Samoed merged commit c9b00ac into v2.0.0 Dec 22, 2024
10 checks passed

Samoed deleted the update_v2 branch December 22, 2024 11:29
