Add MultiLongDocRetrieval task to MTEB. #224

hanhainebula · 2024-02-06T06:56:34Z

We introduce a new multilingual long-document retrieval task MultiLongDocRetrieval in this paper. We hope to add it to MTEB.

The existing code in AbsTaskRetrieval.py does not support multilingual retrieval, so we added multilingual support by following the approach used in AbsTaskClassification.py.

We have already tested the modified code, and it is working fine.
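For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of what "multilingual support" can look like: the task keeps one corpus/queries/qrels bundle per language and reports one score per language. This is an illustration only, not the code merged in this PR; the class `MultilingualRetrievalSketch`, the `toy_loader` data, and the `top1_accuracy` metric are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Union


class MultilingualRetrievalSketch:
    """Hypothetical stand-in for a retrieval task with per-language subsets."""

    def __init__(self, langs: List[str], is_multilingual: bool = True):
        self.langs = list(langs)
        self.is_multilingual = is_multilingual
        self.data: Dict[str, dict] = {}

    def load_data(self, loader: Callable[[str], dict]) -> None:
        # Load one corpus/queries/qrels bundle per language.
        for lang in self.langs:
            self.data[lang] = loader(lang)

    def evaluate(self, score_fn: Callable[[dict], float]) -> Union[float, Dict[str, float]]:
        # Monolingual tasks return a single score; multilingual tasks
        # return a mapping from language code to score.
        if not self.is_multilingual:
            return score_fn(self.data[self.langs[0]])
        return {lang: score_fn(self.data[lang]) for lang in self.langs}


def toy_loader(lang: str) -> dict:
    # Tiny made-up dataset, only here so the sketch runs end to end.
    samples = {
        "en": {
            "corpus": {"d1": "the cat sat on the mat", "d2": "stock markets fell today"},
            "queries": {"q1": "where did the cat sit"},
            "qrels": {"q1": {"d1"}},
        },
        "de": {
            "corpus": {"d1": "die katze sitzt auf der matte", "d2": "die börse fiel heute"},
            "queries": {"q1": "wo sitzt die katze"},
            "qrels": {"q1": {"d1"}},
        },
    }
    return samples[lang]


def top1_accuracy(split: dict) -> float:
    # Toy retriever: rank documents by word overlap with the query and
    # count how often the top-ranked document is marked relevant.
    hits = 0
    for qid, query in split["queries"].items():
        q_tokens = set(query.split())
        best = max(
            split["corpus"],
            key=lambda doc_id: len(q_tokens & set(split["corpus"][doc_id].split())),
        )
        hits += best in split["qrels"][qid]
    return hits / len(split["queries"])


task = MultilingualRetrievalSketch(["en", "de"])
task.load_data(toy_loader)
scores = task.evaluate(top1_accuracy)
```

In a real benchmark the loader would read each language subset of the dataset and the scoring function would run an embedding model, but the per-language loop would have the same shape.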

Muennighoff

Amazing! Congrats on the cool paper

hanhainebula · 2024-02-06T08:10:42Z

Thanks!

* docs: Added missing points for #214
  Added 6x2 points for guenthermi for datasets and 1 point to Muennighoff for review. I have not accounted for bonus points as I am not sure what was available at the time.
* docs: added point for #197
  Added 2 points for rasdani and 2 bonus points for the first German retrieval (I believe). Added one point for each of the reviewers.
* docs: added points for #116
  This includes 6 points for 3 datasets to slvnwhrl, +2 for the first German clustering task. Also added points for reviews.
* Added points for #134 cmteb
  This includes 29 datasets (38 points) and 6x2 bonus points (12 points) for the 6 taskXlanguage combinations which were not previously included. All the points are attributed to @staoxiao, though we can split them if needed. We also added points for review.
* docs: Added points for #137 polish
  This includes points for 12 datasets (24) across 4 tasks (8). These points are given to rafalposwiata, plus one point for review.
* docs: Added points for #27 (spanish)
  These include 9 datasets (18 points) across 4 new tasks (8) for Spanish. Points are given to violenil as the contributor, and one point for reviewers. Points can be split up if needed.
* docs: Added points for #224
  Added 2 points for the dataset. I could imagine that I might have missed some bonus points as well. Also added one point for review.
* docs: Added points for #210 (korean)
  This includes 3 datasets (6 points) across 1 new task (+2 bonus) for Korean. Also added 1 point for reviewers.
* Add contributor

---------

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

hanhainebula added 6 commits February 5, 2024 14:52

Update AbsTaskRetrieval.py.

8c3ab3e

Add Retrieval Task: MultiLongDocRetrieval

5c8b8f4

Merge branch 'embeddings-benchmark:main' into main

a2348fa

Update AbsTaskRetrieval.py and MLDR task

7271ce4

Merge branch 'main' of github.com:hanhainebula/mteb

ba0299c

Update reference of MLDR

4e4c534

Muennighoff approved these changes Feb 6, 2024

Muennighoff merged commit 2f65179 into embeddings-benchmark:main Feb 6, 2024
3 checks passed

Muennighoff mentioned this pull request Apr 4, 2024

Adding French team contribution points #302

Merged

KennethEnevoldsen added a commit that referenced this pull request Apr 11, 2024

docs: Added points for #224

b48fd79

Added 2 points for the dataset. I could imagine that I might have missed some bonus points as well. Also added one point for review.
