Add BEIR-PL datasets to MTEB #121

kwojtasi · 2023-07-18T10:10:02Z

Add tasks and update the README for the datasets in BEIR-PL (the BEIR benchmark in Polish).

Muennighoff

Amazing work!

I would standardize the names to all be e.g. DBPediaPL.
Currently, it seems there's HotpotQAPL, but also DBPedia-pl & ArguAna-PL.

Do you want to merge this first and then later on add the missing datasets (CQA, Touche, etc.) in a separate PR?

kwojtasi · 2023-07-26T19:58:42Z

I have updated the names to follow the ArguAna-PL pattern, which seems to be the most readable option. If you prefer a different naming convention, I can change it.

We can merge it, and I will open another PR for the CQADupstack and Touche datasets.

Muennighoff

Looks good to me!
Can you run all tasks for at least one model and provide the result files here? I will then update the leaderboard to add a tab for BEIR-PL.

kwojtasi · 2023-08-02T13:11:53Z

Hi, I have evaluated "distiluse-base-multilingual-cased-v2" from SentenceTransformers. Attaching results.
results.zip

I also made some minor changes that were required to run the evaluation.
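
For readers who want to reproduce a run like the one described above, here is a minimal sketch. It assumes the `MTEB(tasks=[...])` and `evaluation.run(model, output_folder=...)` usage shown in the MTEB README around the time of this PR; newer releases may expose a different entry point. The task names are assumptions based on the "-PL" naming discussed in this thread, and the `results/<model>` folder layout and the helper names `output_folder_for` and `run_beir_pl` are illustrative only.

```python
# Task names mentioned in this thread, assuming the "-PL" suffix convention.
BEIR_PL_TASKS = ["ArguAna-PL", "DBPedia-PL", "HotpotQA-PL"]

# Model evaluated in the comment above.
MODEL_NAME = "distiluse-base-multilingual-cased-v2"


def output_folder_for(model_name: str) -> str:
    """Build a results directory name (hypothetical layout, not mandated by MTEB)."""
    return f"results/{model_name.split('/')[-1]}"


def run_beir_pl(model_name: str = MODEL_NAME, tasks=BEIR_PL_TASKS) -> None:
    """Evaluate one SentenceTransformers model on the listed tasks."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    evaluation = MTEB(tasks=list(tasks))
    # Result files are written under the output folder.
    evaluation.run(model, output_folder=output_folder_for(model_name))
```

Calling `run_beir_pl()` downloads the model and the datasets, so it needs network access and can take a while on the larger retrieval corpora.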

Muennighoff · 2023-08-02T14:03:57Z

mteb/abstasks/__init__.py

 from .BeIRTask import *
 from .CrosslingualTask import *
 from .MultilingualTask import *
+from .BeIRPLTask import *


Suggested change

-from .BeIRTask import *
-from .CrosslingualTask import *
-from .MultilingualTask import *
-from .BeIRPLTask import *
+from .BeIRPLTask import *
+from .BeIRTask import *
+from .CrosslingualTask import *
+from .MultilingualTask import *
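
The reviewer does not state a reason, but the suggested order appears to be alphabetical by module name. A small sketch, assuming that reading, shows that a plain case-sensitive sort of the four import lines yields the suggested order; `sort_imports` is a hypothetical helper, not part of MTEB.

```python
# Import lines as they stand in the diff (new import appended last).
current = [
    "from .BeIRTask import *",
    "from .CrosslingualTask import *",
    "from .MultilingualTask import *",
    "from .BeIRPLTask import *",
]


def sort_imports(lines):
    """Return import lines sorted by module name (second token), case-sensitively."""
    return sorted(lines, key=lambda line: line.split()[1])


suggested = sort_imports(current)
for line in suggested:
    print(line)
```

`.BeIRPLTask` sorts before `.BeIRTask` because `P` precedes `T` at the first differing character.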

Muennighoff

Looks great, amazing work!
I've added the results in a new leaderboard tab for Polish under Retrieval: https://huggingface.co/spaces/mteb/leaderboard

Feel free to add more models, either by sending the result files or by adding the results to the models' model cards.

Will merge this if fine with you!

Konrad Wojtasik added 2 commits July 18, 2023 11:29

Add BIER-PL benchmark

c4f6654

Update README with BEIR-PL datasets

2c39a13

Muennighoff reviewed Jul 18, 2023

Muennighoff mentioned this pull request Jul 21, 2023

Adding a new retrieval task #122

Closed

Update names

50a9f67

Muennighoff approved these changes Jul 27, 2023

Add tasks to init to be visible during evaluation

8cba37a

Muennighoff reviewed Aug 4, 2023

Muennighoff approved these changes Aug 4, 2023

Merge branch 'main' into main

2d5fb86

Muennighoff mentioned this pull request Aug 18, 2023

Add Polish tasks (PL-MTEB) #137

Merged

Muennighoff merged commit 5972c02 into embeddings-benchmark:main Aug 26, 2023
