Add Polish tasks (PL-MTEB) #137

rafalposwiata · 2023-08-18T13:30:27Z

Add Polish tasks (PL-MTEB).
The file structure was inspired by the Chinese C-MTEB.
Evaluation results for the three models have already been added to https://huggingface.co/datasets/mteb/results

Muennighoff · 2023-08-18T18:23:10Z

This looks amazing, great job! 🚀

Do you know of any Polish Reranking tasks? I think we should be able to find something for MMarcoReranking at least (which is also there for C-MTEB), as it's available in multiple languages I think?
Would it be okay for you if I add a new Polish tab in the Overall tab of the leaderboard that aggregates across these tasks added by you as well as BEIR-PL for Retrieval similar to the Chinese tab?

rafalposwiata · 2023-08-18T22:28:15Z

> Do you know of any Polish Reranking tasks? I think we should be able to find something for MMarcoReranking at least (which is also there for C-MTEB), as it's available in multiple languages I think?

I searched but couldn't find any Polish Reranking tasks. Unfortunately, mMarco doesn't include Polish.

> Would it be okay for you if I add a new Polish tab in the Overall tab of the leaderboard that aggregates across these tasks added by you as well as BEIR-PL (Add BEIR-PL datasets to MTEB #121) for Retrieval similar to the Chinese tab?

Of course, I will be grateful :)

Muennighoff

Looks good to me! I've added the tab to the leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Let me know if you'd change something!

Three more things before merging:

* Can you add each dataset to the README.md table? You can consult this script if you want to: https://github.com/embeddings-benchmark/mteb/blob/main/scripts/data/create_task_table.py
* Can you maybe run a few more models? I think running multilingual-e5-large would be great.
* Can you run BEIR-PL Retrieval for the models too, so we have the average score? You will have to check out this branch.

Once done, will merge this & BEIR-PL. Really amazing work!!!
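For readers who want to reproduce the request above, here is a minimal sketch of running an extra model on the Polish tasks. It assumes the `MTEB(task_langs=...)` / `evaluation.run(...)` interface as it existed around the time of this PR (newer mteb releases may select tasks differently), that `task_langs=["pl"]` picks up the PL-MTEB tasks, and that the model is published as `intfloat/multilingual-e5-large`. The helper names `output_folder_for` and `run_polish_tasks` are hypothetical.

```python
from pathlib import Path

# Assumed Hugging Face model id for "multilingual-e5-large".
MODEL_NAME = "intfloat/multilingual-e5-large"


def output_folder_for(model_name: str, root: str = "results") -> str:
    """Return a per-model results folder, e.g. results/multilingual-e5-large."""
    return (Path(root) / model_name.split("/")[-1]).as_posix()


def run_polish_tasks(model_name: str = MODEL_NAME) -> None:
    """Evaluate one model on all tasks tagged as Polish (hypothetical helper)."""
    # Imported lazily so the path helper works without these packages installed.
    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Assumption: task_langs=["pl"] selects the Polish tasks added in this PR.
    evaluation = MTEB(task_langs=["pl"])
    evaluation.run(model, output_folder=output_folder_for(model_name))
```

Calling `run_polish_tasks()` downloads the model and datasets, so it needs network access and a fair amount of compute. The BEIR-PL Retrieval tasks would only be included when running from the branch that adds them.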

Muennighoff · 2023-08-26T18:13:06Z

Nice, the code is good now. Merging this! Let me know if you manage to run the additional models/evals!


* docs: Added missing points for #214

  Added 6x2 points for guenthermi for datasets and 1 point to Muennighoff for review
  I have not accounted for bonus points as I am not sure was what available at the time.

* docs: added point for #197

  Added 2 points for rasdani and 2 bonus points for the first german retrieval (I believe).
  Added one point for each of the reviewers

* docs: added points for #116

  This includes 6 points for 3 datasets to slvnwhrl +2 for first german clustering task
  also added points for reviews

* Added points for #134 cmteb

  This includes 29 datasets (38 points) and 6x2 bonus points (12 points) for the 6 taskXlanguage which was not previously included.
  All the points are attributed to @staoxiao, though we can split them if needed.
  We also added points for review.

* docs: Added points for #137 polish

  This includes points for 12 datasets (24) across 4 tasks (8). These points are given to rafalposwiata and then one point for review

* docs: Added points for #27 (spanish)

  These include 9 datasets (18 points) across 4 news tasks (8) for spanish. Points are given to violenil as the contributor, and one points for reviewers. Points can be split up if needed.

* docs: Added points for #224

  Added points 2 points for the dataset. I could imagine that I might have missed some bonus points as well. Also added one point for review.

* docs: Added points for #210 (korean)

  This include 3 datasets (6 points) across 1 new task (+2 bonus) for korean. Also added 1 points for reviewers.

* Add contributor

---------

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Add Polish tasks (PL-MTEB)

24cd226

Muennighoff reviewed Aug 19, 2023

Add Polish datasets to README

e2e7d83

Muennighoff approved these changes Aug 26, 2023

Muennighoff reviewed Aug 26, 2023

mteb/tasks/STS/PolishSTS.py (outdated, resolved)

Add newline

a1e62d1

Muennighoff merged commit 2779344 into embeddings-benchmark:main Aug 26, 2023

Muennighoff mentioned this pull request Apr 4, 2024

Adding French team contribution points #302

Merged

KennethEnevoldsen added a commit that referenced this pull request Apr 11, 2024

docs: Added points for #137 polish

256e38e

This includes points for 12 datasets (24) across 4 tasks (8). These points are given to rafalposwiata and then one point for review
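The arithmetic in this commit message can be checked with a short sketch. The rates of 2 points per dataset and 2 bonus points per new task are inferred from the figures quoted above (12 datasets giving 24, 4 tasks giving 8), not taken from an official scoring table, and `contribution_points` is a hypothetical helper.

```python
# Rates inferred from the commit message, not from an official scoring table.
POINTS_PER_DATASET = 2
BONUS_PER_NEW_TASK = 2


def contribution_points(n_datasets: int, n_new_tasks: int) -> dict:
    """Split a contributor's points into dataset points and new-task bonus points."""
    dataset_points = n_datasets * POINTS_PER_DATASET
    bonus_points = n_new_tasks * BONUS_PER_NEW_TASK
    return {
        "dataset": dataset_points,
        "bonus": bonus_points,
        "total": dataset_points + bonus_points,
    }


# Figures for this PR: 12 datasets across 4 tasks.
polish = contribution_points(n_datasets=12, n_new_tasks=4)
print(polish)
```

Under these assumed rates, the contributor's share for this PR comes to 24 dataset points plus 8 bonus points. The review point is credited separately to the reviewer.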
