Add Chinese tasks (C-MTEB) #134

Merged 8 commits on Aug 26, 2023

staoxiao (Contributor) commented Aug 8, 2023

No description provided.

Muennighoff (Contributor) left a comment

Looks great, amazing job!

Sent you a few more comments via mail~

scripts/run_mteb_chinese.py (resolved)
README.md (outdated, resolved)
NouamaneTazi (Member) left a comment

Very clean PR, thanks a lot! 🙌
Left some small comments before merging this!

mteb/tasks/Retrieval/CMTEBRetrieval.py (outdated, resolved)
README.md (outdated, resolved)
Muennighoff (Contributor) commented

I think everything is resolved - merging this! Feel free to still make changes later 😇

NouamaneTazi (Member) left a comment

LGTM! Thanks 🚀

NouamaneTazi merged commit 071974a into embeddings-benchmark:main on Aug 26, 2023
KennethEnevoldsen added a commit that referenced this pull request Apr 11, 2024
This includes 29 datasets (38 points) and 6x2 bonus points (12 points) for the 6 task × language combinations that were not previously included.

All the points are attributed to @staoxiao, though we can split them if needed.

We also added points for review.
KennethEnevoldsen added a commit that referenced this pull request Apr 11, 2024
* docs: Added missing points for #214

Added 6x2 points for guenthermi for datasets and 1 point to Muennighoff for review.

I have not accounted for bonus points as I am not sure what was available at the time.

* docs: added point for #197

Added 2 points for rasdani and 2 bonus points for the first German retrieval task (I believe). Added one point for each of the reviewers.

* docs: added points for #116

This includes 6 points for 3 datasets to slvnwhrl, plus 2 for the first German clustering task. Also added points for reviews.

* Added points for #134 cmteb

This includes 29 datasets (38 points) and 6x2 bonus points (12 points) for the 6 task × language combinations that were not previously included.

All the points are attributed to @staoxiao, though we can split them if needed.

We also added points for review.

* docs: Added points for #137 polish

This includes points for 12 datasets (24) across 4 tasks (8). These points are given to rafalposwiata, plus one point for review.

* docs: Added points for #27 (spanish)

These include 9 datasets (18 points) across 4 new tasks (8) for Spanish.

Points are given to violenil as the contributor, and one point for reviewers. Points can be split up if needed.

* docs: Added points for #224

Added 2 points for the dataset. I could imagine that I might have missed some bonus points as well. Also added one point for review.

* docs: Added points for #210 (korean)

This includes 3 datasets (6 points) across 1 new task (+2 bonus) for Korean. Also added 1 point for reviewers.

* Add contributor

---------

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
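
For anyone checking the tallies in the commit message above, here is a minimal sketch of the arithmetic. It assumes a rule inferred only from the Polish, Spanish and Korean entries (2 points per dataset, 2 bonus points per new task for a language); the function and constant names are hypothetical and not taken from the repository. The #134 entry (29 datasets, 38 points) does not follow this per-dataset rule, so it is left out.

```python
# Assumed scoring rule, inferred from the Polish, Spanish and Korean
# entries in the commit message; not taken from the repository itself.
POINTS_PER_DATASET = 2
BONUS_PER_NEW_TASK = 2


def contribution_points(datasets: int, new_tasks: int) -> int:
    """Return dataset points plus bonus points for new task/language pairs."""
    return datasets * POINTS_PER_DATASET + new_tasks * BONUS_PER_NEW_TASK


# (datasets, new tasks) as stated in the commit message.
entries = {
    "#137 (Polish)": (12, 4),
    "#27 (Spanish)": (9, 4),
    "#210 (Korean)": (3, 1),
}

for pr, (datasets, new_tasks) in entries.items():
    print(pr, contribution_points(datasets, new_tasks))
```

Review points (1 per reviewer in these entries) are tracked separately and are not included in the totals above.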