Add MultiLongDocRetrieval task to MTEB. #224
Merged
Muennighoff
approved these changes
Feb 6, 2024
Amazing! Congrats on the cool paper
Thanks!
KennethEnevoldsen
added a commit
that referenced
this pull request
Apr 11, 2024
Added 2 points for the dataset. I could imagine that I might have missed some bonus points as well. Also added one point for review.
KennethEnevoldsen
added a commit
that referenced
this pull request
Apr 11, 2024
* docs: Added missing points for #214
  Added 6x2 points for guenthermi for datasets and 1 point to Muennighoff for review. I have not accounted for bonus points as I am not sure what was available at the time.
* docs: added point for #197
  Added 2 points for rasdani and 2 bonus points for the first German retrieval (I believe). Added one point for each of the reviewers.
* docs: added points for #116
  This includes 6 points for 3 datasets to slvnwhrl, +2 for the first German clustering task. Also added points for reviews.
* Added points for #134 cmteb
  This includes 29 datasets (38 points) and 6x2 bonus points (12 points) for the 6 taskXlanguage combinations which were not previously included. All the points are attributed to @staoxiao, though we can split them if needed. We also added points for review.
* docs: Added points for #137 polish
  This includes points for 12 datasets (24) across 4 tasks (8). These points are given to rafalposwiata, plus one point for review.
* docs: Added points for #27 (spanish)
  These include 9 datasets (18 points) across 4 new tasks (8) for Spanish. Points are given to violenil as the contributor, and one point for reviewers. Points can be split up if needed.
* docs: Added points for #224
  Added 2 points for the dataset. I could imagine that I might have missed some bonus points as well. Also added one point for review.
* docs: Added points for #210 (korean)
  This includes 3 datasets (6 points) across 1 new task (+2 bonus) for Korean. Also added 1 point for reviewers.
* Add contributor

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
We introduce a new multilingual long-document retrieval task, MultiLongDocRetrieval, in this paper, and we hope to add it to MTEB.
The previous code in AbsTaskRetrieval.py does not support multilingual retrieval, so we implemented multilingual support by following the approach in AbsTaskClassification.py. We have tested the modified code, and it works as expected.
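For readers unfamiliar with the pattern, per-language evaluation of a retrieval task might look roughly like the sketch below. This is not the code merged in this PR and not MTEB's API: it assumes the corpus, queries, and relevance judgments are stored in dicts keyed by language code, and every name in it (`evaluate_multilingual`, `toy_recall_at_1`, the type aliases) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases for illustration; these are not MTEB's types.
Corpus = Dict[str, str]              # doc_id -> document text
Queries = Dict[str, str]             # query_id -> query text
Qrels = Dict[str, Dict[str, int]]    # query_id -> {doc_id: relevance}
Scorer = Callable[[Corpus, Queries, Qrels], Dict[str, float]]


def evaluate_multilingual(
    corpus: Dict[str, Corpus],
    queries: Dict[str, Queries],
    relevant_docs: Dict[str, Qrels],
    langs: List[str],
    score_one_language: Scorer,
) -> Dict[str, Dict[str, float]]:
    """Run a monolingual scorer once per language and key the results by language."""
    scores: Dict[str, Dict[str, float]] = {}
    for lang in langs:
        scores[lang] = score_one_language(
            corpus[lang], queries[lang], relevant_docs[lang]
        )
    return scores


def toy_recall_at_1(corpus: Corpus, queries: Queries, qrels: Qrels) -> Dict[str, float]:
    """Toy stand-in for a real retriever: pick the document sharing the most
    words with the query, then report the fraction of queries whose top
    document is marked relevant."""
    hits = 0
    for qid, qtext in queries.items():
        q_words = set(qtext.lower().split())
        best = max(
            corpus,
            key=lambda doc_id: len(q_words & set(corpus[doc_id].lower().split())),
        )
        hits += int(qrels.get(qid, {}).get(best, 0) > 0)
    return {"recall_at_1": hits / len(queries)}


if __name__ == "__main__":
    corpus = {
        "en": {"d1": "cats purr", "d2": "dogs bark"},
        "de": {"d1": "katzen schnurren", "d2": "hunde bellen"},
    }
    queries = {
        "en": {"q1": "why do dogs bark"},
        "de": {"q1": "warum schnurren katzen"},
    }
    qrels = {"en": {"q1": {"d2": 1}}, "de": {"q1": {"d1": 1}}}
    print(evaluate_multilingual(corpus, queries, qrels, ["en", "de"], toy_recall_at_1))
```

Keeping the monolingual scorer as a separate callable means the existing single-language path can stay unchanged while the multilingual wrapper only adds the loop over languages.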