Added support for Scandinavian Languages #124
Amazing work!
- I like the `dataset_transform`. I think we can leave it as you did for now, and maybe we'll make it a generic function later.
- We still need to import all tasks in the `__init__.py` files for each task directory, I think.
- Why did you choose 16 `samples_per_label` for all Classification tasks? @NouamaneTazi @loicmagne, do you remember how we selected the different `samples_per_label` values for CLF tasks? Was it based on the size of the dataset or the number of labels?
- Can you run a few models on the new tasks and share the result files here? I can add a new leaderboard tab for some of these languages where we have a few datasets. Maybe three new CLF tabs for Danish, Norwegian, and Swedish? Or is it more useful to have just one new CLF tab for "Scandinavia"?
- Do we have results on SweFAQRetrieval from prior work, and are we able to reproduce them?
Perfect. I have sent the results by mail. I have also removed the task and fixed #126 (which was a good thing, as it revealed a few errors). If you agree, I think this is all good to merge.
LGTM; I have added all tasks to the leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Let me know if the leaderboard looks okay to you, and then I will merge 👍
The only change I would make is to rename Bitext from "other" to "Danish" (it is Danish plus a Danish dialect). Otherwise, I think it looks good!
Edit: Actually, if you wish, I am creating an aggregated site for the Scandinavian subsection here (still working on it). Feel free to link to it. I also plan to add Finnish, Icelandic, and Faroese in the future (and to add them to MTEB as well).
Edit: Oh, it seems like ScalaNbClassification is in Swedish instead of Norwegian.
Fixed & added the link! If you want to link it in a different way, let me know. You can also edit the
Also, FYI: all your scores are in this repository: https://huggingface.co/datasets/mteb/results
Merging now 🚀
Addition
Task selection:
Potential problems
Other
Fixes #126