docs: first stab at point attribution #280

Closed
wants to merge 50 commits into from


MartinBernstorff
Contributor

@MartinBernstorff MartinBernstorff commented Mar 25, 2024

Kenneth and I took a first stab at attributing points for the contributions so far.

These are meant as an opening for a discussion, not at all a final list, so definitely feel free to suggest changes!

@KennethEnevoldsen
4 added dataset annotations for size
1 added dataset annotations for one dataset
2 added CI
3 updated README x 3 (installation instructions, adding a dataset, MMTEB etc.)
1 folder structure
7 reviewed PRs x 7
= 18
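As a quick sanity check of the arithmetic, the proposed breakdown above can be summed in a few lines of Python. This is a minimal sketch for illustration only: the category labels are paraphrased from the list, and it is not how the points in points.md are actually tracked.

```python
# Hypothetical tally of the proposed points for @KennethEnevoldsen.
# Each entry is (points, paraphrased contribution label) taken from the list above.
contributions = [
    (4, "dataset annotations for size"),
    (1, "dataset annotations for one dataset"),
    (2, "CI"),
    (3, "README updates (x3)"),
    (1, "folder structure"),
    (7, "PR reviews (x7)"),
]

# Sum only the point values; the labels are just for readability.
total = sum(points for points, _ in contributions)
print(f"Total: {total}")  # Total: 18
```

The "x 3" and "x 7" entries appear to correspond to one point per README update and one point per review, which is why they contribute 3 and 7 respectively.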

@MartinBernstorff
Merged PRs:

Reviews:

@MartinBernstorff MartinBernstorff changed the title docs: expand upon points for PR reviews docs: first stab at point attribution Mar 25, 2024
@KennethEnevoldsen KennethEnevoldsen requested review from imenelydiaker and Muennighoff and removed request for KennethEnevoldsen March 25, 2024 08:16
@Muennighoff
Contributor

Shouldn't this PR also update the actual scores in points.md?

@KennethEnevoldsen
Contributor

@Muennighoff we wanted to simply discuss it before we added it in

@Myahr208

Myahr208 commented Mar 26, 2024 via email

@KennethEnevoldsen
Contributor

@Myahr208 it seems like there is something wrong with your formatting

@Muennighoff
Contributor

> @Muennighoff we wanted to simply discuss it before we added it in

Sure, it looks good to me!

@KennethEnevoldsen
Contributor

@imenelydiaker I would love your thoughts on this PR?

@imenelydiaker
Contributor

imenelydiaker commented Apr 3, 2024

> @imenelydiaker I would love your thoughts on this PR?

@KennethEnevoldsen I approved the PR, everything looks good to me! 🚀 You guys did a great job!

@imenelydiaker
Contributor

@KennethEnevoldsen and @MartinBernstorff PR #302 was merged, can you please add your affiliations and merge this PR?

KennethEnevoldsen and others added 27 commits April 10, 2024 10:18
* docs: added information related to the automatic release

* docs: removed test-parallel from docs

* docs: minor additions to contributing guidelines

* ci: removed changelog

As it is already present in the git releases

* Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

---------

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Automatically generated by python-semantic-release
Automatically generated by python-semantic-release
…-terrier (#292)

* ci: Added windows to test suite

* feat: Changed to pytrec-eval-terrier to add support for windows installs
Automatically generated by python-semantic-release
* fix: Fixed hf_hub_name for WikiCitiesClustering

* Added points for this PR and 3 other minor dataset fixes
Automatically generated by python-semantic-release
…eriting AbsTask (#299)

* Allow extending the load_dataset parameters

* format

* Fix test

* remove duplicated logic from AbsTask, now handled in the metadata

* add tests

* remove comments, moved to PR

* format

* extend metadata dict from super class

* Remove additional load_data

* test: adding very high level test

* Remove hf_hub_name and add test

* Fix revision in output file

---------

Co-authored-by: gbmarc1 <marcantoine.belanger@shopify.com>
Automatically generated by python-semantic-release
* fix: Fixed hf_hub_name for WikiCitiesClustering

* Added points for this PR and 3 other minor dataset fixes

* feat: Added tests which validated that datasets are available

* fix: Updated hf references and revisions to multiple datasets

* Added points for submission

* fix: Added suggestions from the review

* Apply suggestions from code review

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

* fix: sped up async test for whether datasets exist

* fix: Updated revisions

* fix: reuploaded scandeval datasets

* fix: Applied formatter

---------

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Automatically generated by python-semantic-release
* Update points.md

* Update docs/mmteb/points.md

* Update points.md

* Update points.md
* Update MindSmallReranking.py

* fix: Updated wrong metadata
Automatically generated by python-semantic-release
* docs: added points for seb

* docs: added points for seb
* add command

* add datasets

* reformat dataset

* Rephrase description

* Update mteb/tasks/Retrieval/law/GerDaLIRRetrieval.py

* Update mteb/tasks/Retrieval/law/GerDaLIRRetrieval.py

* Update mteb/__init__.py

* Update scripts/run_mteb_law.py

* Update scripts/run_mteb_law.py

* Update mteb/__init__.py

* Update mteb/tasks/Retrieval/__init__.py

* Update mteb/tasks/Retrieval/law/GerDaLIRRetrieval.py

* Update mteb/tasks/Retrieval/law/GerDaLIRRetrieval.py

* Update mteb/tasks/Retrieval/law/LegalQuADRetrieval.py

* Update mteb/tasks/Retrieval/law/LegalQuADRetrieval.py

* Update scripts/run_mteb_law.py

* Update mteb/tasks/Retrieval/law/LegalSummarizationRetrieval.py

* Update mteb/tasks/Retrieval/law/LegalSummarizationRetrieval.py

* Update mteb/tasks/Retrieval/law/LeCaRDv2Retrieval.py

* Update mteb/tasks/Retrieval/law/LeCaRDv2Retrieval.py

* Rename GerDaLIRRetrieval.py to GerDaLIRSmallRetrieval.py

* Update mteb/tasks/Retrieval/__init__.py

* Update GerDaLIRSmallRetrieval.py

Add metadata

* Update GerDaLIRSmallRetrieval.py

Update metadata

* Update AILACasedocsRetrieval.py

Update AILACasedocsRetrieval metadata

* Update AILAStatutesRetrieval.py

Update AILAStatutesRetrieval metadata

* Update LeCaRDv2Retrieval.py

Update LeCaRDv2Retrieval metadata

* Update LegalBenchConsumerContractsQARetrieval.py

Update LegalBenchConsumerContractsQARetrieval metadata

* Update LegalBenchCorporateLobbyingRetrieval.py

Update LegalBenchCorporateLobbyingRetrieval metadata

* Update LegalQuADRetrieval.py

Update LegalQuADRetrieval metadata

* Update LegalSummarizationRetrieval.py

Update LegalSummarizationRetrieval metadata

* Update AILACasedocsRetrieval.py

Update AILACasedocsRetrieval

* Update AILACasedocsRetrieval.py

Update AILACasedocsRetrieval metadata

* Update AILAStatutesRetrieval.py

Update AILAStatutesRetrieval metadata

* Update GerDaLIRSmallRetrieval.py

Update GerDaLIRSmallRetrieval metadata

* Update LeCaRDv2Retrieval.py

Update LeCaRDv2Retrieval metadata

* Update LegalBenchConsumerContractsQARetrieval.py

* Update LegalBenchCorporateLobbyingRetrieval.py

* Update LegalQuADRetrieval.py

* Update LegalSummarizationRetrieval.py

* Update AILACasedocsRetrieval.py

* Update AILAStatutesRetrieval.py

* Update GerDaLIRSmallRetrieval.py

* Update LeCaRDv2Retrieval.py

* move dataset language folder

* update order

---------

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
* Fix typos in readme.md

* Added news classification dataset.

* Added news classification dataset.

* Fixes on suggestions

* Update docs/mmteb/points.md

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Automatically generated by python-semantic-release
* fix: remove time of run (as it does not relate to the model itself). Time of run should be on the dataset results

* fix: fixes the PawsX datasets

* docs: Updated points

* fix: flores clustering

* fix: multiple dataset fixes

* docs: updated points

* fix: added missing dataset_transform to multitask task

* style: ran formatter

* fix: correctly fix pawsX
Automatically generated by python-semantic-release
Automatically generated by python-semantic-release
8 participants