docs: first stab at point attribution #280

Closed

MartinBernstorff wants to merge 50 commits into main from MartinBernstorff-patch-1

Contributor

MartinBernstorff commented Mar 25, 2024 (edited)

Kenneth and I took a first stab at attributing points for the contributions so far.

These are meant as an opening for a discussion, not at all a final list, so definitely feel free to suggest changes!

@KennethEnevoldsen
Added dataset annotations for size: 4
Added dataset annotations for one dataset: 1
Added CI: 2
Updated readme x 3 (installation instructions, adding a dataset, MMTEB, etc.): 3
Folder structure: 1
Reviewed PRs x 7: 7
Total: 18

@MartinBernstorff
Merged PRs:

Danish Discourse Dataset (Add Danish Discourse dataset #247): 2
Add dataset schemas to docs (docs: add dataset schemas #255): 2
Add metadata basemodel (refactor: add metadata basemodel #260): 4
Fix test collection (tests: do not run tests on collection #249): 1

Reviews:

MMTEB addition (Added MMTEB #275): 2
Sizes to metadata (fix: Added sizes to the metadata #276): 1
Total: 12
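
As a quick sanity check on the two totals, here is a minimal Python sketch that sums the per-item points listed above. The dictionary layout and variable names are only illustrative assumptions, not something that exists in the repository or in points.md.

```python
# Per-item points copied from the two lists above, in the order listed.
# The structure is a hypothetical convenience, not the format of points.md.
points = {
    "KennethEnevoldsen": [4, 1, 2, 3, 1, 7],
    "MartinBernstorff": [2, 2, 4, 1, 2, 1],
}

# Sum each contributor's points.
totals = {name: sum(values) for name, values in points.items()}

for name, total in totals.items():
    print(f"{name}: {total}")
```

Both sums agree with the totals stated in the lists (18 and 12).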


          Update readme.md

e7e8f4a

MartinBernstorff requested a review from KennethEnevoldsen

March 25, 2024 08:08

MartinBernstorff changed the title ~~docs: expand upon points for PR reviews~~ docs: first stab at point attribution

KennethEnevoldsen requested review from imenelydiaker and Muennighoff and removed request for KennethEnevoldsen

March 25, 2024 08:16

Contributor

Muennighoff commented Mar 25, 2024

Shouldn't this PR also update the actual scores in points.md?

Contributor

KennethEnevoldsen commented Mar 26, 2024

@Muennighoff we wanted to simply discuss it before we added it in

Myahr208 commented Mar 26, 2024 via email (edited by KennethEnevoldsen)

(Edited by KennethEnevoldsen: the original was just a compressed message.)

Contributor

KennethEnevoldsen commented Mar 26, 2024

@Myahr208 it seems like there is something wrong with your formatting

Contributor

Muennighoff commented Mar 26, 2024

> @Muennighoff we wanted to simply discuss it before we added it in

Sure, it looks good to me!

imenelydiaker approved these changes

Contributor

KennethEnevoldsen commented Apr 3, 2024

@imenelydiaker I would love your thoughts on this PR?

Contributor

imenelydiaker commented Apr 3, 2024 (edited)

> @imenelydiaker I would love your thoughts on this PR?

@KennethEnevoldsen I approved the PR, everything looks good to me! 🚀 You guys did a great job!

Contributor

imenelydiaker commented Apr 4, 2024

@KennethEnevoldsen and @MartinBernstorff, PR #302 was merged. Can you please add your affiliations and merge this PR?

MartinBernstorff and others added 15 commits

April 10, 2024 10:18


          fix: dead link in readme

eba127a


          update TaskMetadata.py (#281)

4d4d947


          ci: renamed test job and workflow (#282)

1ae9d74

ci: Added tests


          tests: speed up tests (#283)

1287ea5

update Makefile and test_all_abstasks.py


          added release pipeline

42c9e36


          v1.3.0

19986a7


          ci: moved release to the correct folder

edeb1ba


          0.10.0

1d5798a

Automatically generated by python-semantic-release


          v1.3.0

a593134


          v1.3.0

09fafed


          feat: Updating version

0aa41fc

BREAKING CHANGE: Bump version


          feat: bump version again

d43dd70


          feat: bump version again

c591fc5


          ci: disable changelog

a954121


          overwrite version

KennethEnevoldsen and others added 27 commits

April 10, 2024 10:18


          fix: fixed bug introduced in TatoebaBitextMining causing it to use a different dataset (#297)

d887d5d


          docs: Added information related to the automatic release (#290)

3571ca4

* docs: added information related to the automatic release

* docs: removed test-parallel from docs

* docs: minor additions to contributing guidelines

* ci: removed changelog

As it is already present in the git releases

* Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

---------

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>


          1.3.3

9ac8484

Automatically generated by python-semantic-release


          fix: Update MindSmallReranking.py to have the correct hf reference (#303)

2c10f73


          1.3.4

bc18fb7

Automatically generated by python-semantic-release


          feat: Added windows support by replacing pytrec-eval with pytrec-eval-terrier (#292)

758a946

* ci: Added windows to test suite

* feat: Changed to pytrec-eval-terrier to add support for windows installs


          1.4.0

4cb2065

Automatically generated by python-semantic-release


          fix: hf_hub_name for WikiCitiesClustering (#305)

7e4d5df

* fix: Fixed hf_hub_name for WikiCitiesClustering

* Added points for this PR and a 3 other minor dataset fixes


          1.4.1

551b370

Automatically generated by python-semantic-release


          feat: Allow extending the load_dataset parameters in custom tasks inheriting AbsTask (#299)

e76525f

* Allow extending the load_dataset parameters

* format

* Fix test

* remove duplicated logic from AbsTask, now handled in the metadata

* add tests

* remove comments, moved to PR

* format

* extend metadata dict from super class

* Remove additional load_data

* test: adding very high level test

* Remove hf_hub_name and add test

* Fix revision in output file

---------

Co-authored-by: gbmarc1 <marcantoine.belanger@shopify.com>


          1.5.0

161949a

Automatically generated by python-semantic-release


          fix: Added tests for checking datasets (#307)

c36ca17

* fix: Fixed hf_hub_name for WikiCitiesClustering

* Added points for this PR and a 3 other minor dataset fixes

* feat: Added tests which validated that datasets are available

* fix: Updated hf references and revisions to multiple datasets

* Added points for submission

* fix: Added suggestions from the review

* Apply suggestions from code review

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

* fix: sped up async test for whether datasets exist

* fix: Updated revisions

* fix: reuploaded scandeval datasets

* fix: Applied formatter

---------

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>


          1.5.1

1580b97

Automatically generated by python-semantic-release


          Adding French team contribution points (#302)

5305b58

* Update points.md

* Update docs/mmteb/points.md

* Update points.md

* Update points.md


          fix: Minor fixes to metadata (#315)

bf6e290

* Update MindSmallReranking.py

* fix: Updated wrong metadata


          1.5.2

f160321

Automatically generated by python-semantic-release


          Fix PawsX eval splits (#316)

f180cc8


          docs: Small fixes in readme.md (#317)

f27fe19

Fix typos in readme.md


          docs: Added point for SEB (#318)

0f19e35

* docs: added points for seb

* docs: added points for seb


          Add law datasets (#311)

62736ba

* add command

* add datasets

* reformat dataset

* Rephrase description

* Update mteb/tasks/Retrieval/law/GerDaLIRRetrieval.py

* Update mteb/tasks/Retrieval/law/GerDaLIRRetrieval.py

* Update mteb/__init__.py

* Update scripts/run_mteb_law.py

* Update scripts/run_mteb_law.py

* Update mteb/__init__.py

* Update mteb/tasks/Retrieval/__init__.py

* Update mteb/tasks/Retrieval/law/GerDaLIRRetrieval.py

* Update mteb/tasks/Retrieval/law/GerDaLIRRetrieval.py

* Update mteb/tasks/Retrieval/law/LegalQuADRetrieval.py

* Update mteb/tasks/Retrieval/law/LegalQuADRetrieval.py

* Update scripts/run_mteb_law.py

* Update mteb/tasks/Retrieval/law/LegalSummarizationRetrieval.py

* Update mteb/tasks/Retrieval/law/LegalSummarizationRetrieval.py

* Update mteb/tasks/Retrieval/law/LeCaRDv2Retrieval.py

* Update mteb/tasks/Retrieval/law/LeCaRDv2Retrieval.py

* Rename GerDaLIRRetrieval.py to GerDaLIRSmallRetrieval.py

* Update mteb/tasks/Retrieval/__init__.py

* Update GerDaLIRSmallRetrieval.py

Add metadata

* Update GerDaLIRSmallRetrieval.py

Update metadata

* Update AILACasedocsRetrieval.py

Update AILACasedocsRetrieval metadata

* Update AILAStatutesRetrieval.py

Update AILAStatutesRetrieval metadata

* Update LeCaRDv2Retrieval.py

Update LeCaRDv2Retrieval metadata

* Update LegalBenchConsumerContractsQARetrieval.py

Update LegalBenchConsumerContractsQARetrieval metadata

* Update LegalBenchCorporateLobbyingRetrieval.py

Update LegalBenchCorporateLobbyingRetrieval metadata

* Update LegalQuADRetrieval.py

Update LegalQuADRetrieval metadata

* Update LegalSummarizationRetrieval.py

Update LegalSummarizationRetrieval metadata

* Update AILACasedocsRetrieval.py

Update AILACasedocsRetrieval

* Update AILACasedocsRetrieval.py

Update AILACasedocsRetrieval metadata

* Update AILAStatutesRetrieval.py

Update AILAStatutesRetrieval metadata

* Update GerDaLIRSmallRetrieval.py

Update GerDaLIRSmallRetrieval metadata

* Update LeCaRDv2Retrieval.py

Update LeCaRDv2Retrieval metadata

* Update LegalBenchConsumerContractsQARetrieval.py

* Update LegalBenchCorporateLobbyingRetrieval.py

* Update LegalQuADRetrieval.py

* Update LegalSummarizationRetrieval.py

* Update AILACasedocsRetrieval.py

* Update AILAStatutesRetrieval.py

* Update GerDaLIRSmallRetrieval.py

* Update LeCaRDv2Retrieval.py

* move dataset language folder

* update order

---------

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>


          Fix name

875f240


          fix: Added English news classification dataset (#323)

d965c1c

* Fix typos in readme.md

* Added news classification dataset.

* Added news classification dataset.

* Fixes on suggestions

* Update docs/mmteb/points.md

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>


          1.5.3

a9bd603

Automatically generated by python-semantic-release


          fix: Multiple dataset fixes (#328)

e17036c

* fix: remove time of run (as it does not relate to the model itself). Time of run should be on the dataset results

* fix: fixes the PawsX datasets

* docs: Updated points

* fix: flores clustering

* fix: multiple dataset fixes

* docs: updated points

* fix: added missing dataset_transform to multitask task

* style: ran formatter

* fix: correctly fix pawsX


          1.5.4

add954f

Automatically generated by python-semantic-release


          fix: Improve logging when the revision is None (#329)

89339e5


          1.5.5

478b4e3

Automatically generated by python-semantic-release

MartinBernstorff closed this

MartinBernstorff mentioned this pull request

docs: add points and affiliation for MartinBernstorff #335

Merged

KennethEnevoldsen deleted the MartinBernstorff-patch-1 branch

July 19, 2024 08:37
