refactor: add metadata basemodel #260

Contributor

@MartinBernstorff MartinBernstorff commented Mar 19, 2024

For initial feedback. Note that the first commit is a large refactor, so I recommend excluding it from the diff when reviewing, i.e. only looking at:

df7672d

If we agree on the schema, we can merge, and I'll then start updating the 100+ existing tasks. What do you think @KennethEnevoldsen?

Todo:

  • Check that all files with _LANGS and _EVAL_SPLIT use them in their TaskMetadata
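A minimal sketch of how this check could be automated, using only the standard library. It assumes `_LANGS` and `_EVAL_SPLIT` are plain module-level assignments and that the metadata is built by a direct `TaskMetadata(...)` call. The helper name `unused_constants` is hypothetical and not part of this PR.

```python
import ast

# Module-level constants that should be passed into TaskMetadata.
CONSTANTS = ("_LANGS", "_EVAL_SPLIT")


def unused_constants(source: str) -> list[str]:
    """Return constants assigned at module level in `source` that are
    never referenced inside a `TaskMetadata(...)` call."""
    tree = ast.parse(source)

    # Collect plain assignments such as `_LANGS = ["da"]`.
    # Annotated assignments (`_LANGS: list[str] = ...`) are not handled here.
    defined = set()
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id in CONSTANTS:
                    defined.add(target.id)

    # Collect constants read anywhere inside a call to the bare name
    # `TaskMetadata`; attribute calls like `mod.TaskMetadata(...)` are skipped.
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "TaskMetadata":
            for sub in ast.walk(node):
                if isinstance(sub, ast.Name) and sub.id in CONSTANTS:
                    used.add(sub.id)

    return sorted(defined - used)
```

Running this over each task file (for example via `pathlib.Path.read_text`) and flagging any non-empty result would surface the files that still need updating. Because the source is only parsed, never executed, the task modules do not need to be importable.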

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

A few recommendations; otherwise I believe it looks good.

Can you also bump the version number. An example where you transform DDisco would also be great (then the other reviewers can see how it would look in practice).

Review threads on mteb/abstasks/TaskMetadata.py: four, all resolved (three marked outdated).

Contributor Author

MartinBernstorff commented Mar 19, 2024

> Can you also bump the version number.

Ah, good point. Should we perhaps set up semantic versioning?

> An example where you transform DDisco would also be great (then the other reviewers can see how it would look in practice).

Should already be in the commit history?

df7672d#diff-35934ee4d3e5d7d6659d8cc6c67991c680adf28f1d1d78ab9701f6978e0dd90b

Good catches on the missing Nones!

@MartinBernstorff MartinBernstorff marked this pull request as ready for review March 19, 2024 12:16
@KennethEnevoldsen
Contributor

> Ah, good point. Should we perhaps set up semantic versioning?

#261

> Should already be in the commit history?

Ah thanks, I must have missed it.

@MartinBernstorff
Contributor Author

Questions before merge:

  • The Amazon Polarity dataset is hosted by MTEB. I have added https://huggingface.co/datasets/amazon_polarity as the reference. Is that correct?

@KennethEnevoldsen
Contributor

> The Amazon Polarity dataset is hosted by MTEB. I have added https://huggingface.co/datasets/amazon_polarity as the reference. Is that correct?

I believe so, though @Muennighoff will have to confirm.

KennethEnevoldsen and others added 2 commits March 20, 2024 14:23
…er structure. (embeddings-benchmark#254)

* Added model results to repo and updated CLI to create consistent folder structure.

* ci: updated ci to use make install

* Added missing pytest dependencies

* Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

---------

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
* restructing the readme

* removed double specification of versions and moved all setup to pyproject.toml

* correctly use flat-layout for the package
@MartinBernstorff
Contributor Author

Should be ready to merge @KennethEnevoldsen 👍

@KennethEnevoldsen
Contributor

The tests seem to fail, @MartinBernstorff. It seems to be due to a missing pydantic dependency.

@MartinBernstorff
Contributor Author

@KennethEnevoldsen Seems I haven't been added as a collaborator, so I can't merge this. Ready to merge now 👍

@MartinBernstorff MartinBernstorff merged commit dd5d617 into embeddings-benchmark:main Mar 21, 2024
3 checks passed