Added model results to repo and updated CLI to create consistent folder structure. #254
Conversation
Interesting - is the intention to have one result file for every dataset? Could be a good idea, so it's easy to get an idea of what kind of performance to expect
Yep exactly. This also makes it easier for us to review dataset submissions, as they then also include at least 1-3 models that have been run on the data. We want it to be both:

The assumption is to have the results at least for newly submitted datasets.
Makes sense. Should this be specified somewhere, e.g. in how to contribute? Also, maybe it makes sense to explicitly tell people to always run model X, so it's a bit easier to compare if it's always the same model? If it's just one model, it should probably be a multilingual one 🤔
Yep, I plan to make a PR on how to add datasets that includes some standard models (small and multilingual). I was thinking e5-multilingual-small and sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.
Great idea! Both models are multilingual, so yes, why not! Just keep in mind that for some languages they may not perform very well (under-represented languages in the training datasets).
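For reference, here is a minimal sketch of what such a standard-model run could look like with the mteb Python API; the task name is only an illustrative placeholder, and the exact API surface may differ between versions:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# One of the multilingual models discussed above; the task is just an example
# and would be replaced by the newly submitted dataset's task name.
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

evaluation = MTEB(tasks=["STS22"])
evaluation.run(model, output_folder="results/paraphrase-multilingual-MiniLM-L12-v2")
```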
@@ -0,0 +1 @@
{"model_name": "sentence-transformers/all-MiniLM-L6-v2", "time_of_run": "2024-03-18 11:22:22.739054", "versions": {"sentence_transformers": "2.0.0", "transformers": "4.6.1", "pytorch": "1.8.1"}} |
Maybe add a revision number to make sure it's the same model version that is used? Wdyt?
I would actually love to, but couldn't figure out how to do it. I don't believe it is recorded in the model object. You can naturally fetch the latest from the repo, but then hitting the cache causes discrepancies.
I was thinking about doing the same as for datasets in mteb (revision_id). Just specifying the commit id from the HF repo that stores the model.
Yes, I would love to do that. However, I am not sure the commit id is available in the model object (you would have to know it beforehand). I would love to add that, but it seems like that is outside the scope of this PR.
@imenelydiaker does anyone from your team have the time to give it a go?
Okay I see what you mean. I can check this and open another PR.
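For whoever picks this up: one possible route is to look up the commit id on the Hub via huggingface_hub rather than from the model object itself. A minimal sketch, with the model name purely as an example and the metadata key as an assumption:

```python
from huggingface_hub import HfApi

# Ask the Hub for the model repo's current commit sha (illustrative only).
info = HfApi().model_info("sentence-transformers/all-MiniLM-L6-v2")
revision = info.sha  # e.g. stored in the results metadata as "model_revision"
print(revision)
```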
I suppose this closes #248 - I would add it to the PR description, but can't edit 👍
@MartinBernstorff this is #254
Ah yeah, I have updated my comment 👍
Added model results to repo and updated CLI to create consistent folder structure. (embeddings-benchmark#254)

* Added model results to repo and updated CLI to create consistent folder structure.
* ci: updated ci to use make install
* Added missing pytest dependencies
* Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
This is a suggested change. The goal is to make it easier to add model evaluations along with a dataset (as a kind of test).
This includes a few changes:
* A consistent results folder structure: results/{model_name}/{task_results}
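A minimal sketch of how a result file path could be composed under this structure; the name normalisation (replacing "/" in the model name) is an assumption for illustration, not necessarily what the CLI does:

```python
from pathlib import Path

# Illustrative only: build results/{model_name}/{task_results}.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
task_name = "STS22"
result_path = Path("results") / model_name.replace("/", "__") / f"{task_name}.json"
print(result_path)  # results/sentence-transformers__all-MiniLM-L6-v2/STS22.json
```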