Commit c9d1a03

KennethEnevoldsen committed Mar 25, 2024
2 parents b42abe4 + a16eb07
Showing 162 changed files with 834 additions and 675 deletions.
21 changes: 21 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,21 @@

<!-- If you are not submitting a dataset, feel free to remove the content below -->


<!-- Add an additional description, questions, etc. related to the new dataset -->

## Checklist for adding MMTEB dataset
<!--
Here is a checklist you should complete before submitting.
-->

- [ ] I have tested that the dataset runs with the `mteb` package.
- [ ] I have run the following models on the task (adding the results to the PR). These can be run using the `mteb run -m {model_name} -t {task_name}` command; a Python sketch is shown after this checklist.
- [ ] `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
- [ ] `intfloat/multilingual-e5-small`
- [ ] I have checked that the performance is neither trivial (both models achieve near-perfect scores) nor random (both models score close to random).
- [ ] I have considered the size of the dataset and reduced it if it is too large (2048 examples are typically enough for most tasks).
- [ ] I have run the tests locally to make sure nothing is broken, using `make test`.
- [ ] I have run the formatter to format the code, using `make lint`.
- [ ] I have added points for my submission to the [POINTS.md](https://github.com/embeddings-benchmark/mteb/tree/main/docs/mmteb/POINTS.md) file.
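
For reference, here is a minimal sketch of running both checklist models from Python rather than the CLI; `MyNewTask` is a placeholder for the new dataset's task name, and the loop itself is illustrative:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Placeholder: replace "MyNewTask" with the task name of the new dataset.
for model_name in [
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "intfloat/multilingual-e5-small",
]:
    model = SentenceTransformer(model_name)
    evaluation = MTEB(tasks=["MyNewTask"])
    evaluation.run(model)
```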
File renamed without changes.
1 change: 0 additions & 1 deletion .github/workflows/tests.yml
@@ -8,7 +8,6 @@ on:
push:
  branches: [main]
pull_request:
-  branches: [main]

jobs:
pytest:
5 changes: 5 additions & 0 deletions .vscode/extensions.json
@@ -0,0 +1,5 @@
{
  "recommendations": [
    "charliermarsh.ruff"
  ]
}
5 changes: 5 additions & 0 deletions Makefile
@@ -15,3 +15,8 @@ test-parallel:
	@echo "--- 🧪 Running tests ---"
	@echo "Note that parallel tests can sometimes cause issues with some tests."
	pytest -n auto --dist=loadfile -s -v

+pr:
+	@echo "--- 🚀 Running requirements for a PR ---"
+	make lint
+	make test-parallel
52 changes: 21 additions & 31 deletions README.md
@@ -46,6 +46,8 @@ from sentence_transformers import SentenceTransformer

# Define the sentence-transformers model name
model_name = "average_word_embeddings_komninos"
+# or directly from Hugging Face:
+# model_name = "sentence-transformers/all-MiniLM-L6-v2"

model = SentenceTransformer(model_name)
evaluation = MTEB(tasks=["Banking77Classification"])
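# (Beyond this hunk, the README's snippet presumably continues by running the
#  evaluation; the output folder below is an assumption, not shown in this diff.)
# evaluation.run(model, output_folder=f"results/{model_name}")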
@@ -131,15 +133,15 @@ Models should implement the following interface, implementing an `encode` function

```python
class MyModel():
-    def encode(self, sentences, batch_size=32, **kwargs):
+    def encode(self, sentences: list[str], **kwargs) -> list[np.ndarray] | list[torch.Tensor]:
        """
        Returns a list of embeddings for the given sentences.
        Args:
-            sentences (`List[str]`): List of sentences to encode
-            batch_size (`int`): Batch size for the encoding
+            sentences: List of sentences to encode
        Returns:
-            `List[np.ndarray]` or `List[tensor]`: List of embeddings for the given sentences
+            List of embeddings for the given sentences
        """
        pass

@@ -152,64 +154,48 @@ If you'd like to use different encoding functions for query and corpus when evaluating retrieval tasks

```python
class MyModel():
-    def encode_queries(self, queries, batch_size=32, **kwargs):
+    def encode_queries(self, queries: list[str], **kwargs) -> list[np.ndarray] | list[torch.Tensor]:
        """
        Returns a list of embeddings for the given sentences.
        Args:
-            queries (`List[str]`): List of sentences to encode
-            batch_size (`int`): Batch size for the encoding
+            queries: List of sentences to encode
        Returns:
-            `List[np.ndarray]` or `List[tensor]`: List of embeddings for the given sentences
+            List of embeddings for the given sentences
        """
        pass

-    def encode_corpus(self, corpus, batch_size=32, **kwargs):
+    def encode_corpus(self, corpus: list[str] | list[dict[str, str]], **kwargs) -> list[np.ndarray] | list[torch.Tensor]:
        """
        Returns a list of embeddings for the given sentences.
        Args:
-            corpus (`List[str]` or `List[Dict[str, str]]`): List of sentences to encode
+            corpus: List of sentences to encode
                or list of dictionaries with keys "title" and "text"
-            batch_size (`int`): Batch size for the encoding
        Returns:
-            `List[np.ndarray]` or `List[tensor]`: List of embeddings for the given sentences
+            List of embeddings for the given sentences
        """
        pass
```
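
To make the interface concrete, here is a minimal sketch backed by `sentence_transformers`; the wrapper class, its default model choice, and the title/text joining strategy are assumptions for illustration, not part of the library:

```python
import numpy as np
from sentence_transformers import SentenceTransformer


class MyModel:
    """Minimal sketch: wraps a SentenceTransformer to satisfy the interface above."""

    def __init__(self, model_name: str = "intfloat/multilingual-e5-small"):
        self.model = SentenceTransformer(model_name)

    def encode(self, sentences: list[str], **kwargs) -> list[np.ndarray]:
        # SentenceTransformer returns a 2D array; split it into per-sentence vectors.
        return list(self.model.encode(sentences, **kwargs))

    def encode_queries(self, queries: list[str], **kwargs) -> list[np.ndarray]:
        # This sketch encodes queries exactly like plain sentences.
        return self.encode(queries, **kwargs)

    def encode_corpus(self, corpus: list[str] | list[dict[str, str]], **kwargs) -> list[np.ndarray]:
        # Corpus entries may be dicts with "title" and "text"; join them into one string.
        if corpus and isinstance(corpus[0], dict):
            corpus = [(doc.get("title", "") + " " + doc["text"]).strip() for doc in corpus]
        return self.encode(corpus, **kwargs)
```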

-### Evaluating on a custom task
+### Evaluating on a custom dataset

-To add a new task, you need to implement a new class that inherits from the `AbsTask` associated with the task type (e.g. `AbsTaskReranking` for reranking tasks). You can find the supported task types [here](https://github.com/embeddings-benchmark/mteb-draft/tree/main/mteb/abstasks).
+To evaluate on a custom task, you can run the following code with your custom task. See [how to add a new task](docs/adding_a_dataset.md) for how to create a new task in MTEB.

```python
from mteb import MTEB
from mteb.abstasks.AbsTaskReranking import AbsTaskReranking
from sentence_transformers import SentenceTransformer


-class MindSmallReranking(AbsTaskReranking):
-    @property
-    def description(self):
-        return {
-            "name": "MindSmallReranking",
-            "hf_hub_name": "mteb/mind_small",
-            "description": "Microsoft News Dataset: A Large-Scale English Dataset for News Recommendation Research",
-            "reference": "https://www.microsoft.com/en-us/research/uploads/prod/2019/03/nl4se18LinkSO.pdf",
-            "type": "Reranking",
-            "category": "s2s",
-            "eval_splits": ["validation"],
-            "eval_langs": ["en"],
-            "main_score": "map",
-        }
+class MyCustomTask(AbsTaskReranking):
+    ...

model = SentenceTransformer("average_word_embeddings_komninos")
-evaluation = MTEB(tasks=[MindSmallReranking()])
+evaluation = MTEB(tasks=[MyCustomTask()])
evaluation.run(model)
```

-> **Note:** for multilingual tasks, make sure your class also inherits from the `MultilingualTask` class like in [this](https://github.com/embeddings-benchmark/mteb-draft/blob/main/mteb/tasks/Classification/MTOPIntentClassification.py) example.
</details>

<br />
Expand All @@ -221,12 +207,16 @@ evaluation.run(model)
| 📋 [Tasks] | Overview of available tasks |
| 📈 [Leaderboard] | The interactive leaderboard of the benchmark |
| 🤖 [Adding a model] | Information related to how to submit a model to the leaderboard |
| 👩‍💻 [Adding a dataset] | How to add a new task/dataset to MTEB | 
| 🤝 [Contributing] | How to contribute to MTEB and set it up for development |
<!-- | 🌐 [MMTEB] | An open-source effort to extend MTEB to cover a broad set of languages |   -->

[Tasks]: docs/tasks.md
[Contributing]: docs/contributing.md
[Adding a model]: docs/adding_a_model.md
[Adding a dataset]: docs/adding_a_dataset.md
[Leaderboard]: https://huggingface.co/spaces/mteb/leaderboard
[MMTEB]: docs/mmteb/readme.md

## Citing

