1.1.1 C-MTEB, PL-MTEB, Multi-GPU
Updates
- 🇨🇳 C-MTEB was released and integrated thanks to @staoxiao. Check out the paper here. Alongside C-MTEB, the team also released other great embedding resources, including new SoTA models on MTEB & C-MTEB called BGE, as well as datasets and source code (see the evaluation sketch after this list)
- 🇵🇱 PL-MTEB & BEIR-PL were released and integrated thanks to @rafalposwiata & @kwojtasi. Check out the new leaderboard tab for PL-MTEB: https://huggingface.co/spaces/mteb/leaderboard. Some BEIR-PL datasets are still missing and will be added soon cc @kwojtasi
- 💻 Clarifications on multi-GPU: native multi-GPU support for Retrieval thanks to @NouamaneTazi. We also added a clarification to the README on how any task can be run in a multi-GPU setup without requiring any changes in MTEB. MTEB abstracts away how the encodings are produced: whether users run the `encode` function on a single GPU or multiple GPUs is completely up to them (see the multi-GPU sketch after this list)
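To illustrate the newly integrated language tasks, here is a minimal sketch of evaluating a model on the Chinese (or Polish) tasks via `task_langs`. The model name is just an example; any model exposing an `encode` method works:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# BAAI/bge-large-zh is one of the new BGE models mentioned above;
# any model with an `encode(sentences, **kwargs)` method can be used.
model = SentenceTransformer("BAAI/bge-large-zh")

# Select all tasks available in Chinese (use "pl" for the Polish tasks).
evaluation = MTEB(task_langs=["zh"])
evaluation.run(model, output_folder="results/bge-large-zh")
```

And to make the `encode` flexibility concrete, below is a minimal sketch (not the canonical README snippet) of a hypothetical wrapper that spreads encoding over all available GPUs using SentenceTransformers' multi-process pool. MTEB never sees the difference, since it only calls `encode`:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

class MultiGPUEncoder:
    """Hypothetical wrapper: MTEB only requires an `encode` method,
    so the model is free to spread the work over several GPUs."""

    def __init__(self, model_name: str):
        self.model = SentenceTransformer(model_name)
        # One worker process per visible GPU (CPU processes otherwise).
        self.pool = self.model.start_multi_process_pool()

    def encode(self, sentences, batch_size: int = 32, **kwargs):
        # Chunks are dispatched to the pool; embeddings come back as one array.
        return self.model.encode_multi_process(
            sentences, self.pool, batch_size=batch_size
        )

model = MultiGPUEncoder("sentence-transformers/all-MiniLM-L6-v2")
MTEB(task_langs=["en"]).run(model, output_folder="results/multi-gpu")

# Shut down the worker processes once evaluation is done.
SentenceTransformer.stop_multi_process_pool(model.pool)
```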
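The same pattern applies to a plain `torch.nn.DataParallel` model or any custom sharding scheme: as long as `encode` returns the embeddings, MTEB requires no changes.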
What's Changed
- Code cleanup by @NouamaneTazi in #131
- Replaced prints with logging by @KennethEnevoldsen in #133
- Add BEIR-PL datasets to MTEB by @kwojtasi in #121
- Add Polish tasks (PL-MTEB) by @rafalposwiata in #137
- Add Chinese tasks (C-MTEB) by @staoxiao in #134
- Support Multi-node Evaluation by @NouamaneTazi in #132
- Add multi gpu eval to readme by @NouamaneTazi in #140
- Default to false by @Muennighoff in #143
- Rely on standard encode kwargs only by @Muennighoff in #145
- Fix splits by @Muennighoff in #149
- fix: add missing task-langs attribute by @guenthermi in #152
- Clarify multi-gpu usage by @Muennighoff in #153
- Simplify code snippets by @Muennighoff in #154
- fix: msmarco-v2 uses dev.tsv, not dev1.tsv by @garrett361 in #155
- Fix eval langs by @Muennighoff in #157
New Contributors
- @kwojtasi made their first contribution in #121
- @rafalposwiata made their first contribution in #137
- @staoxiao made their first contribution in #134
- @guenthermi made their first contribution in #152
- @garrett361 made their first contribution in #155
Full Changelog: 1.1.0...1.1.1