Releases: embeddings-benchmark/mteb

1.20.0 (2024-11-21)

Feature

  • feat: add CUREv1 retrieval dataset (#1459); a usage sketch follows this list

  • feat: add CUREv1 dataset


Co-authored-by: nadshe <nadia.sheikh@clinia.com>
Co-authored-by: olivierr42 <olivier.rousseau@clinia.com>
Co-authored-by: Daniel Buades Marcos <daniel@buad.es>

  • feat: add missing domains to medical tasks

  • feat: modify benchmark tasks

  • chore: benchmark naming


Co-authored-by: nadshe <nadia.sheikh@clinia.com>
Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> (1cc6c9e)
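
The new retrieval task can be exercised through mteb's standard Python API. A minimal sketch, assuming the dataset is registered under the task name "CUREv1" and using an arbitrary sentence-transformers model as a placeholder:

    # Hedged sketch: the task name "CUREv1" and the model id are assumptions,
    # not guaranteed by the release notes beyond the dataset name itself.
    import mteb

    tasks = mteb.get_tasks(tasks=["CUREv1"])                          # look up the new task by name
    model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")  # any embedding model works here
    evaluation = mteb.MTEB(tasks=tasks)
    results = evaluation.run(model, output_folder="results")          # writes one result file per task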

1.19.10 (2024-11-19)

Documentation

  • docs: Add sum per language for task counts (#1468); a counting sketch follows this list

  • add sum per lang

  • add sort by sum option

  • make lint (2fb6fe7)
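
The docs change is about the task table, but the underlying aggregation (sum task counts per language, optionally sorted by that sum) is easy to illustrate. A hedged sketch with made-up data, not the actual docs script:

    from collections import Counter

    # Hypothetical mapping of task name -> languages it covers.
    task_languages = {
        "TaskA": ["eng", "fra"],
        "TaskB": ["eng"],
        "TaskC": ["fra", "deu"],
    }

    # Sum of tasks per language.
    counts = Counter(lang for langs in task_languages.values() for lang in langs)

    # "Sort by sum" view: languages with the most tasks first.
    for lang, total in counts.most_common():
        print(f"{lang}: {total}")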

1.19.9 (2024-11-17)

Fix

  • fix: swap touche2020 to maintain compatibility (#1469)

swap touche2020 for parity (9b2aece)

1.19.8 (2024-11-15)

Fix

  • fix: loading pre 1.11.0 (#1460)

  • small fix

  • fix: fix (1b920ac)

Unknown

  • WIP: Polishing up leaderboard UI (#1461)

  • fix: Removed column wrapping on the table, so that it remains readable

  • Added disclaimer to figure

  • fix: Added links to task info table, switched out license with metric (58c459b)

1.19.7 (2024-11-14)

Fix

  • fix: Fix loading of external results when mteb_version is None (#1453); see the version-handling sketch below

  • fix

  • lint (14d7523)
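
Both this fix and the pre-1.11.0 loading fix in 1.19.8 above deal with stored results whose recorded mteb_version may be old or missing. A hedged sketch of the general pattern, not the library's actual loader; treating a missing version as legacy is this sketch's own choice:

    from packaging.version import Version

    def needs_legacy_parsing(mteb_version: str | None) -> bool:
        # A missing version is treated conservatively as legacy so that external
        # results without version metadata still load instead of raising.
        if mteb_version is None:
            return True
        return Version(mteb_version) < Version("1.11.0")

    print(needs_legacy_parsing(None))      # True
    print(needs_legacy_parsing("1.10.2"))  # True
    print(needs_legacy_parsing("1.19.7"))  # False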

1.19.6 (2024-11-14)

Unknown

  • Fixed task loading (#1451)

  • Fixed task result loading from disk (039d010)

  • Fix: Made data parsing in the leaderboard figure more robust (#1450)

Bugfixes with data parsing in main figure (4e86cea)

1.19.5 (2024-11-14)

Fix

  • fix: update task metadata to allow for null (#1448) (04ac3f2)

  • fix: Count unique texts, data leaks in calculate metrics (#1438); a counting sketch follows below

  • add more statistics

  • update statistics (dd5d226)
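
The descriptive-statistics fix counts unique texts and flags potential train/test leakage. A minimal sketch of that kind of check with made-up splits; it is not the actual calculate-metrics code:

    # Hypothetical splits; in practice these would come from a task's dataset.
    train_texts = ["a cat", "a dog", "a cat"]
    test_texts = ["a dog", "a bird"]

    unique_train = set(train_texts)
    unique_test = set(test_texts)
    leaked = unique_train & unique_test   # texts that appear in both splits

    print(f"unique train texts: {len(unique_train)}")  # 2
    print(f"unique test texts: {len(unique_test)}")    # 2
    print(f"potential leaks: {len(leaked)}")           # 1 ('a dog')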

Unknown

  • Update tasks table (f6a49fe)

  • Leaderboard: Fixed code benchmarks (#1441)

  • fixed code benchmarks

  • fix: Made n_parameters formatting smarter and more robust (a formatting sketch follows this list)

  • fix: changed jina-embeddings-v3 number of parameters from 572K to 572M

  • fix: Fixed use_instuctions typo in model overview

  • fix: Fixed sentence-transformer compatibility switch

  • Ran linting

  • Added all languages, tasks, types and domains to options

  • Removed resetting options when a new benchmark is selected

  • All results now get displayed, but models that haven't been run on everything get NaN values in the table (3a1a470)

  • Leaderboard 2.0: added performance x n_parameters plot + more benchmark info (#1437)

  • Added elementary speed/performance plot

  • Refactored table formatting code

  • Bumped Gradio version

  • Added more general info to benchmark description markdown block

  • Adjusted margin and range on plot

  • Made hover information easier to read on plot

  • Made range scaling dynamic in plot

  • Moved citation next to benchmark description

  • Made titles in benchmark info bold (76c2112)
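
The parameter-count items above (572K vs 572M, smarter n_parameters formatting) come down to rendering raw integer counts as short human-readable strings. A small illustrative formatter, not the leaderboard's actual code:

    def format_n_parameters(n: int | None) -> str:
        # Render a raw count such as 572_000_000 as "572M"; None means unknown.
        if n is None:
            return "unknown"
        for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
            if n >= threshold:
                value = n / threshold
                return f"{value:.0f}{suffix}" if value >= 10 else f"{value:.1f}{suffix}"
        return str(n)

    print(format_n_parameters(572_000_000))    # 572M
    print(format_n_parameters(572_000))        # 572K
    print(format_n_parameters(7_100_000_000))  # 7.1B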

1.19.4 (2024-11-11)

Fix

  • fix: Add missing benchmarks in benchmarks.py (#1431)

Fixes #1423 (a240ea0)

  • fix: Add Korean AutoRAGRetrieval (#1388)

  • feat: add AutoRAG Korean embedding retrieval benchmark

  • fix: run linters (ruff format . and ruff check . --fix; 716 files left unchanged, all checks passed)

  • fix: add metadata for AutoRAGRetrieval

  • change link for markers_bm

  • add AutoRAGRetrieval to __init__.py and update metadata

  • add precise metadata

  • update metadata: description and license

  • delete descriptive_stats in AutoRAGRetrieval.py and run calculate_metadata_metrics.py (f79d9ba)

  • fix: make samples_per_label a task attribute (#1419)

make samples_per_label a task attr (7f1a1d3)
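
Making samples_per_label a task attribute lets each task declare how many training examples per label the evaluator should sample, instead of hard-coding the value in the evaluation loop. A generic hedged sketch of that pattern; the names and default shown here are illustrative, not mteb's actual class hierarchy:

    from dataclasses import dataclass

    @dataclass
    class ClassificationTaskConfig:
        # Stand-in for a task definition that carries its own sampling size.
        name: str
        samples_per_label: int = 8  # illustrative default

    def n_train_examples(task: ClassificationTaskConfig, labels: list[str]) -> int:
        # Shared evaluation code reads the per-task attribute instead of a constant.
        return task.samples_per_label * len(set(labels))

    task = ClassificationTaskConfig(name="MyTask", samples_per_label=16)
    print(n_train_examples(task, ["pos", "neg"]))  # 32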

1.19.3 (2024-11-11)

Documentation

  • docs: Fix a typo in README (#1430)

Fix typo in readme (9681eb3)

  • docs: Update recommendation for pushing results (#1401)

fix: Update recommendation for pushing results (fccf034)

Fix

  • fix: add logging for RetrievalEvaluator NaN values for similarity scores (#1398)

Fixes #1389 (cc7a106)
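
The logging fix targets similarity matrices that silently contain NaN values (for example from zero-norm embeddings). A generic sketch of that kind of guard, not the RetrievalEvaluator's actual code:

    import logging

    import numpy as np

    logger = logging.getLogger(__name__)

    def warn_on_nan_scores(scores: np.ndarray, task_name: str = "unknown-task") -> None:
        # Emit a warning instead of failing silently when similarities contain NaN.
        n_nan = int(np.isnan(scores).sum())
        if n_nan:
            logger.warning(
                "Similarity scores for %s contain %d NaN values out of %d.",
                task_name, n_nan, scores.size,
            )

    warn_on_nan_scores(np.array([[0.9, np.nan], [0.1, 0.4]]), task_name="ExampleRetrieval")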

1.19.2 (2024-11-07)

Fix

  • fix: Added the necessary trust_remote_code (#1406) (fd8b283)
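
trust_remote_code is the Hugging Face datasets flag required when a dataset repository ships its own loading script; recent datasets releases refuse to run such scripts without it. A hedged sketch of the kind of call this fix enables; the dataset id is a placeholder:

    from datasets import load_dataset

    ds = load_dataset(
        "some-org/some-script-based-dataset",  # hypothetical dataset id
        split="test",
        trust_remote_code=True,  # explicitly opt in to running the repo's loading script
    )
    print(ds)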