fix: override existing results #1617
Conversation
mteb/evaluation/MTEB.py (outdated)
subsets_to_run = (
    info["missing_subsets"]
    if not overwrite_results
    else task_subsets
)
Judging from the test failures, maybe this condition needs to be reverted?
It seems the tests were a bit incorrect, because they use overwrite_results=True and not all splits were evaluated.
overwrite_results shouldn't affect how splits are determined, so those tests should pass whether overwrite_results is True or False, right?
Yes, but in the test, last_evaluated_splits was being checked, and with overwrite_results, all selected splits and subsets should be reevaluated. This caused the test to fail. For example, instead of just val in last_evaluated_splits, it should include both val and test. However, the splits were selected correctly.
I'd suggest adding a test case in missing splits/langs where overwrite_results=True, as that should not affect how splits and langs are selected.
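A minimal sketch of such a test case, assuming the model, tasks, and tmp_path fixtures used in this test module and an MTEB.run signature that accepts eval_splits and overwrite_results (as the diffs in this conversation suggest); the test name and assertions are hypothetical:

import mteb


def test_missing_split_selection_ignores_overwrite(model, tasks, tmp_path):
    # First run evaluates only the "val" split and writes results to disk.
    evaluation = mteb.MTEB(tasks=tasks)
    evaluation.run(
        model,
        output_folder=str(tmp_path / "ignore_overwrite"),
        eval_splits=["val"],
    )

    # Second run with overwrite_results=True: the flag should only force
    # re-evaluation, not change which splits/langs are selected, so the
    # result still covers exactly the requested "val" split.
    results = evaluation.run(
        model,
        output_folder=str(tmp_path / "ignore_overwrite"),
        eval_splits=["val"],
        overwrite_results=True,
    )
    assert results[0].scores.keys() == {"val"}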
@@ -245,7 +238,6 @@ def test_multilingual_one_missing_split_no_missing_lang(
         output_folder=str(tmp_path / "partial_langs_partial_splits"),
         verbosity=2,
         eval_subsets=["eng", "fra"],
-        overwrite_results=True,
This should not affect how missing splits or langs are selected.
I've updated the test as you suggested. Previously, the problem was that when tasks were run a second time with the overwrite flag set, the existing results were not overwritten.
Thanks for iterating! Got a few comments and questions.
missing_evaluations = self._get_missing_evaluations(
    existing_results,
    task_eval_splits,
    task_subsets,
    eval_subsets,
)
If existing_results is None here, then won't missing_evaluations contain all specified splits? Maybe this call can be omitted or simplified to an assignment?
If both task_eval_splits and task_subsets are None, all splits will be selected, but filtering for splits and subsets is still necessary. I think calling the same function is easier to understand than adding more complex logic.
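A rough, self-contained sketch of the behavior being described, not the actual _get_missing_evaluations implementation (the names and return shape are assumed for illustration): even with no existing results, the requested splits and subsets still have to be filtered.

def get_missing_evaluations_sketch(existing_results, task_eval_splits, task_subsets, eval_subsets):
    # Restrict subsets to those explicitly requested, if any were given.
    subsets = [s for s in task_subsets if eval_subsets is None or s in eval_subsets]
    missing = {}
    for split in task_eval_splits:
        already_done = set(existing_results.get(split, [])) if existing_results else set()
        missing_subsets = [s for s in subsets if s not in already_done]
        if missing_subsets:
            missing[split] = {"missing_subsets": missing_subsets}
    return missing


# With existing_results=None everything requested is missing, but the
# split/subset filtering above is still applied:
print(get_missing_evaluations_sketch(None, ["val"], ["eng", "fra"], ["eng"]))
# {'val': {'missing_subsets': ['eng']}}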
@@ -305,3 +295,70 @@ def test_multilingual_one_missing_lang_in_one_split(
     # output merged result with previous results
     assert results[0].scores.keys() == {"test", "val"}
     assert len(results[0].scores["test"]) == 2
+
+
+def test_all_splits_evaluated_with_overwrite(model, tasks, tmp_path):
What does "all splits" mean here? There are test and val, but only val is run here. Based on the comment "because all splits should be run with overwrite", is it correct that "all" means all splits specified in the run method, and not all splits available in the task metadata's eval_splits? Perhaps this can be spelled out, or preferably shown in additional test cases where the 1st and 2nd runs have different eval_splits.
Yes, "all" means all splits specified in evaluation.run.
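A hedged sketch of the additional test case suggested above, where the two runs request different eval_splits; the final assertion mirrors the merge behavior checked in the existing test and is assumed here rather than verified:

import mteb


def test_overwrite_with_different_eval_splits(model, tasks, tmp_path):
    evaluation = mteb.MTEB(tasks=tasks)
    output = str(tmp_path / "different_splits")

    # 1st run: only the "test" split.
    evaluation.run(model, output_folder=output, eval_splits=["test"])

    # 2nd run: only the "val" split, with overwrite_results=True. "All
    # splits" means the splits passed to run(), so only "val" is
    # (re)evaluated here, and the earlier "test" scores are merged back in.
    results = evaluation.run(
        model,
        output_folder=output,
        eval_splits=["val"],
        overwrite_results=True,
    )
    assert results[0].scores.keys() == {"test", "val"}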
Thanks for taking the time to answer my questions. I feel we're in a good place and can leave the rest for future improvement. Let's merge :)
Thank you for the review!
Checklist
- make test
- make lint

Currently, if overwrite is passed with existing results, it won't overwrite them, and everything will be skipped. This was the reason for #1584 (comment).
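For context, a hedged end-to-end example of the behavior this PR fixes, using the public mteb API; the model and task names here are only examples:

import mteb

model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)

# First run evaluates the task and writes results to the output folder.
evaluation.run(model, output_folder="results")

# Before this fix, a second run with overwrite_results=True still found the
# existing results on disk and skipped everything; with the fix, the requested
# splits are re-evaluated and the stored results are overwritten.
evaluation.run(model, output_folder="results", overwrite_results=True)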