Extend MIEB test coverage #1629

Merged: isaac-chung merged 2 commits into mieb from extend-mieb-test-cov on Dec 25, 2024
Conversation

@isaac-chung (Collaborator) commented on Dec 24, 2024:

Part of #1339. Extends test coverage for mteb/abstasks/Image and mteb/evaluation/evaluators/Image from 30% to 69% (a sketch of the kind of end-to-end test involved is shown after the coverage tables below).

pytest tests/test_tasks/test_mieb_datasets.py -n auto --durations=5 --cov-report=term-missing --cov-config=pyproject.toml --cov=mteb/abstasks/Image --cov=mteb/evaluation/evaluators/Image
Before
---------- coverage: platform linux, python 3.12.1-final-0 -----------
Name                                                                       Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------------------------------------
mteb/abstasks/Image/AbsTaskAny2AnyMultiChoice.py                             202    166    18%   36-63, 67-71, 80-109, 112-121, 124-138, 141-155, 158-186, 218-238, 248-276, 281-372, 375, 380, 383-413, 422-441, 451-462
mteb/abstasks/Image/AbsTaskAny2AnyRetrieval.py                               202    166    18%   35-62, 66-70, 79-108, 111-120, 123-137, 140-154, 157-185, 214-234, 244-272, 277-367, 370, 375, 378-408, 417-436, 446-457
mteb/abstasks/Image/AbsTaskAny2TextMultipleChoice.py                          27      9    67%   33, 38, 48-64
mteb/abstasks/Image/AbsTaskImageClassification.py                             78     50    36%   72, 77, 88-113, 124-186, 194-210
mteb/abstasks/Image/AbsTaskImageClustering.py                                 23      1    96%   38
mteb/abstasks/Image/AbsTaskImageMultilabelClassification.py                  101     68    33%   31-41, 81, 86, 97-122, 134-196, 200-210
mteb/abstasks/Image/AbsTaskImageTextPairClassification.py                     22      6    73%   34, 39, 49-58
mteb/abstasks/Image/AbsTaskVisualSTS.py                                       39     19    51%   43, 47, 52-66, 69, 74-84
mteb/abstasks/Image/AbsTaskZeroshotClassification.py                          25      1    96%   36
mteb/abstasks/Image/__init__.py                                                0      0   100%
mteb/evaluation/evaluators/Image/Any2AnyMultiChoiceEvaluator.py              223    181    19%   44-46, 49, 52-61, 65, 79-99, 111-240, 244-264, 277-287, 295-298, 321-413, 423-444, 452-467, 476-487
mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py                220    178    19%   44-46, 49, 52-62, 66, 81-102, 114-250, 254-274, 288-298, 305-308, 330-417, 427-448, 456-471, 480-491
mteb/evaluation/evaluators/Image/Any2TextMultipleChoiceEvaluator.py           50     34    32%   45-54, 61-99
mteb/evaluation/evaluators/Image/ClassificationEvaluator.py                  204    166    19%   29, 37-39, 42, 45-49, 53, 69-89, 92-138, 154-173, 176-234, 243-257, 266-278, 287-299, 315-334, 337-387
mteb/evaluation/evaluators/Image/ClusteringEvaluator.py                       38      2    95%   30-31
mteb/evaluation/evaluators/Image/ImageTextPairClassificationEvaluator.py      78     57    27%   25-28, 31, 34-52, 56, 83-90, 97-173
mteb/evaluation/evaluators/Image/VisualSTSEvaluator.py                        65     43    34%   28-30, 33, 36-40, 44, 56-64, 73-130
mteb/evaluation/evaluators/Image/ZeroshotClassificationEvaluator.py           47      7    85%   32-36, 40, 64
mteb/evaluation/evaluators/Image/__init__.py                                   0      0   100%
--------------------------------------------------------------------------------------------------------
TOTAL                                                                       1644   1154    30%

After
---------- coverage: platform linux, python 3.12.1-final-0 -----------
Name                                                                       Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------------------------------------
mteb/abstasks/Image/AbsTaskAny2AnyMultiChoice.py                             202     96    52%   48-61, 67-71, 81-84, 112-121, 132, 149, 166, 174, 219, 268, 289-291, 294-310, 342-370, 380, 383-413, 422-441, 451-462
mteb/abstasks/Image/AbsTaskAny2AnyRetrieval.py                               202     96    52%   47-60, 66-70, 80-83, 111-120, 131, 148, 165, 173, 215, 264, 285-287, 290-306, 337-365, 375, 378-408, 417-436, 446-457
mteb/abstasks/Image/AbsTaskAny2TextMultipleChoice.py                          27      2    93%   38, 50
mteb/abstasks/Image/AbsTaskImageClassification.py                             78      6    92%   77, 89, 102, 147, 157, 177
mteb/abstasks/Image/AbsTaskImageClustering.py                                 23      1    96%   38
mteb/abstasks/Image/AbsTaskImageMultilabelClassification.py                  101      6    94%   86, 98, 111, 170-174
mteb/abstasks/Image/AbsTaskImageTextPairClassification.py                     22      1    95%   39
mteb/abstasks/Image/AbsTaskVisualSTS.py                                       39      9    77%   74-84
mteb/abstasks/Image/AbsTaskZeroshotClassification.py                          25      1    96%   36
mteb/abstasks/Image/__init__.py                                                0      0   100%
mteb/evaluation/evaluators/Image/Any2AnyMultiChoiceEvaluator.py              223     59    74%   52-61, 65, 83, 99, 112, 123-124, 143-151, 173-174, 193-201, 234, 244-264, 296, 322-329, 365-366, 426-439, 476-487
mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py                220     62    72%   52-62, 66, 86, 102, 115, 126-127, 151-161, 191-218, 244, 254-274, 306, 331-338, 374-375, 430-443, 480-491
mteb/evaluation/evaluators/Image/Any2TextMultipleChoiceEvaluator.py           50      3    94%   47, 62, 75
mteb/evaluation/evaluators/Image/ClassificationEvaluator.py                  204    132    35%   29, 45-49, 53, 69-89, 92-138, 154-173, 176-234, 243-257, 266-278, 287-299, 319, 322, 370, 372, 382-383
mteb/evaluation/evaluators/Image/ClusteringEvaluator.py                       38      2    95%   30-31
mteb/evaluation/evaluators/Image/ImageTextPairClassificationEvaluator.py      78     13    83%   34-52, 56, 85, 98
mteb/evaluation/evaluators/Image/VisualSTSEvaluator.py                        65     12    82%   36-40, 44, 74, 114, 116-120, 123-124
mteb/evaluation/evaluators/Image/ZeroshotClassificationEvaluator.py           47      7    85%   32-36, 40, 64
mteb/evaluation/evaluators/Image/__init__.py                                   0      0   100%
--------------------------------------------------------------------------------------------------------
TOTAL                                                                       1644    508    69%
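
As context for the coverage jump, here is a minimal sketch of the kind of end-to-end test that exercises these abstask and evaluator code paths. It is not the PR's actual code (that lives in tests/test_tasks/test_mieb_datasets.py, per the pytest command above); the task names, the model name, and the exact mteb.get_task / mteb.get_model / mteb.MTEB calls are assumptions about how such a test could be written.

```python
# Hypothetical sketch: run a small grid of MIEB tasks end-to-end on a small
# image/text model so the abstask and evaluator evaluate() paths actually run.
# Task names, the model name, and the test layout are illustrative only.
import pytest

import mteb

# Illustrative stand-in for the grid kept in tests/test_benchmark/task_grid.py.
MIEB_TASK_TEST_GRID = [
    "CIFAR10ZeroShot",    # zero-shot classification
    "CIFAR10Clustering",  # image clustering
]


@pytest.mark.parametrize("task_name", MIEB_TASK_TEST_GRID)
def test_mieb_task_runs_end_to_end(task_name: str) -> None:
    """Load the task, run it with a small CLIP model, and check that scores come back."""
    task = mteb.get_task(task_name)
    model = mteb.get_model("openai/clip-vit-base-patch32")  # small model keeps runtime manageable
    results = mteb.MTEB(tasks=[task]).run(model, overwrite_results=True)
    assert results, f"no results produced for {task_name}"
```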

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Cc @gowitheflow-1998

isaac-chung marked this pull request as ready for review on December 24, 2024, 22:42.
@KennethEnevoldsen (Contributor) left a comment:

[non-blocking]
How has the test run time changed?

I am wondering if we couldn't solve this instead with a MockTask, to avoid the download and load time, which I suspect is notable enough to care about (the mock objects are also great for testing with).
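
To make the suggestion concrete, here is a rough sketch (not code from this PR) of how mock data could replace the download: reuse an existing image-classification task but pre-populate its dataset in memory. The task name, model name, column names, and the dataset/data_loaded attributes are assumptions about the AbsTask interface, not confirmed by this thread.

```python
# Rough sketch of the MockTask idea: pre-populate a task with a tiny in-memory
# dataset so the evaluator runs without downloading anything. All names below
# (task, model, column names, data_loaded attribute) are assumptions.
from datasets import Dataset, DatasetDict
from PIL import Image

import mteb


def make_mock_image_splits(n: int = 8) -> DatasetDict:
    """Build a tiny two-class image-classification dataset in memory."""
    images = [Image.new("RGB", (32, 32), color=("red" if i % 2 else "blue")) for i in range(n)]
    labels = [i % 2 for i in range(n)]
    split = Dataset.from_dict({"image": images, "label": labels})
    return DatasetDict({"train": split, "test": split})


def test_image_classification_with_mock_data() -> None:
    task = mteb.get_task("CIFAR10")  # any AbsTaskImageClassification task would do
    task.dataset = make_mock_image_splits()
    task.data_loaded = True  # load_data() then returns early, so nothing is downloaded

    model = mteb.get_model("openai/clip-vit-base-patch32")
    results = mteb.MTEB(tasks=[task]).run(model, overwrite_results=True)
    assert results
```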

Review thread on tests/test_benchmark/task_grid.py (resolved).
isaac-chung merged commit c14c006 into mieb on Dec 25, 2024 (10 checks passed).
isaac-chung deleted the extend-mieb-test-cov branch on December 25, 2024, 15:44.