Extend MIEB test coverage #1629

Merged: isaac-chung merged 2 commits into mieb from extend-mieb-test-cov on Dec 25, 2024
Conversation

@isaac-chung (Collaborator) commented on Dec 24, 2024:

Part of #1339. Extends test coverage for mteb/abstasks/Image and mteb/evaluation/evaluators/Image from 30% to 69% (a sketch of the kind of end-to-end test involved is shown after the coverage tables below).

pytest tests/test_tasks/test_mieb_datasets.py -n auto --durations=5 --cov-report=term-missing --cov-config=pyproject.toml --cov=mteb/abstasks/Image --cov=mteb/evaluation/evaluators/Image
Before
---------- coverage: platform linux, python 3.12.1-final-0 -----------
Name                                                                       Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------------------------------------
mteb/abstasks/Image/AbsTaskAny2AnyMultiChoice.py                             202    166    18%   36-63, 67-71, 80-109, 112-121, 124-138, 141-155, 158-186, 218-238, 248-276, 281-372, 375, 380, 383-413, 422-441, 451-462
mteb/abstasks/Image/AbsTaskAny2AnyRetrieval.py                               202    166    18%   35-62, 66-70, 79-108, 111-120, 123-137, 140-154, 157-185, 214-234, 244-272, 277-367, 370, 375, 378-408, 417-436, 446-457
mteb/abstasks/Image/AbsTaskAny2TextMultipleChoice.py                          27      9    67%   33, 38, 48-64
mteb/abstasks/Image/AbsTaskImageClassification.py                             78     50    36%   72, 77, 88-113, 124-186, 194-210
mteb/abstasks/Image/AbsTaskImageClustering.py                                 23      1    96%   38
mteb/abstasks/Image/AbsTaskImageMultilabelClassification.py                  101     68    33%   31-41, 81, 86, 97-122, 134-196, 200-210
mteb/abstasks/Image/AbsTaskImageTextPairClassification.py                     22      6    73%   34, 39, 49-58
mteb/abstasks/Image/AbsTaskVisualSTS.py                                       39     19    51%   43, 47, 52-66, 69, 74-84
mteb/abstasks/Image/AbsTaskZeroshotClassification.py                          25      1    96%   36
mteb/abstasks/Image/__init__.py                                                0      0   100%
mteb/evaluation/evaluators/Image/Any2AnyMultiChoiceEvaluator.py              223    181    19%   44-46, 49, 52-61, 65, 79-99, 111-240, 244-264, 277-287, 295-298, 321-413, 423-444, 452-467, 476-487
mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py                220    178    19%   44-46, 49, 52-62, 66, 81-102, 114-250, 254-274, 288-298, 305-308, 330-417, 427-448, 456-471, 480-491
mteb/evaluation/evaluators/Image/Any2TextMultipleChoiceEvaluator.py           50     34    32%   45-54, 61-99
mteb/evaluation/evaluators/Image/ClassificationEvaluator.py                  204    166    19%   29, 37-39, 42, 45-49, 53, 69-89, 92-138, 154-173, 176-234, 243-257, 266-278, 287-299, 315-334, 337-387
mteb/evaluation/evaluators/Image/ClusteringEvaluator.py                       38      2    95%   30-31
mteb/evaluation/evaluators/Image/ImageTextPairClassificationEvaluator.py      78     57    27%   25-28, 31, 34-52, 56, 83-90, 97-173
mteb/evaluation/evaluators/Image/VisualSTSEvaluator.py                        65     43    34%   28-30, 33, 36-40, 44, 56-64, 73-130
mteb/evaluation/evaluators/Image/ZeroshotClassificationEvaluator.py           47      7    85%   32-36, 40, 64
mteb/evaluation/evaluators/Image/__init__.py                                   0      0   100%
--------------------------------------------------------------------------------------------------------
TOTAL                                                                       1644   1154    30%

After
---------- coverage: platform linux, python 3.12.1-final-0 -----------
Name                                                                       Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------------------------------------
mteb/abstasks/Image/AbsTaskAny2AnyMultiChoice.py                             202     96    52%   48-61, 67-71, 81-84, 112-121, 132, 149, 166, 174, 219, 268, 289-291, 294-310, 342-370, 380, 383-413, 422-441, 451-462
mteb/abstasks/Image/AbsTaskAny2AnyRetrieval.py                               202     96    52%   47-60, 66-70, 80-83, 111-120, 131, 148, 165, 173, 215, 264, 285-287, 290-306, 337-365, 375, 378-408, 417-436, 446-457
mteb/abstasks/Image/AbsTaskAny2TextMultipleChoice.py                          27      2    93%   38, 50
mteb/abstasks/Image/AbsTaskImageClassification.py                             78      6    92%   77, 89, 102, 147, 157, 177
mteb/abstasks/Image/AbsTaskImageClustering.py                                 23      1    96%   38
mteb/abstasks/Image/AbsTaskImageMultilabelClassification.py                  101      6    94%   86, 98, 111, 170-174
mteb/abstasks/Image/AbsTaskImageTextPairClassification.py                     22      1    95%   39
mteb/abstasks/Image/AbsTaskVisualSTS.py                                       39      9    77%   74-84
mteb/abstasks/Image/AbsTaskZeroshotClassification.py                          25      1    96%   36
mteb/abstasks/Image/__init__.py                                                0      0   100%
mteb/evaluation/evaluators/Image/Any2AnyMultiChoiceEvaluator.py              223     59    74%   52-61, 65, 83, 99, 112, 123-124, 143-151, 173-174, 193-201, 234, 244-264, 296, 322-329, 365-366, 426-439, 476-487
mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py                220     62    72%   52-62, 66, 86, 102, 115, 126-127, 151-161, 191-218, 244, 254-274, 306, 331-338, 374-375, 430-443, 480-491
mteb/evaluation/evaluators/Image/Any2TextMultipleChoiceEvaluator.py           50      3    94%   47, 62, 75
mteb/evaluation/evaluators/Image/ClassificationEvaluator.py                  204    132    35%   29, 45-49, 53, 69-89, 92-138, 154-173, 176-234, 243-257, 266-278, 287-299, 319, 322, 370, 372, 382-383
mteb/evaluation/evaluators/Image/ClusteringEvaluator.py                       38      2    95%   30-31
mteb/evaluation/evaluators/Image/ImageTextPairClassificationEvaluator.py      78     13    83%   34-52, 56, 85, 98
mteb/evaluation/evaluators/Image/VisualSTSEvaluator.py                        65     12    82%   36-40, 44, 74, 114, 116-120, 123-124
mteb/evaluation/evaluators/Image/ZeroshotClassificationEvaluator.py           47      7    85%   32-36, 40, 64
mteb/evaluation/evaluators/Image/__init__.py                                   0      0   100%
--------------------------------------------------------------------------------------------------------
TOTAL                                                                       1644    508    69%
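
As context for the coverage jump, here is a minimal sketch of the kind of end-to-end test that exercises these abstask and evaluator code paths. It is not the PR's actual code (that lives in tests/test_tasks/test_mieb_datasets.py, per the pytest command above); the task names, the model name, and the exact mteb.get_task / mteb.get_model / mteb.MTEB calls are assumptions about how such a test could be written.

```python
# Hypothetical sketch: run a small grid of MIEB tasks end-to-end on a small
# image/text model so the abstask and evaluator evaluate() paths actually run.
# Task names, the model name, and the test layout are illustrative only.
import pytest

import mteb

# Illustrative stand-in for the grid kept in tests/test_benchmark/task_grid.py.
MIEB_TASK_TEST_GRID = [
    "CIFAR10ZeroShot",    # zero-shot classification
    "CIFAR10Clustering",  # image clustering
]


@pytest.mark.parametrize("task_name", MIEB_TASK_TEST_GRID)
def test_mieb_task_runs_end_to_end(task_name: str) -> None:
    """Load the task, run it with a small CLIP model, and check that scores come back."""
    task = mteb.get_task(task_name)
    model = mteb.get_model("openai/clip-vit-base-patch32")  # small model keeps runtime manageable
    results = mteb.MTEB(tasks=[task]).run(model, overwrite_results=True)
    assert results, f"no results produced for {task_name}"
```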

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Cc @gowitheflow-1998

isaac-chung marked this pull request as ready for review on December 24, 2024, 22:42.
@KennethEnevoldsen (Contributor) left a comment:

[non-blocking]
How has the test run time changed?

I am wondering if we couldn't solve this instead with a MockTask, to avoid the download and load time, which I suspect is notable enough to care about (the mock objects are also great for testing with).
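
To make the suggestion concrete, here is a rough sketch (not code from this PR) of how mock data could replace the download: reuse an existing image-classification task but pre-populate its dataset in memory. The task name, model name, column names, and the dataset/data_loaded attributes are assumptions about the AbsTask interface, not confirmed by this thread.

```python
# Rough sketch of the MockTask idea: pre-populate a task with a tiny in-memory
# dataset so the evaluator runs without downloading anything. All names below
# (task, model, column names, data_loaded attribute) are assumptions.
from datasets import Dataset, DatasetDict
from PIL import Image

import mteb


def make_mock_image_splits(n: int = 8) -> DatasetDict:
    """Build a tiny two-class image-classification dataset in memory."""
    images = [Image.new("RGB", (32, 32), color=("red" if i % 2 else "blue")) for i in range(n)]
    labels = [i % 2 for i in range(n)]
    split = Dataset.from_dict({"image": images, "label": labels})
    return DatasetDict({"train": split, "test": split})


def test_image_classification_with_mock_data() -> None:
    task = mteb.get_task("CIFAR10")  # any AbsTaskImageClassification task would do
    task.dataset = make_mock_image_splits()
    task.data_loaded = True  # load_data() then returns early, so nothing is downloaded

    model = mteb.get_model("openai/clip-vit-base-patch32")
    results = mteb.MTEB(tasks=[task]).run(model, overwrite_results=True)
    assert results
```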

Review thread on tests/test_benchmark/task_grid.py (resolved).
isaac-chung merged commit c14c006 into mieb on Dec 25, 2024 (10 checks passed).
isaac-chung deleted the extend-mieb-test-cov branch on December 25, 2024, 15:44.