Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLA-546][external] Removal of V1 code #777

Merged
merged 9 commits into from
Feb 20, 2024
Merged

[PLA-546][external] Removal of V1 code #777

merged 9 commits into from
Feb 20, 2024

Conversation

JBWilkie
Copy link
Collaborator

@JBWilkie JBWilkie commented Feb 4, 2024

Problem

darwin-py contains lots of code intended to work exclusively with the V1 version of the platform. This includes:

  • Exporting in Darwin JSON V1
  • Converting to Darwin JSON V2
  • The RemoteDatasetV1 & UploadHandlerV1 objects
  • Lots of Client object functions that work with deprecated V1 API endpoints
  • A large number of unit tests written for V1 functions

Solution

This code will never be used again, and we are deprecating the Darwin JSON V1 JSON format. Therefore this PR is to remove the above code. Also refactored a lot of V1 tests to be their V2 equivalents if such V2 tests didn't yet exist

Changelog

  • Removed all Darwin JSON V1 related code (exporting & converting)
  • Removed all code relating to V1 platform or API concepts

@JBWilkie
Copy link
Collaborator Author

JBWilkie commented Feb 4, 2024

This is an enormous PR. To help with it's review, every change falls is either:

  • Removal of logic that checks V1 against V2, or that accounts for V1 JSON
  • Removal of a V1-specific object: RemoteDatasetV1 etc.
  • Removal of a V1-specific function: There were lots of these on Client
  • Removal or refactoring of V1 unit tests to work for the equivalent V2 case

There should be no major functional changes in this PR. Additionally, I have almost certainly missed at least some V1 related code

@saurbhc saurbhc self-requested a review February 20, 2024 14:39
@saurbhc
Copy link
Member

saurbhc commented Feb 20, 2024

i ran e2e_tests, and saw a test failing on darwin convert coco

E2E test output:
$ pytest e2e_tests/
Setting up data
Using prefix hqcltp
Setting up annotation classes
Sleeping for 10 seconds to allow the server to catch up
============================================================================================== test session starts ==============================================================================================
platform darwin -- Python 3.8.18, pytest-7.4.3, pluggy-1.3.0
cachedir: /tmp/pytest_cache
rootdir: /Users/saurabhchopra/github/v7labs/darwin-py/e2e_tests
configfile: pytest.ini
plugins: rerunfailures-12.0
collected 16 items                                                                                                                                                                                              

e2e_tests/test_darwin.py .....                                                                                                                                                                            [ 31%]
e2e_tests/test_example.py x                                                                                                                                                                               [ 37%]
e2e_tests/cli/convert/test_convert.py ..F.                                                                                                                                                                [ 62%]
e2e_tests/sdk/future/core/test_properties.py ......                                                                                                                                                       [100%]
Tearing down datasets
Tearing down workflows
Tearing down general datasets
Tearing down data complete


=================================================================================================== FAILURES ====================================================================================================
_____________________________________________________________________ TestExportCli.test_darwin_convert[coco-input_path2-expectation_path2] _____________________________________________________________________

self = <e2e_tests.cli.convert.test_convert.TestExportCli object at 0x11fb6de20>, format = 'coco', input_path = PosixPath('/Users/saurabhchopra/github/v7labs/darwin-py/e2e_tests/data/coco/from')
expectation_path = PosixPath('/Users/saurabhchopra/github/v7labs/darwin-py/e2e_tests/data/coco/to')
tmp_path = PosixPath('/private/var/folders/bq/vhpf4_td0lzcdhlxyl3cpybr0000gn/T/pytest-of-saurabhchopra/pytest-2/test_darwin_convert_coco_input0')

    @pytest.mark.parametrize(
        "format, input_path, expectation_path",
        [
            ("yolo_segmented", data_path / "yolov8/from", data_path / "yolov8/to"),
            ("yolo", data_path / "yolo/from", data_path / "yolo/to"),
            pytest.param(
                "coco",
                data_path / "coco/from",
                data_path / "coco/to",
                marks=pytest.mark.skipif(
                    sys.platform == "win32",
                    reason="File paths are different on Windows, leading to test failure",
                ),
            ),
            (
                "semantic_mask",
                data_path / "semantic_mask/from",
                data_path / "semantic_mask/to",
            ),
        ],
    )
    def test_darwin_convert(
        self, format: str, input_path: Path, expectation_path: Path, tmp_path: Path
    ) -> None:
        """
        Test converting a file format to another format
        """
        assert (
            input_path is not None and expectation_path is not None
        ), "Input or expectation path is None"
        assert (
            input_path.exists() and expectation_path.exists()
        ), f"Input path {input_path.absolute()} or expectation path {expectation_path.absolute()} does not exist"
        assert (
            input_path.is_dir() and expectation_path.is_dir()
        ), f"Input path {input_path.absolute()} or expectation path {expectation_path.absolute()} is not a directory"
    
        result = run_cli_command(
            f"darwin convert {format} {str(input_path)} {str(tmp_path)}"
        )
        if format == "coco":
            self.patch_coco(tmp_path / "output.json")
        assert_cli(result, 0)
>       self.compare_directories(expectation_path, tmp_path)

e2e_tests/cli/convert/test_convert.py:93: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <e2e_tests.cli.convert.test_convert.TestExportCli object at 0x11fb6de20>, path = PosixPath('/Users/saurabhchopra/github/v7labs/darwin-py/e2e_tests/data/coco/to')
expected_path = PosixPath('/private/var/folders/bq/vhpf4_td0lzcdhlxyl3cpybr0000gn/T/pytest-of-saurabhchopra/pytest-2/test_darwin_convert_coco_input0')

    def compare_directories(self, path: Path, expected_path: Path) -> None:
        """
        Compare two directories recursively
        """
        assert path.exists() and expected_path.exists()
        assert path.is_dir() and expected_path.is_dir()
    
        for file in path.iterdir():
            if file.is_dir():
                # Recursively compare directories
                self.compare_directories(file, expected_path / file.name)
            else:
                if file.name.startswith("."):
                    # Ignore hidden files
                    continue
    
                # Compare files
                with file.open("r") as f:
                    content = f.read()
    
                with Path(expected_path / file.name).open() as f:
                    expected_content = f.read()
    
                if content != expected_content:
                    print(f"Expected file: {expected_path / file.name}")
                    print(f"Expected Content: \n{expected_content}")
                    print("---------------------")
                    print(f"Actual file: {file}")
                    print(f"Actual Content: \n{content}")
>                   assert False, f"File {file} does not match expected file"
E                   AssertionError: File /Users/saurabhchopra/github/v7labs/darwin-py/e2e_tests/data/coco/to/output.json does not match expected file
E                   assert False

e2e_tests/cli/convert/test_convert.py:48: AssertionError
--------------------------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------------------------
Expected file: /private/var/folders/bq/vhpf4_td0lzcdhlxyl3cpybr0000gn/T/pytest-of-saurabhchopra/pytest-2/test_darwin_convert_coco_input0/output.json
Expected Content: 
{
  "info": {
    "description": "Exported from Darwin",
    "url": "n/a",
    "version": "n/a",
    "year": 2023,
    "contributor": "n/a",
    "date_created": "2023/12/05"
  },
  "licenses": [
    {
      "url": "n/a",
      "id": 0,
      "name": "placeholder license"
    }
  ],
  "images": [
    {
      "license": 0,
      "file_name": "",
      "coco_url": "n/a",
      "height": null,
      "width": null,
      "date_captured": "",
      "flickr_url": "n/a",
      "darwin_url": null,
      "darwin_workview_url": null,
      "id": 2043925204,
      "tag_ids": []
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 2043925204,
      "category_id": 348813479,
      "segmentation": [
        [
          0.0,
          0.0,
          1.0,
          0.0,
          1.0,
          1.0,
          0.0,
          1.0
        ]
      ],
      "area": 1.0,
      "bbox": [
        0.0,
        0.0,
        1.0,
        1.0
      ],
      "iscrowd": 0,
      "extra": {}
    },
    {
      "id": 3,
      "image_id": 2043925204,
      "category_id": 3961009249,
      "segmentation": [
        [
          0.0,
          0.0,
          1.0,
          0.0,
          1.0,
          1.0,
          0.0,
          1.0
        ]
      ],
      "area": 1.0,
      "bbox": [
        0.0,
        0.0,
        1.0,
        1.0
      ],
      "iscrowd": 0,
      "extra": {}
    }
  ],
  "categories": [
    {
      "id": 348813479,
      "name": "test_bb",
      "supercategory": "root"
    },
    {
      "id": 3961009249,
      "name": "test_poly",
      "supercategory": "root"
    }
  ],
  "tag_categories": []
}
---------------------
Actual file: /Users/saurabhchopra/github/v7labs/darwin-py/e2e_tests/data/coco/to/output.json
Actual Content: 
{
  "info": {
    "description": "Exported from Darwin",
    "url": "n/a",
    "version": "n/a",
    "year": 2023,
    "contributor": "n/a",
    "date_created": "2023/12/05"
  },
  "licenses": [
    {
      "url": "n/a",
      "id": 0,
      "name": "placeholder license"
    }
  ],
  "images": [
    {
      "license": 0,
      "file_name": "",
      "coco_url": "n/a",
      "height": null,
      "width": null,
      "date_captured": "",
      "flickr_url": "n/a",
      "darwin_url": null,
      "darwin_workview_url": null,
      "id": 2043925204,
      "tag_ids": []
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 2043925204,
      "category_id": 348813479,
      "segmentation": [
        [
          0.0,
          0.0,
          1.0,
          0.0,
          1.0,
          1.0,
          0.0,
          1.0
        ]
      ],
      "area": 1.0,
      "bbox": [
        0.0,
        0.0,
        1.0,
        1.0
      ],
      "iscrowd": 0,
      "extra": {}
    },
    {
      "id": 3,
      "image_id": 2043925204,
      "category_id": 3961009249,
      "segmentation": [
        [
          0.0,
          0.0,
          1.0,
          0.0,
          1.0,
          1.0,
          0.0,
          1.0
        ]
      ],
      "area": 1.0,
      "bbox": [
        0.0,
        0.0,
        1.0,
        1.0
      ],
      "iscrowd": 0,
      "extra": {}
    }
  ],
  "categories": [
    {
      "id": 3961009249,
      "name": "test_poly",
      "supercategory": "root"
    },
    {
      "id": 348813479,
      "name": "test_bb",
      "supercategory": "root"
    }
  ],
  "tag_categories": []
}
=============================================================================================== warnings summary ================================================================================================
.venv/lib/python3.8/site-packages/mpire/dashboard/dashboard.py:8
  /Users/saurabhchopra/github/v7labs/darwin-py/.venv/lib/python3.8/site-packages/mpire/dashboard/dashboard.py:8: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    from pkg_resources import resource_string

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================ short test summary info ============================================================================================
FAILED e2e_tests/cli/convert/test_convert.py::TestExportCli::test_darwin_convert[coco-input_path2-expectation_path2] - AssertionError: File /Users/saurabhchopra/github/v7labs/darwin-py/e2e_tests/data/coco/to/output.json does not match expected file
======================================================================== 1 failed, 14 passed, 1 xfailed, 1 warning in 202.74s (0:03:22) =========================================================================

looks like a flaky test, as i re-ran the tests and it didn't fail 🤷

E2E test output 2:
$ pytest e2e_tests/
Setting up data
Using prefix 2w0so0
Setting up annotation classes
Sleeping for 10 seconds to allow the server to catch up
============================================================================================== test session starts ==============================================================================================
platform darwin -- Python 3.8.18, pytest-7.4.3, pluggy-1.3.0
cachedir: /tmp/pytest_cache
rootdir: /Users/saurabhchopra/github/v7labs/darwin-py/e2e_tests
configfile: pytest.ini
plugins: rerunfailures-12.0
collected 16 items                                                                                                                                                                                              

e2e_tests/test_darwin.py .....                                                                                                                                                                            [ 31%]
e2e_tests/test_example.py x                                                                                                                                                                               [ 37%]
e2e_tests/cli/convert/test_convert.py ....                                                                                                                                                                [ 62%]
e2e_tests/sdk/future/core/test_properties.py ......                                                                                                                                                       [100%]
Tearing down datasets
Tearing down workflows
Tearing down general datasets
Tearing down data complete


=============================================================================================== warnings summary ================================================================================================
.venv/lib/python3.8/site-packages/mpire/dashboard/dashboard.py:8
  /Users/saurabhchopra/github/v7labs/darwin-py/.venv/lib/python3.8/site-packages/mpire/dashboard/dashboard.py:8: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    from pkg_resources import resource_string

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================= 15 passed, 1 xfailed, 1 warning in 203.54s (0:03:23) ==============================================================================

Copy link
Member

@saurbhc saurbhc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇 Thank you for doing this!

darwin/importer/importer.py Outdated Show resolved Hide resolved
Co-authored-by: saurbhc <sc@saurabhchopra.co.uk>
@JBWilkie JBWilkie merged commit d86b0e4 into master Feb 20, 2024
16 checks passed
@saurbhc saurbhc deleted the DAR-415 branch February 20, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants