[DAR-1679][External] Made darwin-py ignore the .v7/metadata.json properties manifest when reading JSON annotation files #823

Merged: 6 commits merged on May 3, 2024

Conversation

Collaborator

@JBWilkie JBWilkie commented Apr 22, 2024

Problem

A previous bug fix (DAR-1609) found that pulling a dataset release did not include the properties metadata file when one needed to be downloaded. That was fixed in that ticket. However, several functions in darwin-py assume that every JSON file in the annotations directory is an annotation file. Once a properties metadata file is present, this is no longer true, and those functions run into issues.

Solution

  • 1: Identified every place where darwin-py reads annotation files from the annotations directory
  • 2: Ensured the .v7/metadata.json properties manifest is ignored in each case

Changelog

Made darwin-py ignore the .v7/metadata.json properties manifest when reading JSON annotation files

@JBWilkie JBWilkie requested a review from saurbhc April 22, 2024 11:58
@JBWilkie
Collaborator Author

Since we repeat the same glob() call, with the same logic to skip filepaths containing /.v7/, in several places, I can abstract this into a function and add a unit test if that's deemed worthwhile.
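A minimal sketch of what such a helper could look like. The name `iter_annotation_files` is hypothetical and not part of darwin-py, and the check compares path components instead of the `"/.v7/"` substring used in this PR, which is a swapped-in variant so that it does not depend on the path separator:

```python
from pathlib import Path
from typing import Iterator


def iter_annotation_files(annotations_dir: Path) -> Iterator[Path]:
    """Yield JSON files under annotations_dir, skipping anything inside a .v7 directory.

    Hypothetical helper for illustration only; not an existing darwin-py function.
    """
    for path in sorted(annotations_dir.glob("**/*.json")):
        # Look at path components relative to the annotations directory,
        # so a ".v7" folder at any depth is excluded.
        if ".v7" in path.relative_to(annotations_dir).parts:
            continue
        yield path
```

Each call site could then iterate over `iter_annotation_files(annotations_dir)` instead of repeating the glob and the filter, and a unit test would only need a temporary directory containing one annotation file and one `.v7/metadata.json`.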

return (
    str(e)
    for e in sorted(annotations_dir.glob("**/*.json"))
    if "/.v7/" not in str(e)
Member

            if "/.v7/" not in str(e)

Do we need this part here? 🤔 Isn't glob("**/*.json") already fetching only the JSON files?

Collaborator Author

I think we do, sadly, and I can reproduce this by instantiating a LocalDataset. The ** in glob() searches the annotations directory recursively. Since that directory will now contain .v7/metadata.json, glob() picks this file up unless we tell it to ignore filepaths containing /.v7/

The code I used to reproduce this is below. Before running it, a dataset release has to be pulled using 0.8.59 so that the metadata file is present in the annotations directory

from pathlib import Path

from darwin.dataset import LocalDataset

my_dataset = LocalDataset(
    dataset_path=Path("/path/to/dataset"),
    annotation_type="bounding_box",
)
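The glob() behaviour can also be checked without a real dataset. The following is a small sketch that builds a throwaway directory; the directory layout and file names are assumptions made for illustration, not taken from an actual release:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout: one annotation file plus the .v7/metadata.json manifest.
    annotations_dir = Path(tmp) / "annotations"
    (annotations_dir / ".v7").mkdir(parents=True)
    (annotations_dir / "image_1.json").write_text("{}")
    (annotations_dir / ".v7" / "metadata.json").write_text("{}")

    found = sorted(annotations_dir.glob("**/*.json"))

    # Path.glob() does not skip hidden directories, so the manifest is matched too.
    unfiltered = [p.relative_to(annotations_dir).as_posix() for p in found]

    # Same style of substring check as in the diff, applied to a POSIX-style path.
    filtered = [
        p.relative_to(annotations_dir).as_posix()
        for p in found
        if "/.v7/" not in p.as_posix()
    ]

print(unfiltered)  # ['.v7/metadata.json', 'image_1.json']
print(filtered)    # ['image_1.json']
```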
