Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLA-595] Don't create sub-annotation keyframe for every keyframe when importing annotations #779

Merged
merged 2 commits into from
Feb 19, 2024

Conversation

saurbhc
Copy link
Member

@saurbhc saurbhc commented Feb 19, 2024

Problem

As of now, when importing annotations, we automatically make every keyframe a "sub-annotation keyframe" by marking that the sub-annotation has changed.

Solution

during import, between D-Py and backend we still use this format which include both annotation and sub-annotation data at the same time, for each "keyframe". It would be kinda-possible to drop sub-annotation data from the request whenever we know it didn't change since last one, which likely wouldn't create on backend side sub-annotation keyframe.

Changelog

Don't create sub-annotation keyframe for every keyframe when importing annotations

@saurbhc saurbhc added the enhancement New feature or request label Feb 19, 2024
@saurbhc saurbhc self-assigned this Feb 19, 2024
Copy link
Collaborator

@JBWilkie JBWilkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and LGTM - Duplicate sub-annotations (those that haven't changed since the last change) are removed the import payload correctly. However, this doesn't appear to accomplish the overall goal of PLA-595, which makes me think it may require backend work

Comment on lines +1022 to +1047
def _handle_video_annotation_subs(annotation: dt.VideoAnnotation):
"""
Remove duplicate sub-annotations from the VideoAnnotation.annotation(s) to be imported.
"""
last_subs = None
for _, _annotation in annotation.frames.items():
_annotation: dt.Annotation
subs = []
for sub in _annotation.subs:
if last_subs is not None and all(
any(
last_sub.annotation_type == sub.annotation_type
and last_sub.data == sub.data
for last_sub in last_subs
)
for sub in _annotation.subs
):
# drop sub-annotation whenever we know it didn't change since last one
# which likely wouldn't create on backend side sub-annotation keyframe.
# this is a workaround for the backend not handling duplicate sub-annotations.
continue
subs.append(sub)
last_subs = _annotation.subs
_annotation.subs = subs


Copy link
Collaborator

@JBWilkie JBWilkie Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saurbhc I know I approved this, but I found an issue in the case that there is >1 sub-annotation per annotation. I made this change to support those cases. Could you please take a look and verify?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JBWilkie thank you for updating the code 💯

The code is already quite efficient, but there's a small optimisation you can make, I understand the intent to use all() and any() which both iterate over all elements -- you can use a set to store the previous sub-annotations and check if the current sub-annotation is in the set? something like:

last_subs = set()
for _, _annotation in annotation.frames.items():
    _annotation: dt.Annotation
    subs = []
    for sub in _annotation.subs:
        sub_tuple = (sub.annotation_type, sub.data)
        if sub_tuple in last_subs:
            # drop sub-annotation whenever we know it didn't change since last one
            # which likely wouldn't create on backend side sub-annotation keyframe.
            # this is a workaround for the backend not handling duplicate sub-annotations.
            continue
        subs.append(sub)
        last_subs.add(sub_tuple)
    _annotation.subs = subs
  • last_subs is a set that stores tuples of (sub.annotation_type, sub.data)
  • This should be faster because checking if an element is in a set is an O(1) operation, while all() and any() are O(n) operations.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, wait!

I just learned that if the sub-annotation (sub.data) is a list of attributes, then sub.data will be a list which is un-hashable. we can convert it to tuple first like

sub_tuple = (sub.annotation_type, tuple(sub.data))

but it already seems a bit complicated and we might need to handle it for other datatypes too.

Let's Ignore this optimization bit! Thank you!!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's a shame about the list of attributes, otherwise this is a very nice change and makes it much more readable

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But thank you for your work here @saurbhc! Please merge whenever :)

@saurbhc saurbhc merged commit 9269e67 into master Feb 19, 2024
19 checks passed
@saurbhc saurbhc deleted the PLA-595 branch February 19, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants