Export dataset in CVAT format misses frames in tasks with non-default frame step #8662

bsekachev · 2024-11-07T14:59:17Z

Motivation and context

How has this been tested?

Checklist

I submit my changes into the develop branch
I have created a changelog fragment
I have updated the documentation accordingly
I have added tests to cover my changes
I have linked related issues (see GitHub docs)
I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

Bug Fixes
- Resolved an issue where dataset exports in CVAT format were missing frames when using non-default frame steps.
Improvements
- Enhanced the efficiency of dataset export operations by optimizing how updated timestamps are retrieved.
- Improved error handling for export cache management with more specific exception handling.
Changes
- Modified frame iteration logic to allow for broader frame processing without skipping based on frame steps.

coderabbitai · 2024-11-07T14:59:25Z

Walkthrough

This update to the CVAT (Computer Vision Annotation Tool) addresses issues related to dataset exports, specifically ensuring that all frames are included when a non-default frame step is used. Key modifications include improvements in the export function for more efficient timestamp retrieval, enhanced error handling in the clear_export_cache function, and changes to the frame iteration logic in the iterate_frames method to allow for broader frame processing. A new custom exception class has also been introduced for better error specificity.

Changes

File Path	Change Summary
changelog.d/20241107_165701_sekachev.bs_fixed_export.md	Updated to document the fix for dataset export failures related to non-default frame steps.
cvat/apps/dataset_manager/views.py	Modified `export` function for efficient timestamp retrieval using `values_list`. Updated error handling in `clear_export_cache` to use a custom `FileIsBeingUsedError` exception. Added the new exception class.
cvat/apps/engine/frame_provider.py	Changed `iterate_frames` method to initialize `frame_range` with `itertools.count(start_frame)` instead of using the frame step, allowing iteration over every integer from `start_frame`. No changes to error handling or public entities.
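
To see why the step argument matters, here is a minimal sketch using only `itertools`. It assumes, as the review discussion in this PR suggests, that `iterate_frames` counts task-relative frame indices (0, 1, 2, ...). The values of `start_frame`, `step`, and `frame_count` are made up for illustration and are not taken from CVAT.

```python
import itertools

# Hypothetical values: a task with 6 frames and a frame step of 3,
# iterated from relative frame index 0.
start_frame = 0
step = 3
frame_count = 6


def in_task(index: int) -> bool:
    """Stop condition for the otherwise infinite counters below."""
    return index < frame_count


# Old behaviour: the counter advances by the frame step, so relative
# indices between multiples of the step are never visited.
stepped = list(itertools.takewhile(in_task, itertools.count(start_frame, step)))

# New behaviour: the counter visits every integer from start_frame.
dense = list(itertools.takewhile(in_task, itertools.count(start_frame)))

print(stepped)  # [0, 3]
print(dense)    # [0, 1, 2, 3, 4, 5]
```

With a step of 1 both counters are identical, which would explain why the problem only shows up in tasks with a non-default frame step.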

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ExportFunction
    participant CacheManager
    participant FrameProvider

    User->>ExportFunction: Request dataset export
    ExportFunction->>FrameProvider: Retrieve frames
    FrameProvider->>FrameProvider: Iterate frames without skipping
    FrameProvider-->>ExportFunction: Return frames
    ExportFunction->>CacheManager: Manage export cache
    CacheManager-->>ExportFunction: Handle cache errors
    ExportFunction-->>User: Return exported dataset

🐰 In the fields where frames do play,
The rabbit hops and jumps all day.
With every step, a frame is found,
In CVAT's world, joy does abound!
Exporting now, with frames galore,
Hooray for changes, let’s explore! 🌟

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

cvat/apps/dataset_manager/views.py (2)
Line range hint 213-214: Enhance the FileIsBeingUsedError exception.

While introducing a custom exception is good practice, consider making it more informative by:

Adding a docstring explaining when this exception is raised

Including file details in the error message

Consider this enhancement:
 class FileIsBeingUsedError(Exception):
-    pass
+    """Raised when attempting to clear an export cache file that is still in use."""
+    def __init__(self, file_path: str, ttl: timedelta):
+        self.file_path = file_path
+        self.ttl = ttl
+        super().__init__(
+            f"Export cache file '{file_path}' is still in use. "
+            f"Will be available for cleanup after {ttl}"
+        )
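
For reference, the suggestion above can be written out as a runnable sketch. Note that the diff as posted uses `timedelta` without importing it, so the import is added here. This is only the reviewer's proposal, not necessarily what was merged, and the path and TTL in the usage part are hypothetical.

```python
from datetime import timedelta


class FileIsBeingUsedError(Exception):
    """Raised when attempting to clear an export cache file that is still in use."""

    def __init__(self, file_path: str, ttl: timedelta):
        self.file_path = file_path
        self.ttl = ttl
        super().__init__(
            f"Export cache file '{file_path}' is still in use. "
            f"Will be available for cleanup after {ttl}"
        )


# Hypothetical usage: the path and TTL are illustrative only.
try:
    raise FileIsBeingUsedError("/tmp/export.zip", timedelta(hours=10))
except FileIsBeingUsedError as exc:
    message = str(exc)
    failed_path = exc.file_path
    print(message)
```

Keeping `file_path` and `ttl` as attributes lets a caller decide when to retry the cleanup without parsing the message text.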
Line range hint 31-33: Consider addressing the TODO comment regarding RQ scheduler.

The comment indicates a potential race condition where two jobs with the same name could run in parallel during retries. Consider:

Implementing proper retries using RQ's built-in scheduler

Adding a unique suffix to retried job IDs to prevent parallel execution

Would you like me to help design a more robust retry mechanism that prevents this race condition?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 58775bf and 50de904.

📒 Files selected for processing (3)

changelog.d/20241107_165701_sekachev.bs_fixed_export.md (1 hunks)
cvat/apps/dataset_manager/views.py (1 hunks)
cvat/apps/engine/frame_provider.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

changelog.d/20241107_165701_sekachev.bs_fixed_export.md

🔇 Additional comments (2)

cvat/apps/dataset_manager/views.py (1)

125-126: Efficient database query optimization.

The change to use values_list('updated_date', flat=True) instead of fetching full task instances is a good optimization that reduces memory usage and database load.
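
The idea behind `values_list('updated_date', flat=True)` is to fetch a single column rather than whole model rows. The sketch below illustrates the same idea with the standard-library `sqlite3` module instead of the Django ORM, so that it runs without a configured project. The table name, columns, and data are hypothetical and do not reflect CVAT's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT, updated_date TEXT)")
conn.executemany(
    "INSERT INTO task (id, name, updated_date) VALUES (?, ?, ?)",
    [(1, "a", "2024-11-01"), (2, "b", "2024-11-07"), (3, "c", "2024-11-03")],
)

# Fetching whole rows, roughly what loading full model instances does.
full_rows = conn.execute("SELECT * FROM task ORDER BY id").fetchall()

# Fetching only the needed column, analogous to values_list(..., flat=True),
# which yields plain values instead of one-element tuples.
updated_dates = [
    row[0] for row in conn.execute("SELECT updated_date FROM task ORDER BY id")
]

latest = max(updated_dates)  # ISO dates sort correctly as strings
print(latest)  # 2024-11-07
conn.close()
```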

Let's verify if this change fully addresses the frame step issue mentioned in the PR objectives:

cvat/apps/engine/frame_provider.py (1)

372-372: LGTM! The change fixes frame skipping during iteration.

The modification ensures that all frames are processed during iteration, addressing the issue of missing frames in exports with non-default frame steps.

Let's verify that this change doesn't affect other functionality that might depend on frame steps:

✅ Verification successful

Frame step is still properly handled in other code paths

The verification shows that frame steps are correctly maintained throughout the codebase:

Frame step is used for absolute/relative frame number conversions
Frame ranges in segments and jobs respect the step
Frame step is properly applied in data import/export paths
The change in TaskFrameProvider.iterate_frames() only affects the iteration order, while the frame mapping logic remains intact

The removal of frame step from itertools.count() is safe as it only changes how frames are iterated internally within the provider, while all frame number translations still respect the configured step.
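
The relationship between relative and absolute frame numbers summarized above can be sketched as follows. The helper names `to_abs` and `to_rel`, the variable `segment_frame_set`, and the sample values are hypothetical. The arithmetic assumes that absolute frames are spaced `step` apart starting at the task's start frame, which is an assumption about the conversion and is not copied from CVAT's code.

```python
def to_abs(rel_frame: int, start: int, step: int) -> int:
    """Map a task-relative frame index to an absolute frame number."""
    return start + rel_frame * step


def to_rel(abs_frame: int, start: int, step: int) -> int:
    """Map an absolute frame number back to a task-relative index."""
    offset = abs_frame - start
    if offset < 0 or offset % step != 0:
        raise ValueError(f"frame {abs_frame} is not part of the task")
    return offset // step


# Hypothetical task: starts at absolute frame 10 with a frame step of 5.
start, step = 10, 5

# Hypothetical segment described by absolute frame numbers.
segment_frame_set = {10, 15, 20, 25}

# Iterate relative indices densely and convert each one to an absolute
# number before testing membership in the segment's frame set.
selected = [rel for rel in range(6) if to_abs(rel, start, step) in segment_frame_set]

print(selected)  # [0, 1, 2, 3]
```

Comparing a relative index directly against the absolute set would only match by accident, which is why the side on which the conversion happens matters.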

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other references to frame steps to ensure they're not affected
rg "get_frame_step" -A 5

Length of output: 16932

cvat/apps/dataset_manager/views.py

changelog.d/20241107_165701_sekachev.bs_fixed_export.md

Co-authored-by: Maxim Zhiltsov <zhiltsov.max35@gmail.com>

zhiltsov-max · 2024-11-07T15:08:54Z

cvat/apps/engine/frame_provider.py

@@ -369,7 +369,7 @@ def iterate_frames(
        quality: FrameQuality = FrameQuality.ORIGINAL,
        out_type: FrameOutputType = FrameOutputType.BUFFER,
    ) -> Iterator[DataWithMeta[AnyFrame]]:
-        frame_range = itertools.count(start_frame, self._db_task.data.get_frame_step())
+        frame_range = itertools.count(start_frame)


Probably, db_segment_frame_set below has to be converted to relative ids. There is the get_rel_frame_number() method for this.

Converted idx to absolute instead, as it is done in another iterate_frames.

Ok, please call dev/format_python_code.sh

Why not add it to a git pre-commit hook?

There is a problem with determining the right interpreter in a cross-platform manner.

zhiltsov-max · 2024-11-07T15:27:22Z

It would be nice to add a test for this to avoid regressions in the future.

bsekachev · 2024-11-07T17:19:55Z

Yes, it would be. However, I do not have enough time. Let's keep the card on the agile board.

sonarcloud · 2024-11-07T17:22:13Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

codecov-commenter · 2024-11-07T21:16:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.25%. Comparing base (58775bf) to head (df81f93).

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #8662   +/-   ##
========================================
  Coverage    74.24%   74.25%           
========================================
  Files          401      401           
  Lines        43465    43465           
  Branches      3950     3950           
========================================
+ Hits         32270    32273    +3     
+ Misses       11195    11192    -3
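
The percentages in the table above follow from the hit and line counts. A small sketch of the arithmetic is shown below; the function name `coverage_percent` is hypothetical, and rounding to two decimals is an assumption about how the report formats the figure.

```python
def coverage_percent(hits: int, lines: int) -> str:
    """Format line coverage as a percentage with two decimals."""
    return f"{100 * hits / lines:.2f}%"


lines = 43465

# Base commit: 32270 hits and 11195 misses.
base = coverage_percent(32270, lines)

# Head commit: 32273 hits and 11192 misses.
head = coverage_percent(32273, lines)

print(base, head)  # 74.24% 74.25%
```

In both columns hits plus misses equal the 43465 total lines, so the three extra hits account for the whole change.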

Components	Coverage Δ
cvat-ui	`78.53% <ø> (+0.01%)`	⬆️
cvat-server	`70.58% <100.00%> (ø)`

Export dataset in CVAT format misses frames in tasks with non-default frame step

50de904

bsekachev requested review from zhiltsov-max, Marishka17 and nmanovic as code owners November 7, 2024 14:59

coderabbitai bot reviewed Nov 7, 2024

zhiltsov-max reviewed Nov 7, 2024

cvat/apps/dataset_manager/views.py (outdated)

zhiltsov-max reviewed Nov 7, 2024

changelog.d/20241107_165701_sekachev.bs_fixed_export.md (outdated)

bsekachev and others added 2 commits November 7, 2024 17:08

Removed extra lambda

6be1c0b

Update changelog.d/20241107_165701_sekachev.bs_fixed_export.md

a4c1b01

Co-authored-by: Maxim Zhiltsov <zhiltsov.max35@gmail.com>

zhiltsov-max reviewed Nov 7, 2024

bsekachev added 2 commits November 7, 2024 17:17

Added conversion

c9739bc

Aborted changes

76cacd7

Linted code

df81f93

bsekachev merged commit a6fd1e5 into develop Nov 8, 2024
34 checks passed

bsekachev deleted the bs/fixed_export branch November 8, 2024 13:40

cvat-bot bot mentioned this pull request Nov 11, 2024

Release v2.22.0 #8678

Merged
