
coco export subsets to different folders #8171

Merged 5 commits into develop on Jul 17, 2024
Conversation

@Eldies (Contributor) commented Jul 16, 2024

Motivation and context

Fixes #4993

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • New Features

    • Enhanced COCO export functionality to organize images into separate subfolders by subsets.
  • Tests

    • Introduced a new test to verify the creation of subfolders in COCO exports.
  • Data Updates

    • Updated various test data records including annotations, user logins, sessions, images, projects, tasks, jobs, and quality settings to reflect new entries and configurations.

coderabbitai bot (Contributor) commented Jul 16, 2024

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Walkthrough

The changes focus on modifying the export logic for COCO format annotations in CVAT. The primary enhancement is organizing images from different subsets (e.g., train, test, validation) into separate subfolders. This prevents filename conflicts within a single folder. Additionally, test cases are updated to validate the new subfolder structure.

Changes

Files / Modules Change Summary
cvat/apps/dataset_manager/formats/coco.py Modified the merge_images parameter from True to False in export methods to ensure images are organized into subfolders for different subsets.
tests/python/rest_api/test_projects.py Introduced imports (itertools, itemgetter) and added a new test method test_creates_subfolders_in_coco_export to check the creation of subfolders in a COCO export.
tests/python/shared/assets/annotations.json Added new entries with empty shapes, tags, tracks, and version fields to support new test cases.
tests/python/shared/assets/cvat_db/data.json Updated timestamps for user accounts, sessions, and added entries for engine data, images, projects, tasks, segments, jobs, storage configurations, quality settings, and webhooks.
tests/python/shared/assets/jobs.json Added two new job entries with specific details to support new test cases.
tests/python/shared/assets/projects.json Increased the count of results and added a new project entry with detailed information.
tests/python/shared/assets/quality_settings.json Increased count and added new quality settings entries for tasks.
tests/python/shared/assets/tasks.json Added new task objects with attributes and metadata to support new functionalities.
tests/python/shared/assets/users.json Updated last_login timestamps for existing users.
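The core of the change listed above is small: switching merge_images from True to False makes the COCO exporter write each subset's images under its own images/&lt;subset&gt;/ prefix instead of a single shared images/ folder. The snippet below is a self-contained illustration of why this matters for tasks whose subsets reuse frame names (the export_paths helper is hypothetical; the real path logic lives in Datumaro's COCO exporter):

```python
# Two subsets whose frames share a filename — common for video tasks,
# where every task's frames start at frame_000000.
subsets = {"Train": ["frame_000000.png"], "Validation": ["frame_000000.png"]}

def export_paths(subsets, merge_images):
    """Hypothetical model of the archive paths the exporter produces."""
    paths = []
    for subset, names in subsets.items():
        for name in names:
            # merge_images=True flattens everything into images/;
            # merge_images=False namespaces by subset.
            prefix = "images/" if merge_images else f"images/{subset}/"
            paths.append(prefix + name)
    return paths

# With merging, the two frames collide on one archive path and one
# overwrites the other; without merging, both survive.
assert len(set(export_paths(subsets, merge_images=True))) == 1
assert len(set(export_paths(subsets, merge_images=False))) == 2
```

This is the filename-conflict scenario from issue #4993 that the PR resolves.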

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant CVAT Server
    participant COCO Exporter
    participant File System
    
    Client ->> CVAT Server: Request to export task dataset to COCO
    CVAT Server ->> COCO Exporter: Initiate COCO export with merge_images=False
    COCO Exporter ->> File System: Create separate subfolders for train, test, validation
    COCO Exporter ->> CVAT Server: Provide export results
    CVAT Server ->> Client: Deliver exported dataset with organized subfolders

Assessment against linked issues

Objective Addressed Explanation
Create subfolders for train, test, and validation subsets in COCO export (#4993)

Poem

In folders neat, the images lay,
For train, test, and validate, they stay.
No more will frames collide and fight,
Each subset cozy in its own right.
CVAT's export now shines bright! 🐇📂✨



@Eldies Eldies marked this pull request as ready for review July 16, 2024 06:09
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

Outside diff range, codebase verification and nitpick comments (2)
tests/python/shared/assets/cvat_db/data.json (2)

1522-1559: Potential Redundancy in Data Entries

Some entries, particularly in engine.data and engine.image, appear to be redundant with identical fields. Consider if this redundancy is necessary or if it could be optimized.

Also applies to: 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604


11554-11601: Review Storage Configuration

The storage configuration entries are all set to "local" with "cloud_storage": null. Ensure that this aligns with the intended storage architecture and that there are fallbacks or error handling for storage issues.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b5d48c7 and 4793144.

Files ignored due to path filters (1)
  • tests/python/shared/assets/cvat_db/cvat_data.tar.bz2 is excluded by !**/*.bz2
Files selected for processing (10)
  • changelog.d/20240716_095853_dlavrukhin_coco_export_folders.md (1 hunks)
  • cvat/apps/dataset_manager/formats/coco.py (2 hunks)
  • tests/python/rest_api/test_projects.py (2 hunks)
  • tests/python/shared/assets/annotations.json (2 hunks)
  • tests/python/shared/assets/cvat_db/data.json (12 hunks)
  • tests/python/shared/assets/jobs.json (1 hunks)
  • tests/python/shared/assets/projects.json (1 hunks)
  • tests/python/shared/assets/quality_settings.json (2 hunks)
  • tests/python/shared/assets/tasks.json (1 hunks)
  • tests/python/shared/assets/users.json (2 hunks)
Files skipped from review due to trivial changes (4)
  • changelog.d/20240716_095853_dlavrukhin_coco_export_folders.md
  • tests/python/shared/assets/annotations.json
  • tests/python/shared/assets/quality_settings.json
  • tests/python/shared/assets/users.json
Additional comments not posted (13)
cvat/apps/dataset_manager/formats/coco.py (2)

23-23: Change in merge_images parameter from True to False is appropriate for the given context.

This change aligns with the PR's objective to prevent file overwriting by exporting different subsets into separate subfolders. It's a straightforward and effective solution to the problem described.


47-47: Proper adjustment in merge_images parameter for COCO Keypoints export.

Changing merge_images from True to False ensures that keypoints data for different subsets are stored in separate subfolders, preventing data loss due to overwriting.
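Both call sites make the same one-word change. A hedged sketch of the shape of the export helper in cvat/apps/dataset_manager/formats/coco.py after this PR — the Dataset class is stubbed here so the snippet is self-contained, and the real wrapper's signature may differ:

```python
class Dataset:
    """Stand-in for a datumaro Dataset; records the export arguments."""
    def export(self, save_dir, format_name, **kwargs):
        self.last_call = (save_dir, format_name, kwargs)

def export_coco(dataset, temp_dir, save_images=False):
    # merge_images=False is the PR's change: each subset keeps its own
    # images/<subset>/ folder instead of all images sharing images/.
    dataset.export(temp_dir, "coco_instances",
                   save_images=save_images, merge_images=False)

ds = Dataset()
export_coco(ds, "/tmp/export", save_images=True)
assert ds.last_call[2]["merge_images"] is False
```

The same flag flip applies to the COCO Keypoints export path.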

tests/python/shared/assets/projects.json (1)

2-47: New project entry added correctly.

The new project entry is well-structured and includes all necessary fields such as name, owner, and task_subsets which are crucial for managing different subsets. The inclusion of subsets like "Train" and "Validation" in task_subsets is particularly relevant to the PR's objectives.

tests/python/shared/assets/tasks.json (1)

2-107: New task entries added correctly.

The new task entries are well-structured and include all necessary fields such as name, owner, labels, jobs, and subset. The inclusion of subsets like "Validation" and "Train" in the subset field for the respective tasks aligns well with the PR's objectives and ensures that tasks are correctly categorized.

tests/python/shared/assets/jobs.json (2)

2-2: Updated job count reflects new entries.

The count has been updated from 25 to 27, indicating the addition of two new job entries. This change is consistent with the PR's objective to handle new export functionalities.


6-45: New job entries added.

Two new job entries have been added with IDs 34 and 33. These entries include details such as created_date, data_chunk_size, and project_id, among others. Each field appears to be correctly formatted and consistent with the existing data structure.

  • ID 34: Job for project ID 14, indicating a specific task related to the project. The source_storage and target_storage are both local, which is typical for jobs not requiring cloud storage.
  • ID 33: Similar setup to ID 34, also for project ID 14, suggesting these jobs are part of the same project or batch operation.

Both entries have mode set to annotation and state set to new, which is consistent with their recent creation date.

Also applies to: 47-86

tests/python/rest_api/test_projects.py (3)

7-7: New import added: itertools

This import is necessary for the new functionality to group tasks by project ID using itertools.groupby in the new test method test_creates_subfolders_in_coco_export.


15-15: New import added: from operator import itemgetter

This import is used in the new test method test_creates_subfolders_in_coco_export to extract the project_id from tasks. It facilitates sorting and grouping operations.


819-837: New test method added: test_creates_subfolders_in_coco_export

This method tests the new functionality of creating subfolders for different subsets (Train, Validation) when exporting COCO datasets. The method uses itertools.groupby and itemgetter to group and filter tasks by project ID, ensuring that the subsets are correctly sorted and that the export process creates the expected subfolders.

The use of zipfile.ZipFile to inspect the contents of the exported zip file is appropriate to verify that the subfolders are indeed created and contain files. The assertion checks if files starting with images/{subset}/ exist, which directly relates to the PR's objective to prevent file overwriting by using subfolders.
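The verification step can be sketched as follows. This is a minimal stand-in, not the test's actual code: the archive is built in memory here, whereas the real test downloads the export from the CVAT REST API, and the subset names are assumed from the fixture projects:

```python
import io
import zipfile

# Build a stand-in export archive with the layout the PR produces.
export_bytes = io.BytesIO()
with zipfile.ZipFile(export_bytes, "w") as zf:
    zf.writestr("images/Train/frame_000000.png", b"")
    zf.writestr("images/Validation/frame_000000.png", b"")
    zf.writestr("annotations/instances_default.json", b"{}")

# The check the test performs: every subset has its own images/ subfolder
# containing at least one file.
subsets = ["Train", "Validation"]
with zipfile.ZipFile(export_bytes) as zf:
    names = zf.namelist()
    for subset in subsets:
        assert any(n.startswith(f"images/{subset}/") for n in names)
```

Inspecting namelist() for the images/{subset}/ prefix is exactly the property that fails on the old merged layout, so the assertion directly encodes the PR's objective.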

tests/python/shared/assets/cvat_db/data.json (4)

524-525: Review Session Expiration Dates

The session expiration date "expire_date": "2024-07-29T15:28:06.762Z" seems to be set correctly. Ensure that this aligns with the application's session management policies.


520-527: Data Integrity Check Required

The added fields and models need to be verified for consistency and integrity. Ensure that all primary keys (pk) and relationships (task, data, etc.) are correctly mapped and that there are no orphan records.

Also applies to: 1522-1559, 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604


1522-1559: Consistency Check for New Entries

The new entries across various models (engine.data, engine.image, engine.project, engine.task, engine.clientfile, engine.segment, engine.job, engine.storage, quality_control.qualitysettings) need to be checked for consistency with existing data structures and naming conventions.

Also applies to: 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604


16568-16604: Quality Settings Update

The new quality settings entries need to be reviewed to ensure they align with the application's performance and quality requirements. Verify that the thresholds and parameters set are appropriate for the tasks they are associated with.

Verification successful

To verify the appropriateness of the new quality settings entries, we need to gather additional context about the application's performance and quality requirements. This involves searching for any documentation or configuration files that might provide this context.


To verify the appropriateness of the new quality settings entries, I reviewed the relevant sections in the documentation and compared them with the historical data in quality_settings.json. The new entries are consistent with the historical values and align with the expected parameters outlined in the documentation.

Quality Settings Update Verified

The new quality settings entries align with the application's performance and quality requirements.

  • site/content/en/docs/manual/basics/CVAT-annotation-Interface/workspace.md
  • site/content/en/docs/manual/advanced/analytics-and-monitoring/auto-qa.md
  • tests/python/shared/assets/quality_settings.json
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the appropriateness of new quality settings.

# Test: Check the quality settings against task requirements and historical performance data.
rg --type json -A 5 $'quality_control.qualitysettings'

Length of output: 8235


Script:

#!/bin/bash
# Description: Search for documentation or configuration files related to quality settings.

# Find all markdown files which might contain documentation
fd --type f --extension md

# Find all JSON and YAML files which might contain configuration settings
fd --type f --extension json --extension yaml

Length of output: 13421


Script:

#!/bin/bash
# Description: Search for references to quality settings in documentation and configuration files.

# Search for references to quality settings in README.md files
rg --type md 'quality settings'

# Search for references to quality settings in the documentation within the site/content/en/docs directory
rg --type md --glob 'site/content/en/docs/**' 'quality settings'

# Display the contents of the tests/python/shared/assets/quality_settings.json file
cat tests/python/shared/assets/quality_settings.json

Length of output: 10643

tests/python/shared/assets/cvat_db/data.json — review comment marked outdated and resolved
codecov-commenter commented Jul 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (develop@b5d48c7). Learn more about missing BASE report.
Report is 6 commits behind head on develop.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8171   +/-   ##
==========================================
  Coverage           ?   83.41%           
==========================================
  Files              ?      388           
  Lines              ?    41305           
  Branches           ?     3854           
==========================================
  Hits               ?    34453           
  Misses             ?     6852           
  Partials           ?        0           
Components Coverage Δ
cvat-ui 79.75% <ø> (?)
cvat-server 86.69% <ø> (?)

sonarcloud bot commented Jul 16, 2024

@zhiltsov-max zhiltsov-max merged commit 49c39ef into develop Jul 17, 2024
33 checks passed
@cvat-bot cvat-bot bot mentioned this pull request Jul 18, 2024
@bsekachev bsekachev deleted the dl/coco-export-folders branch August 13, 2024 09:28
Successfully merging this pull request may close these issues.

Export COCO style annotation subsets from tasks with videos
3 participants