
coco export subsets to different folders #8171

Merged 5 commits into develop on Jul 17, 2024
Conversation

@Eldies (Contributor) commented Jul 16, 2024

Motivation and context

Fixes #4993

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • New Features

    • Enhanced COCO export functionality to organize images into separate subfolders by subsets.
  • Tests

    • Introduced a new test to verify the creation of subfolders in COCO exports.
  • Data Updates

    • Updated various test data records including annotations, user logins, sessions, images, projects, tasks, jobs, and quality settings to reflect new entries and configurations.

coderabbitai bot (Contributor) commented Jul 16, 2024

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Walkthrough

The changes focus on modifying the export logic for COCO format annotations in CVAT. The primary enhancement is organizing images from different subsets (e.g., train, test, validation) into separate subfolders. This prevents filename conflicts within a single folder. Additionally, test cases are updated to validate the new subfolder structure.

Changes

Files / Modules Change Summary
cvat/apps/dataset_manager/formats/coco.py Modified the merge_images parameter from True to False in export methods to ensure images are organized into subfolders for different subsets.
tests/python/rest_api/test_projects.py Introduced imports (itertools, itemgetter) and added a new test method test_creates_subfolders_in_coco_export to check the creation of subfolders in a COCO export.
tests/python/shared/assets/annotations.json Added new entries with empty shapes, tags, tracks, and version fields to support new test cases.
tests/python/shared/assets/cvat_db/data.json Updated timestamps for user accounts, sessions, and added entries for engine data, images, projects, tasks, segments, jobs, storage configurations, quality settings, and webhooks.
tests/python/shared/assets/jobs.json Added two new job entries with specific details to support new test cases.
tests/python/shared/assets/projects.json Increased the count of results and added a new project entry with detailed information.
tests/python/shared/assets/quality_settings.json Increased count and added new quality settings entries for tasks.
tests/python/shared/assets/tasks.json Added new task objects with attributes and metadata to support new functionalities.
tests/python/shared/assets/users.json Updated last_login timestamps for existing users.
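The core of the change listed above is small: switching merge_images from True to False makes the COCO exporter write each subset's images under its own images/&lt;subset&gt;/ prefix instead of a single shared images/ folder. The snippet below is a self-contained illustration of why this matters for tasks whose subsets reuse frame names (the export_paths helper is hypothetical; the real path logic lives in Datumaro's COCO exporter):

```python
# Two subsets whose frames share a filename — common for video tasks,
# where every task's frames start at frame_000000.
subsets = {"Train": ["frame_000000.png"], "Validation": ["frame_000000.png"]}

def export_paths(subsets, merge_images):
    """Hypothetical model of the archive paths the exporter produces."""
    paths = []
    for subset, names in subsets.items():
        for name in names:
            # merge_images=True flattens everything into images/;
            # merge_images=False namespaces by subset.
            prefix = "images/" if merge_images else f"images/{subset}/"
            paths.append(prefix + name)
    return paths

# With merging, the two frames collide on one archive path and one
# overwrites the other; without merging, both survive.
assert len(set(export_paths(subsets, merge_images=True))) == 1
assert len(set(export_paths(subsets, merge_images=False))) == 2
```

This is the filename-conflict scenario from issue #4993 that the PR resolves.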

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant CVAT Server
    participant COCO Exporter
    participant File System
    
    Client ->> CVAT Server: Request to export task dataset to COCO
    CVAT Server ->> COCO Exporter: Initiate COCO export with merge_images=False
    COCO Exporter ->> File System: Create separate subfolders for train, test, validation
    COCO Exporter ->> CVAT Server: Provide export results
    CVAT Server ->> Client: Deliver exported dataset with organized subfolders

Assessment against linked issues

Objective Addressed Explanation
Create subfolders for train, test, and validation subsets in COCO export (#4993)

Poem

In folders neat, the images lay,
For train, test, and validate, they stay.
No more will frames collide and fight,
Each subset cozy in its own right.
CVAT's export now shines bright! 🐇📂✨



@Eldies Eldies marked this pull request as ready for review July 16, 2024 06:09
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

Outside diff range, codebase verification and nitpick comments (2)
tests/python/shared/assets/cvat_db/data.json (2)

1522-1559: Potential Redundancy in Data Entries

Some entries, particularly in engine.data and engine.image, appear to be redundant with identical fields. Consider if this redundancy is necessary or if it could be optimized.

Also applies to: 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604


11554-11601: Review Storage Configuration

The storage configuration entries are all set to "local" with "cloud_storage": null. Ensure that this aligns with the intended storage architecture and that there are fallbacks or error handling for storage issues.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b5d48c7 and 4793144.

Files ignored due to path filters (1)
  • tests/python/shared/assets/cvat_db/cvat_data.tar.bz2 is excluded by !**/*.bz2
Files selected for processing (10)
  • changelog.d/20240716_095853_dlavrukhin_coco_export_folders.md (1 hunks)
  • cvat/apps/dataset_manager/formats/coco.py (2 hunks)
  • tests/python/rest_api/test_projects.py (2 hunks)
  • tests/python/shared/assets/annotations.json (2 hunks)
  • tests/python/shared/assets/cvat_db/data.json (12 hunks)
  • tests/python/shared/assets/jobs.json (1 hunks)
  • tests/python/shared/assets/projects.json (1 hunks)
  • tests/python/shared/assets/quality_settings.json (2 hunks)
  • tests/python/shared/assets/tasks.json (1 hunks)
  • tests/python/shared/assets/users.json (2 hunks)
Files skipped from review due to trivial changes (4)
  • changelog.d/20240716_095853_dlavrukhin_coco_export_folders.md
  • tests/python/shared/assets/annotations.json
  • tests/python/shared/assets/quality_settings.json
  • tests/python/shared/assets/users.json
Additional comments not posted (13)
cvat/apps/dataset_manager/formats/coco.py (2)

23-23: Change in merge_images parameter from True to False is appropriate for the given context.

This change aligns with the PR's objective to prevent file overwriting by exporting different subsets into separate subfolders. It's a straightforward and effective solution to the problem described.


47-47: Proper adjustment in merge_images parameter for COCO Keypoints export.

Changing merge_images from True to False ensures that keypoints data for different subsets are stored in separate subfolders, preventing data loss due to overwriting.
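Both call sites make the same one-word change. A hedged sketch of the shape of the export helper in cvat/apps/dataset_manager/formats/coco.py after this PR — the Dataset class is stubbed here so the snippet is self-contained, and the real wrapper's signature may differ:

```python
class Dataset:
    """Stand-in for a datumaro Dataset; records the export arguments."""
    def export(self, save_dir, format_name, **kwargs):
        self.last_call = (save_dir, format_name, kwargs)

def export_coco(dataset, temp_dir, save_images=False):
    # merge_images=False is the PR's change: each subset keeps its own
    # images/<subset>/ folder instead of all images sharing images/.
    dataset.export(temp_dir, "coco_instances",
                   save_images=save_images, merge_images=False)

ds = Dataset()
export_coco(ds, "/tmp/export", save_images=True)
assert ds.last_call[2]["merge_images"] is False
```

The same flag flip applies to the COCO Keypoints export path.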

tests/python/shared/assets/projects.json (1)

2-47: New project entry added correctly.

The new project entry is well-structured and includes all necessary fields such as name, owner, and task_subsets which are crucial for managing different subsets. The inclusion of subsets like "Train" and "Validation" in task_subsets is particularly relevant to the PR's objectives.

tests/python/shared/assets/tasks.json (1)

2-107: New task entries added correctly.

The new task entries are well-structured and include all necessary fields such as name, owner, labels, jobs, and subset. The inclusion of subsets like "Validation" and "Train" in the subset field for the respective tasks aligns well with the PR's objectives and ensures that tasks are correctly categorized.

tests/python/shared/assets/jobs.json (2)

2-2: Updated job count reflects new entries.

The count has been updated from 25 to 27, indicating the addition of two new job entries. This change is consistent with the PR's objective to handle new export functionalities.


6-45: New job entries added.

Two new job entries have been added with IDs 34 and 33. These entries include details such as created_date, data_chunk_size, and project_id, among others. Each field appears to be correctly formatted and consistent with the existing data structure.

  • ID 34: Job for project ID 14, indicating a specific task related to the project. The source_storage and target_storage are both local, which is typical for jobs not requiring cloud storage.
  • ID 33: Similar setup to ID 34, also for project ID 14, suggesting these jobs are part of the same project or batch operation.

Both entries have mode set to annotation and state set to new, which is consistent with their recent creation date.

Also applies to: 47-86

tests/python/rest_api/test_projects.py (3)

7-7: New import added: itertools

This import is necessary for the new functionality to group tasks by project ID using itertools.groupby in the new test method test_creates_subfolders_in_coco_export.


15-15: New import added: from operator import itemgetter

This import is used in the new test method test_creates_subfolders_in_coco_export to extract the project_id from tasks. It facilitates sorting and grouping operations.


819-837: New test method added: test_creates_subfolders_in_coco_export

This method tests the new functionality of creating subfolders for different subsets (Train, Validation) when exporting COCO datasets. The method uses itertools.groupby and itemgetter to group and filter tasks by project ID, ensuring that the subsets are correctly sorted and that the export process creates the expected subfolders.

The use of zipfile.ZipFile to inspect the contents of the exported zip file is appropriate to verify that the subfolders are indeed created and contain files. The assertion checks if files starting with images/{subset}/ exist, which directly relates to the PR's objective to prevent file overwriting by using subfolders.
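The verification step can be sketched as follows. This is a minimal stand-in, not the test's actual code: the archive is built in memory here, whereas the real test downloads the export from the CVAT REST API, and the subset names are assumed from the fixture projects:

```python
import io
import zipfile

# Build a stand-in export archive with the layout the PR produces.
export_bytes = io.BytesIO()
with zipfile.ZipFile(export_bytes, "w") as zf:
    zf.writestr("images/Train/frame_000000.png", b"")
    zf.writestr("images/Validation/frame_000000.png", b"")
    zf.writestr("annotations/instances_default.json", b"{}")

# The check the test performs: every subset has its own images/ subfolder
# containing at least one file.
subsets = ["Train", "Validation"]
with zipfile.ZipFile(export_bytes) as zf:
    names = zf.namelist()
    for subset in subsets:
        assert any(n.startswith(f"images/{subset}/") for n in names)
```

Inspecting namelist() for the images/{subset}/ prefix is exactly the property that fails on the old merged layout, so the assertion directly encodes the PR's objective.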

tests/python/shared/assets/cvat_db/data.json (4)

524-525: Review Session Expiration Dates

The session expiration date "expire_date": "2024-07-29T15:28:06.762Z" seems to be set correctly. Ensure that this aligns with the application's session management policies.


520-527: Data Integrity Check Required

The added fields and models need to be verified for consistency and integrity. Ensure that all primary keys (pk) and relationships (task, data, etc.) are correctly mapped and that there are no orphan records.

Also applies to: 1522-1559, 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604


1522-1559: Consistency Check for New Entries

The new entries across various models (engine.data, engine.image, engine.project, engine.task, engine.clientfile, engine.segment, engine.job, engine.storage, quality_control.qualitysettings) need to be checked for consistency with existing data structures and naming conventions.

Also applies to: 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604


16568-16604: Quality Settings Update

The new quality settings entries need to be reviewed to ensure they align with the application's performance and quality requirements. Verify that the thresholds and parameters set are appropriate for the tasks they are associated with.

Verification successful

To verify the appropriateness of the new quality settings entries, we need to gather additional context about the application's performance and quality requirements. This involves searching for any documentation or configuration files that might provide this context.


To verify the appropriateness of the new quality settings entries, I reviewed the relevant sections in the documentation and compared them with the historical data in quality_settings.json. The new entries are consistent with the historical values and align with the expected parameters outlined in the documentation.

Quality Settings Update Verified

The new quality settings entries align with the application's performance and quality requirements.

  • site/content/en/docs/manual/basics/CVAT-annotation-Interface/workspace.md
  • site/content/en/docs/manual/advanced/analytics-and-monitoring/auto-qa.md
  • tests/python/shared/assets/quality_settings.json
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the appropriateness of new quality settings.

# Test: Check the quality settings against task requirements and historical performance data.
rg --type json -A 5 $'quality_control.qualitysettings'

Length of output: 8235


Script:

#!/bin/bash
# Description: Search for documentation or configuration files related to quality settings.

# Find all markdown files which might contain documentation
fd --type f --extension md

# Find all JSON and YAML files which might contain configuration settings
fd --type f --extension json --extension yaml

Length of output: 13421


Script:

#!/bin/bash
# Description: Search for references to quality settings in documentation and configuration files.

# Search for references to quality settings in README.md files
rg --type md 'quality settings'

# Search for references to quality settings in the documentation within the site/content/en/docs directory
rg --type md --glob 'site/content/en/docs/**' 'quality settings'

# Display the contents of the tests/python/shared/assets/quality_settings.json file
cat tests/python/shared/assets/quality_settings.json

Length of output: 10643

tests/python/shared/assets/cvat_db/data.json — review comment marked outdated and resolved
codecov-commenter commented Jul 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (develop@b5d48c7). Learn more about missing BASE report.
Report is 6 commits behind head on develop.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8171   +/-   ##
==========================================
  Coverage           ?   83.41%           
==========================================
  Files              ?      388           
  Lines              ?    41305           
  Branches           ?     3854           
==========================================
  Hits               ?    34453           
  Misses             ?     6852           
  Partials           ?        0           
Components Coverage Δ
cvat-ui 79.75% <ø> (?)
cvat-server 86.69% <ø> (?)

sonarcloud bot commented Jul 16, 2024

@zhiltsov-max zhiltsov-max merged commit 49c39ef into develop Jul 17, 2024
33 checks passed
@cvat-bot cvat-bot bot mentioned this pull request Jul 18, 2024
@bsekachev bsekachev deleted the dl/coco-export-folders branch August 13, 2024 09:28
Successfully merging this pull request may close these issues.

Export COCO style annotation subsets from tasks with videos
3 participants