-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
coco export subsets to different folders #8171
Conversation
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes focus on modifying the export logic for COCO format annotations in CVAT. The primary enhancement is organizing images from different subsets (e.g., train, test, validation) into separate subfolders. This prevents filename conflicts within a single folder. Additionally, test cases are updated to validate the new subfolder structure. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant CVAT Server
participant COCO Exporter
participant File System
Client ->> CVAT Server: Request to export task dataset to COCO
CVAT Server ->> COCO Exporter: Initiate COCO export with merge_images=False
COCO Exporter ->> File System: Create separate subfolders for train, test, validation
COCO Exporter ->> CVAT Server: Provide export results
CVAT Server ->> Client: Deliver exported dataset with organized subfolders
Assessment against linked issues
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
639f768
to
4793144
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Outside diff range, codebase verification and nitpick comments (2)
tests/python/shared/assets/cvat_db/data.json (2)
1522-1559
: Potential Redundancy in Data EntriesSome entries, particularly in
engine.data
andengine.image
, appear to be redundant with identical fields. Consider if this redundancy is necessary or if it could be optimized.Also applies to: 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604
11554-11601
: Review Storage ConfigurationThe storage configuration entries are all set to
"local"
with"cloud_storage": null
. Ensure that this aligns with the intended storage architecture and that there are fallbacks or error handling for storage issues.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
tests/python/shared/assets/cvat_db/cvat_data.tar.bz2
is excluded by!**/*.bz2
Files selected for processing (10)
- changelog.d/20240716_095853_dlavrukhin_coco_export_folders.md (1 hunks)
- cvat/apps/dataset_manager/formats/coco.py (2 hunks)
- tests/python/rest_api/test_projects.py (2 hunks)
- tests/python/shared/assets/annotations.json (2 hunks)
- tests/python/shared/assets/cvat_db/data.json (12 hunks)
- tests/python/shared/assets/jobs.json (1 hunks)
- tests/python/shared/assets/projects.json (1 hunks)
- tests/python/shared/assets/quality_settings.json (2 hunks)
- tests/python/shared/assets/tasks.json (1 hunks)
- tests/python/shared/assets/users.json (2 hunks)
Files skipped from review due to trivial changes (4)
- changelog.d/20240716_095853_dlavrukhin_coco_export_folders.md
- tests/python/shared/assets/annotations.json
- tests/python/shared/assets/quality_settings.json
- tests/python/shared/assets/users.json
Additional comments not posted (13)
cvat/apps/dataset_manager/formats/coco.py (2)
23-23
: Change inmerge_images
parameter fromTrue
toFalse
is appropriate for the given context.This change aligns with the PR's objective to prevent file overwriting by exporting different subsets into separate subfolders. It's a straightforward and effective solution to the problem described.
47-47
: Proper adjustment inmerge_images
parameter for COCO Keypoints export.Changing
merge_images
fromTrue
toFalse
ensures that keypoints data for different subsets are stored in separate subfolders, preventing data loss due to overwriting.tests/python/shared/assets/projects.json (1)
2-47
: New project entry added correctly.The new project entry is well-structured and includes all necessary fields such as
name
,owner
, andtask_subsets
which are crucial for managing different subsets. The inclusion of subsets like "Train" and "Validation" intask_subsets
is particularly relevant to the PR's objectives.tests/python/shared/assets/tasks.json (1)
2-107
: New task entries added correctly.The new task entries are well-structured and include all necessary fields such as
name
,owner
,labels
,jobs
, andsubset
. The inclusion of subsets like "Validation" and "Train" in thesubset
field for the respective tasks aligns well with the PR's objectives and ensures that tasks are correctly categorized.tests/python/shared/assets/jobs.json (2)
2-2
: Updated job count reflects new entries.The
count
has been updated from 25 to 27, indicating the addition of two new job entries. This change is consistent with the PR's objective to handle new export functionalities.
6-45
: New job entries added.Two new job entries have been added with IDs 34 and 33. These entries include details such as
created_date
,data_chunk_size
, andproject_id
, among others. Each field appears to be correctly formatted and consistent with the existing data structure.
- ID 34: Job for project ID 14, indicating a specific task related to the project. The
source_storage
andtarget_storage
are both local, which is typical for jobs not requiring cloud storage.- ID 33: Similar setup to ID 34, also for project ID 14, suggesting these jobs are part of the same project or batch operation.
Both entries are marked with the
annotation
mode andnew
state, which aligns with their recent creation date.Also applies to: 47-86
tests/python/rest_api/test_projects.py (3)
7-7
: New import added:itertools
This import is necessary for the new functionality to group tasks by project ID using
itertools.groupby
in the new test methodtest_creates_subfolders_in_coco_export
.
15-15
: New import added:from operator import itemgetter
This import is used in the new test method
test_creates_subfolders_in_coco_export
to extract theproject_id
from tasks. It facilitates sorting and grouping operations.
819-837
: New test method added:test_creates_subfolders_in_coco_export
This method tests the new functionality of creating subfolders for different subsets (Train, Validation) when exporting COCO datasets. The method uses
itertools.groupby
anditemgetter
to group and filter tasks by project ID, ensuring that the subsets are correctly sorted and that the export process creates the expected subfolders.The use of
zipfile.ZipFile
to inspect the contents of the exported zip file is appropriate to verify that the subfolders are indeed created and contain files. The assertion checks if files starting withimages/{subset}/
exist, which directly relates to the PR's objective to prevent file overwriting by using subfolders.tests/python/shared/assets/cvat_db/data.json (4)
524-525
: Review Session Expiration DatesThe session expiration date
"expire_date": "2024-07-29T15:28:06.762Z"
seems to be set correctly. Ensure that this aligns with the application's session management policies.
520-527
: Data Integrity Check RequiredThe added fields and models need to be verified for consistency and integrity. Ensure that all primary keys (
pk
) and relationships (task
,data
, etc.) are correctly mapped and that there are no orphan records.Also applies to: 1522-1559, 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604
1522-1559
: Consistency Check for New EntriesThe new entries across various models (
engine.data
,engine.image
,engine.project
,engine.task
,engine.clientfile
,engine.segment
,engine.job
,engine.storage
,quality_control.qualitysettings
) need to be checked for consistency with existing data structures and naming conventions.Also applies to: 3076-3097, 3338-3356, 3841-3892, 4781-4796, 5081-5102, 5488-5516, 11554-11601, 16568-16604
16568-16604
: Quality Settings UpdateThe new quality settings entries need to be reviewed to ensure they align with the application's performance and quality requirements. Verify that the thresholds and parameters set are appropriate for the tasks they are associated with.
Verification successful
To verify the appropriateness of the new quality settings entries, we need to gather additional context about the application's performance and quality requirements. This involves searching for any documentation or configuration files that might provide this context.
To verify the appropriateness of the new quality settings entries, I reviewed the relevant sections in the documentation and compared them with the historical data in
quality_settings.json
. The new entries are consistent with the historical values and align with the expected parameters outlined in the documentation.Quality Settings Update Verified
The new quality settings entries align with the application's performance and quality requirements.
site/content/en/docs/manual/basics/CVAT-annotation-Interface/workspace.md
site/content/en/docs/manual/advanced/analytics-and-monitoring/auto-qa.md
tests/python/shared/assets/quality_settings.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the appropriateness of new quality settings. # Test: Check the quality settings against task requirements and historical performance data. rg --type json -A 5 $'quality_control.qualitysettings'Length of output: 8235
Script:
#!/bin/bash # Description: Search for documentation or configuration files related to quality settings. # Find all markdown files which might contain documentation fd --type f --extension md # Find all JSON and YAML files which might contain configuration settings fd --type f --extension json --extension yamlLength of output: 13421
Script:
#!/bin/bash # Description: Search for references to quality settings in documentation and configuration files. # Search for references to quality settings in README.md files rg --type md 'quality settings' # Search for references to quality settings in the documentation within the site/content/en/docs directory rg --type md --glob 'site/content/en/docs/**' 'quality settings' # Display the contents of the tests/python/shared/assets/quality_settings.json file cat tests/python/shared/assets/quality_settings.jsonLength of output: 10643
4793144
to
b8f86af
Compare
04fde9a
to
b89592a
Compare
b89592a
to
6179abe
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #8171 +/- ##
==========================================
Coverage ? 83.41%
==========================================
Files ? 388
Lines ? 41305
Branches ? 3854
==========================================
Hits ? 34453
Misses ? 6852
Partials ? 0
|
b737547
to
2c7c1ff
Compare
Quality Gate passedIssues Measures |
Motivation and context
Fixes #4993
How has this been tested?
Checklist
develop
branch(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
New Features
Tests
Data Updates