Combine Export or "Dump" multiple Jobs into one .zip with train/test/val splits #791

Closed
Sparrowtech opened this issue Oct 22, 2019 · 11 comments
Labels: enhancement (New feature or request)

Comments

@Sparrowtech

WORKFLOW WORKAROUNDS: We've created individual "Jobs" to represent different classes of objects (e.g. "car, truck, van, helicopter, airplane"), largely because CVAT has difficulty loading very large datasets. Each CVAT Job holds ~2,500 images and comes to around 1 GB in total between the images and annotations. There are currently ~60 different Jobs (object classes), totaling about 60 GB and ~150,000 images.

We routinely create specific datasets (10-20 object classes, or "Jobs"), which requires a lot of heavy lifting after export: merging tfrecords or XML files into one file or into batches, not to mention splitting into train/test/val sets. I know there are a lot of tools out there to help with pre-processing, and we currently employ many of them.

It would be ideal to be able to choose "Car, Airplane, Helicopter, Bus, ... etc." from the dashboard to EXPORT INTO ONE TASK, AND to choose the ratio of images split into train/test/val sets, e.g. 70% train, 15% test, 15% val, resulting in .zip file(s) containing the images and annotations or the generated tfrecords. No extra processing for randomizing is needed: just extract the split % from each Job and combine the pieces into, e.g., "Train", ensuring well-balanced classes rather than relying on some later function that performs a purely random split.
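The per-Job split requested above could be sketched roughly as follows. This is only an illustration of the idea, not CVAT functionality: the function name `split_per_job`, the Job names, and the image file names are all hypothetical. Each Job's image list is shuffled with a fixed seed and cut at the requested ratios, and the per-Job pieces are then concatenated, so every class lands in train, test, and val in the same proportion.

```python
import random


def split_per_job(jobs, ratios, seed=0):
    """Split each job's images by `ratios`, then merge the pieces per subset.

    jobs   -- dict mapping a job (class) name to a list of image names
    ratios -- dict mapping a subset name to its fraction; fractions sum to 1
    """
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    names = list(ratios)
    subsets = {name: [] for name in names}

    for job, images in jobs.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        total = len(shuffled)
        start = 0
        cumulative = 0.0
        for i, name in enumerate(names):
            cumulative += ratios[name]
            # The last subset takes whatever remains, so no image is dropped.
            end = total if i == len(names) - 1 else round(cumulative * total)
            subsets[name].extend((job, img) for img in shuffled[start:end])
            start = end
    return subsets


# Hypothetical example: two jobs of 20 images each, split 70/15/15.
jobs = {
    "car": [f"car_{i:03d}.jpg" for i in range(20)],
    "truck": [f"truck_{i:03d}.jpg" for i in range(20)],
}
splits = split_per_job(jobs, {"train": 0.70, "test": 0.15, "val": 0.15})
for name, items in splits.items():
    print(name, len(items))
```

Because the cut is made inside each Job before merging, a small class cannot end up entirely in one subset, which is the balance property asked for above.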

Thanks!

@nmanovic
Contributor

@Sparrowtech, we are going to introduce projects, where you can group all similar tasks and then export images + annotations for the whole project. Would that work for you?


@nmanovic nmanovic added the enhancement New feature or request label Oct 23, 2019
@nmanovic nmanovic added this to the 1.0.0 - Release milestone Oct 23, 2019
@Sparrowtech
Author

That would be great! I look forward to the update. It would also be great to be able to export the images/annotations not only as one file but also, optionally, as Train, Test, and Validation sets. Thanks!

@Sparrowtech
Author

@zhiltsov-max, I see that there has been some activity and that a "Release" has been made for this request. I'm not very familiar with how Releases are made available on GitHub, whether for production or for beta. A really simple question: is this something I can have access to today, or is it being included in another release down the road? Please advise if you don't mind.
Thanks!

@zhiltsov-max
Contributor

@nmanovic, please answer here.

@nmanovic
Contributor

@Sparrowtech, the feature will be available in Release 1.0.0 (~ end of February next year). Within a week or two, the first prototype will be merged into the develop branch. It would help if you could test the implementation and confirm that it is useful for you. We don't recommend using develop in production, but internally we use it for our own tasks.

Does that answer your question?

@Sparrowtech
Author

Yes, thank you. I will look for the feature in the develop branch over the next few weeks.

@nmanovic
Contributor

Let's keep the issue open until it is resolved.

@nmanovic nmanovic reopened this Nov 18, 2019
@nmanovic nmanovic mentioned this issue Mar 5, 2020
@nmanovic nmanovic modified the milestones: 1.0.0-release, 1.1.0-beta May 23, 2020
@zhiltsov-max
Contributor

Currently, it is possible with Datumaro:

  1. Export all desired tasks in Datumaro format and unpack the archives
  2. Check the README in the downloaded files
datum project create -o proj
datum source add path -p proj -f datumaro_project <path_to_the_unpacked_archive1>
datum source add path -p proj -f datumaro_project <path_to_the_unpacked_archive2>
...
datum project transform -p proj -t random_split [-- -s subset1:ratio1 etc.]
datum project export -f tf_detection_api -p <path_to_transform_result> -- --save-images
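With dozens of exported tasks, typing one `datum source add` line per archive gets tedious. The sketch below only assembles the command lines from the comment above and prints them; it does not run Datumaro. The helper name `build_datum_commands`, the archive paths, and the concrete subset names and ratios are illustrative assumptions, and the flags are copied as written above, so they may differ between Datumaro versions. The `<path_to_transform_result>` placeholder is kept as-is because the comment does not say where the transform output lands.

```python
import shlex


def build_datum_commands(archives, ratios, project="proj",
                         export_format="tf_detection_api"):
    """Return the command sequence from the comment as lists of arguments."""
    commands = [["datum", "project", "create", "-o", project]]

    # One "source add" per unpacked task archive.
    for path in archives:
        commands.append(["datum", "source", "add", "path", "-p", project,
                         "-f", "datumaro_project", path])

    # Subset ratios are passed after "--" as "-s name:ratio" pairs.
    split_args = []
    for subset, ratio in ratios.items():
        split_args += ["-s", f"{subset}:{ratio}"]
    commands.append(["datum", "project", "transform", "-p", project,
                     "-t", "random_split", "--"] + split_args)

    # Placeholder path kept from the original comment; fill it in by hand.
    commands.append(["datum", "project", "export", "-f", export_format,
                     "-p", "<path_to_transform_result>", "--", "--save-images"])
    return commands


# Hypothetical archive locations and a 70/15/15 split.
archives = ["exports/car", "exports/truck"]
ratios = {"train": 0.7, "test": 0.15, "val": 0.15}
for cmd in build_datum_commands(archives, ratios):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be reviewed against the README in the exported archive before anything is actually invoked.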

@nmanovic nmanovic modified the milestones: 1.1.0-beta, 1.1.0-release Aug 2, 2020
@zhiltsov-max
Contributor

zhiltsov-max commented Aug 13, 2020

Keeping open as a request for:

@shaojun
Contributor

shaojun commented Apr 3, 2021

When can we expect the project export feature in the UI? I have seen similar requests for years. Thanks.

@zhiltsov-max
Contributor

Done in #3365


4 participants