Adding new images to an existing task #4364

Closed
RRighart opened this issue Feb 18, 2022 · 22 comments
Labels
need info (Need more information to investigate the issue)

Comments

@RRighart

The question of adding images to an existing task was raised before here.
Are there any updates on this issue?
I agree that an option to add new images later would be very useful and time-saving. Is this possible now? I have not been able to find such an option.
Otherwise, is there a workaround? For example, is it possible to load a new dataset (i.e., the previous images plus the newly added ones) and, in parallel, load the text files with bounding boxes for the images that were already annotated?

@nmanovic
Contributor

@RRighart, please use the project feature. You can think of a project as a dataset. Every task in the project can belong to a specific subset (e.g., train, test, val). Jobs let several annotators work on a task in parallel. If you want to add a couple of images to a task, create a new task for them, create a project, and move both tasks into the project.

@holgerbrandl

The suggestion to use a project does not necessarily serve as a good workaround here. An annotation schema is modeled per project so that multiple tasks can share it. When tasks are modeled as separate projects, there is no way to share a common annotation schema.

So the option to add new images to an existing dataset would be a useful enhancement.

@nmanovic
Contributor

@holgerbrandl, you can create multiple projects, one for each dataset. I'm not sure which scenario isn't covered. Could you please explain?

@holgerbrandl

Sure, I can create multiple projects. But there is no mechanism to ensure that they share the same common annotation model.

Tasks, by contrast, share the annotation model of their project.

@bsekachev
Member

@holgerbrandl

Could you clarify what you mean by "the same common annotation model"?

@bsekachev added the "need info" label on Sep 6, 2022

@holgerbrandl

Sure thing. We have a common set of annotations that we use to annotate images across different datasets. These can be defined at the project level in CVAT and are then propagated to the individual tasks/jobs within the project.

Example (just to illustrate where those annotations are defined)
image

@bsekachev
Member

Okay, I see. You are talking about the label specification, right?

> But there is no mechanism to ensure that they share the same common annotation model.

You can copy the specification from one project to another using the Raw tab (just Ctrl+C, Ctrl+V).
The specifications are then identical.
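
A copied specification can drift over time, so a small check may help confirm that two projects still agree. The following is a minimal sketch, assuming the Raw tab text is a JSON list of label objects that each carry a `name` field; the function name `label_spec_diff` and the sample labels are hypothetical, and real exports may contain extra fields (such as ids) that would need to be ignored before comparing.

```python
import json


def label_spec_diff(raw_a: str, raw_b: str) -> dict:
    """Compare two label specifications given as JSON text.

    Assumption: each specification is a JSON list of objects with a "name" key.
    """
    labels_a = {label["name"]: label for label in json.loads(raw_a)}
    labels_b = {label["name"]: label for label in json.loads(raw_b)}
    return {
        # Labels present in only one of the two specifications.
        "only_in_a": sorted(labels_a.keys() - labels_b.keys()),
        "only_in_b": sorted(labels_b.keys() - labels_a.keys()),
        # Labels present in both, but with different definitions.
        "changed": sorted(
            name
            for name in labels_a.keys() & labels_b.keys()
            if labels_a[name] != labels_b[name]
        ),
    }


# Hypothetical Raw tab contents from two projects.
project_a = '[{"name": "car", "attributes": []}, {"name": "plane", "attributes": []}]'
project_b = '[{"name": "car", "attributes": []}, {"name": "truck", "attributes": []}]'

diff = label_spec_diff(project_a, project_b)
print(diff)
```

An empty result in all three lists would indicate that the two specifications match under these assumptions.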

@holgerbrandl

That's a nice workaround for an initial setup. Still, it requires quite some effort to keep them in sync. That's why I think this ticket is valid/open.

@bsekachev
Member

Personally, I have no idea how it could be implemented. Maybe I do not truly understand the use case and why you can't use one project with a given label specification.

If the community is ready to provide a common solution, that would be great.
We are not planning to implement it for now.

@nmanovic
Contributor

nmanovic commented Sep 7, 2022

@holgerbrandl, I agree with Boris. We need to understand why you consider it a workaround rather than a solution to your problem. Basically, in CVAT:

  • A project is a dataset
  • A task is a portion of the data (you can now mark some images as deleted)
  • A job is a unit of annotation work (a way to split a task among multiple labelers)
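
The three concepts above can be illustrated with a small in-memory model. This is only a sketch of the hierarchy as described in this thread, not CVAT's actual data model or API: the class names, the `split_into_jobs` method, and the `segment_size` parameter are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    """A unit of annotation work: a slice of one task's frames."""
    frames: list[str]


@dataclass
class Task:
    """A portion of the dataset, optionally assigned to a subset."""
    name: str
    frames: list[str]
    subset: str = ""

    def split_into_jobs(self, segment_size: int) -> list[Job]:
        """Cut the task into consecutive jobs of at most segment_size frames."""
        if segment_size <= 0:
            raise ValueError("segment_size must be positive")
        return [
            Job(self.frames[i:i + segment_size])
            for i in range(0, len(self.frames), segment_size)
        ]


@dataclass
class Project:
    """The whole dataset: shared labels plus any number of tasks."""
    name: str
    labels: list[str]
    tasks: list[Task] = field(default_factory=list)

    def add_task(self, task: Task) -> Task:
        self.tasks.append(task)
        return task

    def all_frames(self) -> list[str]:
        return [frame for task in self.tasks for frame in task.frames]


project = Project("vehicles", labels=["car", "plane"])
cars = project.add_task(Task("cars", [f"car_{i}.jpg" for i in range(5)], subset="train"))
jobs = cars.split_into_jobs(segment_size=2)

# In this model, a newly found image becomes a new task in the same project.
project.add_task(Task("cars-2", ["car_5.jpg"], subset="train"))
print(len(jobs), len(project.all_frames()))  # 3 6
```

In this sketch the labels live on the project, so every task added later reuses them without any copying.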

@holgerbrandl

It's all about keeping projects well organized. Suppose I have a complex set of annotations defined for a project and two separate tasks (e.g., for different types of objects, say planes and cars). If I then find another important image of a car, I currently have to create a new task in the project for just this single image, which leads to redundancy in the project layout.

That's why, IMHO, it would be a nice feature in CVAT if a user could simply add an image to an existing task.

@bsekachev
Member

We do not provide the ability to add images to existing tasks because of the current concept. The use case you described is solved by creating a new task (considering the project as the whole dataset). The new task includes all the project labels, so you do not need to set them up again.

@PercipioCorey

PercipioCorey commented Feb 29, 2024

Was this ever given further consideration or implemented? I would also like the ability to add images to an existing task.

My use case is that data is submitted from multiple sources and my dataset is growing. It would be convenient to annotate all of one source's images within one task, and another source's images within another task. This is because there will be commonalities within a given source's images, and looking at prior annotated images from that source would be helpful in annotating new images.

Our project is still stealthy, so I can't go into great detail, but I will try to answer any clarifying questions.

@siddagra

siddagra commented Mar 2, 2024

If the same images appear in multiple tasks of a project, it seems CVAT does not update them. Instead, it simply repeats them and then throws an error when you try to export. Repeated images across several tasks also make annotation and quality checks harder.
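
A pre-upload check along the following lines could help avoid repeated images across tasks. It is a minimal sketch and not CVAT functionality: it assumes the image bytes are available locally and treats two files as duplicates only when their contents are byte-identical. The function name `find_duplicates` and the sample task layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(tasks: dict) -> dict:
    """Map each content hash that occurs more than once to its locations.

    `tasks` maps a task name to a dict of {file name: image bytes}.
    """
    seen = defaultdict(list)
    for task_name, images in tasks.items():
        for file_name, content in images.items():
            digest = hashlib.sha256(content).hexdigest()
            seen[digest].append((task_name, file_name))
    # Keep only hashes that appear in more than one place.
    return {digest: places for digest, places in seen.items() if len(places) > 1}


# Stand-in byte strings instead of real image files.
tasks = {
    "task1": {"a.jpg": b"AAA", "b.jpg": b"BBB"},
    "task2": {"c.jpg": b"AAA"},
}

duplicates = find_duplicates(tasks)
for places in duplicates.values():
    print(places)  # [('task1', 'a.jpg'), ('task2', 'c.jpg')]
```

Hashing file contents rather than comparing names catches the same image uploaded under different file names, though it will not catch re-encoded or resized copies.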

@siddagra

siddagra commented Mar 2, 2024

> Was this ever given further consideration or implemented? I would also like the ability to add images to an existing task.
>
> My use case is that data is submitted from multiple sources and my dataset is growing. It would be convenient to annotate all of one source's images within one task, and another source's images within another task. This is because there will be commonalities within a given source's images, and looking at prior annotated images from that source would be helpful in annotating new images.
>
> Our project is still stealthy, so I can't go into great detail, but I will try to answer any clarifying questions.

I strongly agree with this. Many use cases fall into this category.

@nmanovic
Contributor

nmanovic commented Mar 4, 2024

@siddagra, @PercipioCorey, please use projects. You can add new tasks to a project. That is how this use case is supported.

@Toolfolks

Toolfolks commented Mar 13, 2024

Hi. I am not understanding this (as usual). My scenario: I have created a project and added a video. I have used rectangle tracks on various objects. I export as Yolo1 and have a Python script that trains on the data and produces the best.pt file. Now that I have tested this and the results are OK, I need to add a few more videos and product images to the project, annotate them, and export everything to Yolo1 (i.e., the original video plus the additions). Sorry for being dumb, but how do I add files to the project? Cheers

@pieris98

@Toolfolks the proposed solution is to create a new project for your new videos/images and copy-paste the Raw label JSON text from your previous project into the new one.
@nmanovic I'm trying to understand what the obstacles to supporting this use case are. Are there any implementation blockers, or would something break if someone added the ability to add images in place to the database?

@siddagra

siddagra commented Mar 28, 2024

> @siddagra , @PercipioCorey , please use projects. You can add new tasks into a project. It is how the use case is supported.

@nmanovic

How are duplicates handled in such a case?

@ovunctuzel-bc

+1 for adding images to an existing task. There are many use cases where datasets grow organically. For example, let's say I use a GCP bucket and I have devices that automatically add images to the bucket. There shouldn't be any friction in CVAT to automatically expand a task to annotate the latest images.

@nmanovic
Contributor

Hi all, let's discuss and see whether we can improve CVAT for this case. Our current model is the following:
Project = dataset
Task = a part of the project
Job = the actual work of annotating a task (a task can be split into multiple jobs and annotated in parallel)

The proposed pipeline:

  1. Create an empty project (dataset)
  2. Create as many tasks as you need. With each task you can add new images. Each task can be assigned to a specific subset (train, test, val, etc)
  3. Annotate tasks in parallel using jobs.

There are multiple issues with extending a task. Say you annotate something and mark it as completed. Now you have new data. Which task do you want to add the new images to, automatically or manually? Why do you need that if you can create a new task with the new data?

When you need to export your dataset, just do that. Export your project with all tasks.
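
For a dataset that keeps growing, the pipeline above could be automated by turning each batch of newly arrived files into new tasks. The following is a minimal planning sketch under stated assumptions: it performs no CVAT calls, it assumes the first path component identifies the data source, and the function name `plan_new_tasks`, the task naming scheme, and the sample paths are all hypothetical.

```python
from collections import defaultdict


def plan_new_tasks(listing, already_in_project, batch_no):
    """Group files not yet in the project into one new task per source."""
    new_by_source = defaultdict(list)
    for path in sorted(listing):
        if path in already_in_project:
            continue  # Already covered by an existing task.
        source = path.split("/", 1)[0]  # Assumption: "<source>/<file>" layout.
        new_by_source[source].append(path)
    return [
        {"name": f"{source}-batch{batch_no}", "files": files}
        for source, files in sorted(new_by_source.items())
    ]


# Hypothetical storage listing and the set of files already in the project.
listing = ["camA/001.jpg", "camA/002.jpg", "camB/001.jpg", "camA/003.jpg"]
already_in_project = {"camA/001.jpg", "camB/001.jpg"}

plan = plan_new_tasks(listing, already_in_project, batch_no=2)
print(plan)
# [{'name': 'camA-batch2', 'files': ['camA/002.jpg', 'camA/003.jpg']}]
```

Each planned entry would then be created as a task in the existing project, so the project-level labels are reused and the final export still covers all tasks.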

@pieris98

@nmanovic I won't answer for automatic or manual extension because I think each user has different ways to do it (automatic is just a script that performs manual steps using CVAT's APIs or otherwise).

For us the use case was that a project provides the labels (use project's labels when creating task), and each task represents a subset of the data. Let's say as an example that task1 contains images of animals indoors and task2 contains images of animals outdoors.
It's very useful in this case to only have 2 tasks that are extensible so that when new images are collected and added they are separated based on indoor/outdoor categories by default using the task separation.
This is useful because:

  1. When exporting the task annotations, they are already split by task and can be further split into train/val without manually selecting images and calculating the dataset balance. For example, if I know task1 has 1000 images split 80%-20%, I have 800-200 images of indoor animals in my train-val split, so I can adjust the train-val percentages for task2 to achieve the data proportions I want.
  2. When annotating, teams can be assigned to these tasks and their sub-jobs, knowing how many images they have annotated in total for each category (no need to sum over tasks).
  3. Each task gives a clear separation of data divisions (anything from the indoor vs. outdoor example to good vs. bad previous annotations, annotations to verify vs. new annotations, etc.).
  4. When new images/data are collected and added to the dataset, CVAT can separate them by default (per extensible task), so that the annotated dataset "evolves".
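
The split arithmetic in point 1 can be worked through in a few lines. This is an illustrative sketch only: the 1000-image task and its 80%-20% split come from the example above, while the 2000-image size of task2, the 50% target share, and both function names are assumptions made for the illustration.

```python
def split_counts(total: int, train_fraction: float) -> tuple:
    """Return (train, val) image counts for a task of `total` images."""
    train = round(total * train_fraction)
    return train, total - train


def train_fraction_for_share(train_a: int, total_b: int, share_a: float) -> float:
    """Train fraction for task B so that task A makes up `share_a` of the train set."""
    if not 0 < share_a < 1:
        raise ValueError("share_a must be strictly between 0 and 1")
    train_b = train_a * (1 - share_a) / share_a
    if train_b > total_b:
        raise ValueError("task B does not have enough images for this share")
    return train_b / total_b


# task1 (indoor): 1000 images split 80%-20%, as in the example above.
indoor_train, indoor_val = split_counts(1000, 0.8)

# task2 (outdoor): assumed to hold 2000 images; aim for a 50/50 indoor/outdoor train set.
outdoor_fraction = train_fraction_for_share(indoor_train, total_b=2000, share_a=0.5)
outdoor_train, outdoor_val = split_counts(2000, outdoor_fraction)

print(indoor_train, indoor_val)    # 800 200
print(outdoor_fraction)            # 0.4
print(outdoor_train, outdoor_val)  # 800 1200
```

Because each task maps to one category, the per-task image totals are the only inputs needed to balance the split.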

These are just a few reasons off the top of my head. I also think it's natural: since removing images from a task/job is already available, adding them is functionality the average user expects (hence the number of issues about this).

Thanks for your interest in this; I hope we can develop this within the CVAT community!


9 participants