Computers have become our partners. They help us solve routine tasks, fix mistakes, find information, and more. It is natural to use their computing power to annotate datasets. There are multiple DL models for classification, object detection, and semantic segmentation which can do data annotation for us, and it is relatively simple to integrate your own ML/DL solution into CVAT.
But the world is not perfect and there is no silver bullet which can solve all our problems. Usually, available DL models are trained on public datasets which cannot cover all specific cases. Very often you want to detect objects which these models cannot recognize. Your annotation requirements can also be so strict that automatically annotated objects cannot be accepted as is, and it is easier to annotate them from scratch. Keep these limitations in mind: even if you have a DL solution which can perfectly annotate 50% of your data, it only means that manual work will be reduced by half in the best case.
Once we know that DL models can help us annotate data faster, the next question is how to use them. In CVAT all such DL models are implemented as serverless functions for the nuclio serverless platform. There are multiple functions already implemented in the serverless directory, such as Mask RCNN, Faster RCNN, SiamMask, Inside Outside Guidance, Deep Extreme Cut, etc. Follow the installation guide to build and deploy these serverless functions, and see the user guide to understand how to use them in the UI to automatically annotate data.
What is a serverless function and why is it used for automatic annotation in CVAT? Let's assume that you have a DL model and want to use it for AI-assisted annotation. The naive approach is to implement a Python script which uses the DL model to prepare a file with annotations in a public format like MS COCO or Pascal VOC, and then upload the annotation file into CVAT. It works, but it is not user-friendly. How do you make CVAT run the script for you?
You can pack the script with your DL model into a container which provides a standard interface for interacting with it. One way to do that is to use the function-as-a-service approach. Your script becomes a function inside cloud infrastructure which can be called over HTTP. The nuclio serverless platform helps us implement and manage such functions.
CVAT supports nuclio out of the box if it is built properly; see the installation guide for instructions. Thus, if you deploy a serverless function, the CVAT server can see it and call it with appropriate arguments. Of course, there are some tricks to creating serverless functions for CVAT, and we will discuss them in the next sections of the tutorial.
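To make the idea concrete, here is a schematic sketch of such a function in Python. It is not one of the actual CVAT functions: load_my_model and detect are hypothetical placeholders for your own model code, but the init_context/handler structure is what nuclio expects, and we will see a real implementation later in the tutorial.

import base64
import io
import json

def init_context(context):
    # Called once when the function container starts:
    # load the DL model and keep it around for all future requests.
    context.user_data.model = load_my_model()  # hypothetical loader

def handler(context, event):
    # Called for every HTTP request: decode the base64-encoded image
    # from the request body and reply with annotations as JSON.
    image = io.BytesIO(base64.b64decode(event.body["image"]))
    results = context.user_data.model.detect(image)  # hypothetical inference call
    return context.Response(body=json.dumps(results), headers={},
        content_type='application/json', status_code=200)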
Let's look at some examples of how to use DL models for different annotation tasks.
In this tutorial it is assumed that you have already cloned the CVAT GitHub repo. To build CVAT with serverless support you need to include the corresponding docker-compose files. In our case it is components/serverless/docker-compose.serverless.yml. It contains the necessary instructions for building and deploying the nuclio platform as a docker container and enabling the corresponding support in CVAT.
docker-compose -f docker-compose.yml -f docker-compose.dev.yml -f components/serverless/docker-compose.serverless.yml up -d --build
docker-compose -f docker-compose.yml -f docker-compose.dev.yml -f components/serverless/docker-compose.serverless.yml ps
Name Command State Ports
-------------------------------------------------------------------------------------------------------------
cvat /usr/bin/supervisord Up 8080/tcp
cvat_db docker-entrypoint.sh postgres Up 5432/tcp
cvat_proxy /docker-entrypoint.sh /bin ... Up 0.0.0.0:8080->80/tcp,:::8080->80/tcp
cvat_redis docker-entrypoint.sh redis ... Up 6379/tcp
cvat_ui /docker-entrypoint.sh ngin ... Up 80/tcp
nuclio /docker-entrypoint.sh sh - ... Up (healthy) 80/tcp, 0.0.0.0:8070->8070/tcp,:::8070->8070/tcp
To deploy the built-in serverless functions you need to install the nuclio command line tool (aka nuctl) for your operating system. Again, it is assumed that you followed the installation guide and nuctl is already installed on your system. Run the following command to check that it works. At this point you should not have any deployed serverless functions.
nuctl get functions
No functions found
Let's look at specific use cases which can help you annotate data for different tasks.
In this use case, a user needs to annotate all individual objects on a video as tracks. In other words, for every object we need to know its location on every frame.
The first step is to deploy SiamMask. The deployment process can depend on your operating system. On Linux you can use the serverless/deploy_cpu.sh auxiliary script, but in this tutorial we use nuctl directly.
nuctl create project cvat
nuctl deploy --project-name cvat --path "./serverless/pytorch/foolwood/siammask/nuclio" --platform local
21.05.07 13:00:22.233 nuctl (I) Deploying function {"name": ""}
21.05.07 13:00:22.233 nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.05.07 13:00:22.652 nuctl (I) Cleaning up before deployment {"functionName": "pth-foolwood-siammask"}
21.05.07 13:00:22.705 nuctl (I) Staging files and preparing base images
21.05.07 13:00:22.706 nuctl (I) Building processor image {"imageName": "cvat/pth.foolwood.siammask:latest"}
21.05.07 13:00:22.706 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.05.07 13:00:26.351 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.05.07 13:00:29.819 nuctl.platform (I) Building docker image {"image": "cvat/pth.foolwood.siammask:latest"}
21.05.07 13:00:30.103 nuctl.platform (I) Pushing docker image into registry {"image": "cvat/pth.foolwood.siammask:latest", "registry": ""}
21.05.07 13:00:30.103 nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/pth.foolwood.siammask:latest"}
21.05.07 13:00:30.104 nuctl (I) Build complete {"result": {"Image":"cvat/pth.foolwood.siammask:latest","UpdatedFunctionConfig":{"metadata":{"name":"pth-foolwood-siammask","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"pytorch","name":"SiamMask","spec":"","type":"tracker"}},"spec":{"description":"Fast Online Object Tracking and Segmentation","handler":"main:handler","runtime":"python:3.6","env":[{"name":"PYTHONPATH","value":"/opt/nuclio/SiamMask:/opt/nuclio/SiamMask/experiments/siammask_sharp"}],"resources":{},"image":"cvat/pth.foolwood.siammask:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":2,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"build":{"image":"cvat/pth.foolwood.siammask","baseImage":"continuumio/miniconda3","directives":{"preCopy":[{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"conda create -y -n siammask python=3.6"},{"kind":"SHELL","value":"[\"conda\", \"run\", \"-n\", \"siammask\", \"/bin/bash\", \"-c\"]"},{"kind":"RUN","value":"git clone https://github.com/foolwood/SiamMask.git"},{"kind":"RUN","value":"pip install -r SiamMask/requirements.txt jsonpickle"},{"kind":"RUN","value":"conda install -y gcc_linux-64"},{"kind":"RUN","value":"cd SiamMask \u0026\u0026 bash make.sh \u0026\u0026 cd -"},{"kind":"RUN","value":"wget -P SiamMask/experiments/siammask_sharp http://www.robots.ox.ac.uk/~qwang/SiamMask_DAVIS.pth"},{"kind":"ENTRYPOINT","value":"[\"conda\", \"run\", \"-n\", \"siammask\"]"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
21.05.07 13:00:31.387 nuctl.platform (I) Waiting for function to be ready {"timeout": 60}
21.05.07 13:00:32.796 nuctl (I) Function deploy complete {"functionName": "pth-foolwood-siammask", "httpPort": 49155}
nuctl get functions
NAMESPACE | NAME | PROJECT | STATE | NODE PORT | REPLICAS
nuclio | pth-foolwood-siammask | cvat | ready | 49155 | 1/1
Let's see how it works in the UI. First of all, go to http://localhost:8080/models and check that you can see SiamMask in the list. If it is not there, something went wrong; go to one of our public channels and ask for help.
After that, go to http://localhost:8080/tasks/create and create an annotation task with a video file. You can choose any task name, any labels, and even another video file if you like. In this case the Remote sources option was used to specify the video file. Press the Submit button at the end to finish the process.
Open the task and use AI tools to start tracking an object. Draw a bounding box around the object, and it will be tracked forward over the next frames. By default, you will get bounding boxes for 10 frames.
The rest of the tutorial shows how to integrate a custom DL model into CVAT and covers the following steps:
- Architecture of a serverless function in CVAT (function.yaml, entry point)
- Choose a dataset which you want to annotate
- Choose a custom model which you want to integrate
- Prepare the necessary files (function.yaml, main.py)
- Automatically annotate data using the prepared model
- Compare results with the ground truth using Datumaro
- Conclusion
- Optimize using GPU
- Testing
- Logging (docker)
- Debugging
- Troubleshooting
In my case I will choose a popular AI library with a lot of models inside. In your case it can be your own model; if it is based on detectron2, it will be easy to integrate. Just follow the tutorial.
Detectron2 is Facebook AI Research's next-generation library that provides state-of-the-art detection and segmentation algorithms. It is the successor of Detectron and maskrcnn-benchmark. It supports a number of computer vision research projects and production applications at Facebook.
Clone the repository somewhere. I assume that all further experiments will be run from the cloned detectron2 directory.
git clone https://github.com/facebookresearch/detectron2
cd detectron2
Let's run a detection model locally. First of all, we need to install the requirements for the library.
In my case I have Ubuntu 20.04 with Python 3.8.5. I installed the CPU build of PyTorch 1.8.1 for Linux with pip inside a virtual environment. Follow the opencv-python installation guide to get the library for the demo and visualization.
python3 -m venv .detectron2
. .detectron2/bin/activate
pip install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install opencv-python
Install the detectron2 library from your local clone.
python -m pip install -e .
After the library from Facebook AI Research is installed, we can run a couple of experiments. See the official tutorial for more examples. I decided to experiment with RetinaNet. The first step is to download the model weights.
curl -O https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_101_FPN_3x/190397697/model_final_971ab9.pkl
To run experiments, let's download an image with cats from Wikipedia.
curl -O https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Cat_poster_1.jpg/1920px-Cat_poster_1.jpg
Finally, let's run the DL model inference on CPU. If everything is fine, you will see a window with cats and bounding boxes around them with scores.
python demo/demo.py --config-file configs/COCO-Detection/retinanet_R_101_FPN_3x.yaml \
--input 1920px-Cat_poster_1.jpg --opts MODEL.WEIGHTS model_final_971ab9.pkl MODEL.DEVICE cpu
The next step is to minimize the demo/demo.py script, keeping only the code that is necessary to load the model, run it, and interpret its output. Let's hard-code the parameters, remove argparse, and keep only the code which is responsible for working with an image. There is no universal recipe for minimizing code like this.
Finally, you should get something like the code below: it uses a fixed config, reads a predefined image, initializes a predictor, and runs inference. As the final step, it prints all detected bounding boxes with scores and labels.
from detectron2.config import get_cfg
from detectron2.data.detection_utils import read_image
from detectron2.engine.defaults import DefaultPredictor
from detectron2.data.datasets.builtin_meta import COCO_CATEGORIES

CONFIG_FILE = "configs/COCO-Detection/retinanet_R_101_FPN_3x.yaml"
CONFIG_OPTS = ["MODEL.WEIGHTS", "model_final_971ab9.pkl", "MODEL.DEVICE", "cpu"]
CONFIDENCE_THRESHOLD = 0.5

def setup_cfg():
    # Build a detectron2 config from the YAML file and the hard-coded options
    cfg = get_cfg()
    cfg.merge_from_file(CONFIG_FILE)
    cfg.merge_from_list(CONFIG_OPTS)
    cfg.MODEL.RETINANET.SCORE_THRESH_TEST = CONFIDENCE_THRESHOLD
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = CONFIDENCE_THRESHOLD
    cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = CONFIDENCE_THRESHOLD
    cfg.freeze()
    return cfg

if __name__ == "__main__":
    cfg = setup_cfg()
    input_path = "1920px-Cat_poster_1.jpg"
    img = read_image(input_path, format="BGR")
    predictor = DefaultPredictor(cfg)
    predictions = predictor(img)
    instances = predictions["instances"]
    pred_boxes = instances.pred_boxes
    scores = instances.scores
    pred_classes = instances.pred_classes
    # Print every detection as: [x1, y1, x2, y2] score label
    for box, score, label in zip(pred_boxes, scores, pred_classes):
        label = COCO_CATEGORIES[int(label)]["name"]
        print(box.tolist(), float(score), label)
Now that we know how to run the DL model locally, we can prepare a serverless function which can be used by CVAT to annotate data. Let's see what function.yaml will look like.
Let's look at the faster_rcnn_inception_v2_coco serverless function configuration as an example and try to adapt it to our case. First of all, let's invent a unique name for the new function: pth.facebookresearch.detectron2.retinanet_r101. The annotations section describes our function for the CVAT serverless subsystem:
- annotations.name is a display name
- annotations.type is the type of the serverless function. It can have several different values; basically, it affects the input and output of the function. In our case it has the detector type, which means that the integrated DL model can generate shapes with labels for an image.
- annotations.framework is used for information only and can have an arbitrary value. Usually it has values like OpenVINO, PyTorch, TensorFlow, etc.
- annotations.spec describes the list of labels which the model supports. In our case the DL model was trained on the MS COCO dataset, and the list of labels corresponds to that dataset.
- spec.description is used to provide basic information about the model.
All other parameters are described in the nuclio documentation:
- spec.handler is the entry point to your function.
- spec.runtime is the name of the language runtime.
- spec.eventTimeout is the global event timeout.
The next step is to describe how to build our serverless function:
- spec.build.image is the name of your docker image.
- spec.build.baseImage is the name of a base container image from which to build the function.
- spec.build.directives are commands to build your docker image.
In our case we start from an Ubuntu 20.04 base image and install curl to download the weights for our model, git to clone the detectron2 project from GitHub, and python together with pip. Then we repeat the installation steps which we used to set up the DL model locally, with minor modifications.
For the nuclio platform we have to specify a couple more parameters:
- spec.triggers.myHttpTrigger describes the HTTP trigger which handles incoming HTTP requests.
- spec.platform describes some important parameters needed to run your function, like restartPolicy and mountMode. Read the nuclio documentation for more details.
metadata:
name: pth.facebookresearch.detectron2.retinanet_r101
namespace: cvat
annotations:
name: RetinaNet R101
type: detector
framework: pytorch
spec: |
[
{ "id": 1, "name": "person" },
{ "id": 2, "name": "bicycle" },
...
{ "id":89, "name": "hair_drier" },
{ "id":90, "name": "toothbrush" }
]
spec:
description: RetinaNet R101 from Detectron2
runtime: 'python:3.8'
handler: main:handler
eventTimeout: 30s
build:
image: cvat/pth.facebookresearch.detectron2.retinanet_r101
baseImage: ubuntu:20.04
directives:
preCopy:
- kind: ENV
value: DEBIAN_FRONTEND=noninteractive
- kind: RUN
value: apt-get update && apt-get -y install curl git python3 python3-pip
- kind: WORKDIR
value: /opt/nuclio
- kind: RUN
value: pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
- kind: RUN
value: git clone https://github.com/facebookresearch/detectron2
- kind: RUN
value: pip3 install -e detectron2
- kind: RUN
value: curl -O https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_101_FPN_3x/190397697/model_final_971ab9.pkl
- kind: RUN
value: ln -s /usr/bin/pip3 /usr/local/bin/pip
triggers:
myHttpTrigger:
maxWorkers: 2
kind: 'http'
workerAvailabilityTimeoutMilliseconds: 10000
attributes:
maxRequestBodySize: 33554432 # 32MB
platform:
attributes:
restartPolicy:
name: always
maximumRetryCount: 3
mountMode: volume
Full code can be found here: detectron2/retinanet/nuclio/function.yaml
The next step is to adapt the source code which we implemented to run the DL model locally to the requirements of the nuclio platform. The first step is to load the model into memory using the init_context(context) function; read more about this function in Best Practices and Common Pitfalls. After that we need to accept incoming HTTP requests, run inference, and reply with detection results. The entry point handler(context, event), which we specified in the function specification, is responsible for this process. Again, in accordance with the function specification, the entry point should be located inside main.py.
# Imports used by the snippet below; the constants mirror the local experiment,
# with paths adjusted to their locations inside the container
# (see the build directives in function.yaml)
import base64
import io
import json

import yaml
from PIL import Image

from detectron2.config import get_cfg
from detectron2.data.detection_utils import convert_PIL_to_numpy
from detectron2.engine.defaults import DefaultPredictor
from detectron2.data.datasets.builtin_meta import COCO_CATEGORIES

CONFIG_FILE = "/opt/nuclio/detectron2/configs/COCO-Detection/retinanet_R_101_FPN_3x.yaml"
CONFIG_OPTS = ["MODEL.WEIGHTS", "/opt/nuclio/model_final_971ab9.pkl", "MODEL.DEVICE", "cpu"]
CONFIDENCE_THRESHOLD = 0.5

def init_context(context):
    # Build the config and load the model once, at function start-up
    cfg = get_cfg()
    cfg.merge_from_file(CONFIG_FILE)
    cfg.merge_from_list(CONFIG_OPTS)
    cfg.MODEL.RETINANET.SCORE_THRESH_TEST = CONFIDENCE_THRESHOLD
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = CONFIDENCE_THRESHOLD
    cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = CONFIDENCE_THRESHOLD
    cfg.freeze()
    predictor = DefaultPredictor(cfg)
    setattr(context.user_data, 'model_handler', predictor)

    # Parse the list of supported labels from the function specification
    functionconfig = yaml.safe_load(open("/opt/nuclio/function.yaml"))
    labels_spec = functionconfig['metadata']['annotations']['spec']
    labels = {item['id']: item['name'] for item in json.loads(labels_spec)}
    setattr(context.user_data, "labels", labels)

def handler(context, event):
    # Decode the base64-encoded image from the request body
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"].encode('utf-8')))
    threshold = float(data.get("threshold", 0.5))
    image = convert_PIL_to_numpy(Image.open(buf), format="BGR")

    # Run inference and convert detections into the format expected by CVAT
    predictions = context.user_data.model_handler(image)
    instances = predictions['instances']
    pred_boxes = instances.pred_boxes
    scores = instances.scores
    pred_classes = instances.pred_classes
    results = []
    for box, score, label in zip(pred_boxes, scores, pred_classes):
        label = COCO_CATEGORIES[int(label)]["name"]
        if score >= threshold:
            results.append({
                "confidence": str(float(score)),
                "label": label,
                "points": box.tolist(),
                "type": "rectangle",
            })

    return context.Response(body=json.dumps(results), headers={},
        content_type='application/json', status_code=200)
Full code can be found here: detectron2/retinanet/nuclio/main.py
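Once the function is deployed, you can also send it an HTTP request directly to check that it works, without involving CVAT. Below is a minimal sketch in Python; the port 55000 is an assumption here, use the port reported for the function by nuctl get functions.

import base64

import requests  # pip install requests

# Encode the test image from the earlier experiments as base64
with open("1920px-Cat_poster_1.jpg", "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")

# The body format matches what handler() in main.py expects
payload = {"image": image, "threshold": 0.5}
response = requests.post("http://localhost:55000", json=payload)
response.raise_for_status()

# Each result is a dict with confidence, label, points, and type
for obj in response.json():
    print(obj["label"], obj["confidence"], obj["points"])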
To use the new serverless function you have to deploy it using the nuctl command. The actual deployment process is described in the automatic annotation guide.
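For reference, the command follows the same pattern as the SiamMask deployment above. The path is an assumption based on where we created function.yaml and main.py in this tutorial; adjust it if your files live elsewhere.

nuctl deploy --project-name cvat --path "./detectron2/retinanet/nuclio" --platform local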