serverless result formats #6332

Closed
patrickwasp opened this issue Jun 16, 2023 · 2 comments
Labels
question Further information is requested

Comments


patrickwasp commented Jun 16, 2023

Where can I find information about the format serverless functions should return for automatic annotation? What "types" are available, and what are the formats CVAT expects for each of them?

Here is what I found by looking at the examples in the serverless folder. I'm not sure my interpretation is right:

instance segmentation mask_rcnn

"confidence": a number between 0 and 1,
"label": the string representation of the class name,
"points": a flat list of coordinates representing a single polygon (x1, y1, x2, y2, x3, y3, ..., xn, yn),
"mask": a flat list of 0s and 1s representing a binary mask cropped to the object's bounding box, followed by four elements giving the top-left and bottom-right coordinates of that box (x_top_left, y_top_left, x_bottom_right, y_bottom_right),
"type": "mask",
  • How would we represent an object with multiple shapes, for example when there is an occlusion in the middle of it? Can "points" be a two-dimensional list?
  • Do we need both points and mask data for type "mask"?
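
To make the "mask" layout above concrete, here is a minimal sketch of how such a list could be built from a full-image binary mask. The helper names `bounding_box` and `to_cvat_mask`, the label "cat", and the confidence value are made up for illustration, and the sketch assumes the bounding box coordinates are inclusive pixel indices; that part is my reading, not something confirmed in this thread.

```python
def bounding_box(binary_mask):
    """Return the tight (xtl, ytl, xbr, ybr) box around all non-zero pixels.

    Assumes the mask contains at least one non-zero pixel.
    """
    ys = [y for y, row in enumerate(binary_mask) if any(row)]
    xs = [x for row in binary_mask for x, v in enumerate(row) if v]
    return min(xs), min(ys), max(xs), max(ys)


def to_cvat_mask(binary_mask, box):
    """Crop the mask to `box`, flatten it row by row, then append the box.

    ASSUMPTION: box coordinates are inclusive on both ends.
    """
    xtl, ytl, xbr, ybr = box
    flat = []
    for row in binary_mask[ytl:ybr + 1]:
        flat.extend(int(v) for v in row[xtl:xbr + 1])
    flat.extend([xtl, ytl, xbr, ybr])
    return flat


# A 5x4 image with a small L-shaped object.
image_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(image_mask)

result = {
    "confidence": 0.93,   # hypothetical score
    "label": "cat",       # hypothetical class name
    "mask": to_cvat_mask(image_mask, box),
    "type": "mask",
}
```

For this toy image the cropped region is 2x2, so the list holds four mask values followed by the four box coordinates.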

object detection detectron2 retinanet

"confidence": a number between 0 and 1,
"label": the string representation of the class name,
"points": a list of 4 numbers giving the top-left and bottom-right corners of the object's bounding box (x_top_left, y_top_left, x_bottom_right, y_bottom_right),
"type": "rectangle",
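
As a rough sketch of the rectangle format, the snippet below builds two detections and serializes them. `make_rectangle`, the labels, and the numbers are hypothetical, and it assumes the function responds with a JSON list of such objects, which is how I read the serverless examples rather than a documented contract.

```python
import json


def make_rectangle(label, confidence, xtl, ytl, xbr, ybr):
    """Build one detection in the rectangle layout described above."""
    return {
        "confidence": confidence,
        "label": label,
        "points": [xtl, ytl, xbr, ybr],
        "type": "rectangle",
    }


# Hypothetical detector output: (label, score, box corners).
detections = [
    make_rectangle("car", 0.87, 10, 20, 110, 80),
    make_rectangle("person", 0.64, 200, 40, 240, 150),
]

# ASSUMPTION: the response body is a JSON list of detection objects.
body = json.dumps(detections)
```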

image embeddings sam

"blob": image embeddings stored as a base64 string

The embeddings have shape 1xCxHxW, where C is the embedding dimension and (H, W) are the spatial dimensions of the SAM embedding (typically C=256, H=W=64).
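
A small sketch of the "blob" round trip, using only the standard library. It assumes the blob is the raw bytes of a float32 buffer in row-major order, base64-encoded; the dtype and byte layout are my assumption, and the dimensions are shrunk so the example stays tiny.

```python
import base64
from array import array

# Tiny stand-in dimensions; the typical SAM values would be C=256, H=W=64.
C, H, W = 2, 3, 3

# Fake embedding values laid out as a flat 1 x C x H x W buffer.
# ASSUMPTION: 32-bit floats ('f'), row-major, native byte order.
values = array("f", (float(i) for i in range(1 * C * H * W)))

response = {"blob": base64.b64encode(values.tobytes()).decode("ascii")}

# Decoding side: recover the flat float buffer from the base64 string.
decoded = array("f")
decoded.frombytes(base64.b64decode(response["blob"]))
```

A consumer would still need to know C, H and W to reshape the flat buffer, since the base64 string carries no shape information.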

@bsekachev
Member

What "types" are available, and what are the formats CVAT expects for each of them?

CVAT types: rectangle, polygon, points, polyline, ellipse, mask, tag and cuboid (the last two need to be re-checked).
rectangle: [xtl, ytl, xbr, ybr]
polygon, points, polyline: [x1, y1, x2, y2, x3, y3, ... ]
ellipse: probably [cx, cy, right x, top y]
mask: [RLE-encoded ROI, xtl, ytl, xbr, ybr], where the last 4 values are the ROI coordinates
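
A quick sanity check for the flat-list formats above could look like the sketch below. `check_points` and the minimum coordinate counts are my own assumptions (simple geometric minimums), not limits taken from CVAT, and ellipse and mask are left out because their layouts are less certain.

```python
# ASSUMPTION: minimum number of coordinates per type, chosen as the
# geometric minimum (2 points for a line, 3 for a polygon, and so on).
MIN_COORDS = {"rectangle": 4, "polygon": 6, "polyline": 4, "points": 2}


def check_points(shape_type, points):
    """Return True if `points` looks like a valid flat list for `shape_type`."""
    if shape_type not in MIN_COORDS:
        raise ValueError(f"no rule sketched for type {shape_type!r}")
    if len(points) % 2 != 0:
        # Coordinates come in (x, y) pairs for all of these types.
        return False
    if shape_type == "rectangle":
        if len(points) != 4:
            return False
        xtl, ytl, xbr, ybr = points
        return xtl <= xbr and ytl <= ybr
    return len(points) >= MIN_COORDS[shape_type]
```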

how would we represent an object with multiple shapes, for example when there is an occlusion in the middle of it? Can "points" be a two-dimensional list?

Currently, only with masks. A multi-dimensional list is not supported; it could be an enhancement. See #3676.

do we need points and mask data for type "mask"?

As far as I remember, for type "mask" only the mask field is required. The client will convert it to a polygon using OpenCV if necessary.

SAM output is additionally handled by the SAM plugin on the client side (cvat-ui/plugins/sam).

bsekachev added the "question" (Further information is requested) label on Jul 7, 2023
@bsekachev
Member

I was wrong about the mask: it is not RLE-encoded. The option you suggested is correct.
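
For anyone reading this later, here is a minimal sketch of reading the non-RLE mask list back into rows: plain 0/1 values for the cropped region, followed by the four box coordinates. `decode_mask` is a hypothetical helper and the box is assumed to be inclusive on both ends. The example mask has two disjoint columns, which is one way a split object can live in a single "mask" result.

```python
def decode_mask(mask):
    """Split a flat mask list into 2-D rows plus its bounding box.

    ASSUMPTION: the last four values are (xtl, ytl, xbr, ybr), inclusive.
    """
    *pixels, xtl, ytl, xbr, ybr = mask
    width = xbr - xtl + 1
    height = ybr - ytl + 1
    if len(pixels) != width * height:
        raise ValueError("mask length does not match its bounding box")
    rows = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return rows, (xtl, ytl, xbr, ybr)


# Two separate vertical strips inside one 3x2 region at x=5..7, y=7..8.
rows, box = decode_mask([1, 0, 1, 1, 0, 1, 5, 7, 7, 8])
```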
