Commit dfc33b5: added in semantic segmentation instructions to README (#804)
benhoff authored and nmanovic committed Oct 31, 2019
Showing 1 changed file with 189 additions and 1 deletion: cvat/apps/auto_annotation/README.md

## Auto annotation

- [Description](#description)
- [Installation](#installation)
- [Usage](#usage)
- [Testing script](#testing)
- [Examples](#examples)
- [Person-vehicle-bike-detection-crossroad-0078](#person-vehicle-bike-detection-crossroad-0078-openvino-toolkit)
- [Landmarks-regression-retail-0009](#landmarks-regression-retail-0009-openvino-toolkit)
- [Semantic Segmentation](#semantic-segmentation)
- [Available interpretation scripts](#available-interpretation-scripts)

### Description

The application will be enabled automatically if
[…]

It includes a small user interface which allows users to feed in images and see the results using
the user interfaces provided by OpenCV.

See the script and the documentation in the
[auto_annotation directory](https://github.com/opencv/cvat/tree/develop/utils/auto_annotation).

When using the Auto Annotation runner, it is often helpful to drop into a REPL prompt to inspect the variables
directly. You can do this with the `interact` function from the `code` module.

```python
# Import the interact function from the `code` module
from code import interact


for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]
    # Unsure what other data members are in `frame_results`? Use `interact`!
    interact(local=locals())
```

```bash
$ python cvat/utils/auto_annotation/run_models.py --py /path/to/myfile.py --json /path/to/mapping.json --xml /path/to/inference.xml --bin /path/to/inference.bin
Python 3.6.6 (default, Sep 26 2018, 15:10:10)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.10.44.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__builtins__', 'frame_results', 'detections', 'frame_number', 'frame_height', 'interact', 'results', 'frame_width']
>>> type(frame_results)
<class 'dict'>
>>> frame_results.keys()
dict_keys(['frame_id', 'frame_height', 'frame_width', 'detections'])
```

Only use `interact` while running with the _testing script_, and make sure you _remove it_ before submitting your
script to the server! If you don't remove it, the code runners will hang during execution on the server,
and you'll have to restart the server to fix them.
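
If you find yourself adding and removing the call repeatedly while developing, one option is to gate it behind a flag
you define yourself. The sketch below is only one way to do that; the `DEBUG` name is an assumption, not something
provided by the runner, and deleting the call entirely before submission is still the safest approach.

```python
from code import interact

# Hypothetical convenience flag (not provided by CVAT): only drop into the REPL
# when explicitly enabled, so a forgotten call is less likely to hang the server.
DEBUG = False

for frame_results in detections:
    frame_number = frame_results["frame_id"]
    if DEBUG:
        # Inspect the runner-provided variables for this frame
        interact(local=locals())
```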

Another useful development technique is visualizing the results using OpenCV. This is discussed further in the
[Semantic Segmentation](#semantic-segmentation) section.

### Examples

[…]

#### Semantic Segmentation

__Links__
- [mask_rcnn_resnet50_atrous_coco][1] (OpenVINO toolkit)
- [CVAT Implementation][2]

__label_map.json__:
```json
{
  "label_map": {
    "1": "person",
    "2": "bicycle",
    "3": "car"
  }
}
```

Note that the above labels are not all the labels in the model! See [here](https://github.com/opencv/cvat/blob/develop/utils/open_model_zoo/mask_rcnn_inception_resnet_v2_atrous_coco/mapping.json).

**Interpretation script for a semantic segmentation network**:
```python
import numpy as np
import cv2
from skimage.measure import approximate_polygon, find_contours


for frame_results in detections:
    frame_height = frame_results['frame_height']
    frame_width = frame_results['frame_width']
    frame_number = frame_results['frame_id']
    detection = frame_results['detections']

    # The keys for the below two members will vary based on the model
    masks = detection['masks']
    boxes = detection['reshape_do_2d']

    for box_index, box in enumerate(boxes):
        # Again, these indexes are specific to this model
        class_label = int(box[1])
        box_class_probability = box[2]

        if box_class_probability > 0.2:
            xmin = box[3] * frame_width
            ymin = box[4] * frame_height
            xmax = box[5] * frame_width
            ymax = box[6] * frame_height

            box_width = int(xmax - xmin)
            box_height = int(ymax - ymin)

            # Use the box index and the class label index to find the appropriate mask
            # Note that we need to convert the class label to a zero-indexed array by subtracting `1`
            class_mask = masks[box_index][class_label - 1]

            # The class mask is a 33 x 33 matrix;
            # resize it to the size of the bounding box
            resized_mask = cv2.resize(class_mask, (box_width, box_height), interpolation=cv2.INTER_CUBIC)

            # Each pixel is a probability; select every pixel above the probability threshold, 0.5
            # Do this using the boolean `>` operator
            boolean_mask = (resized_mask > 0.5)

            # Convert the boolean values to uint8
            uint8_mask = boolean_mask.astype(np.uint8) * 255

            # Change the x and y coordinates into integers
            xmin = int(round(xmin))
            ymin = int(round(ymin))
            xmax = xmin + box_width
            ymax = ymin + box_height

            # Create an empty blank frame, so that we can get the mask polygon in frame coordinates
            mask_frame = np.zeros((frame_height, frame_width), dtype=np.uint8)

            # Put the uint8_mask on the mask frame using the integer coordinates
            # (numpy indexing is [row, column], i.e. [y, x])
            mask_frame[ymin:ymax, xmin:xmax] = uint8_mask

            mask_probability_threshold = 0.5
            # Find the contours
            contours = find_contours(mask_frame, mask_probability_threshold)
            # Every bounding box should only have a single contour
            contour = contours[0]
            contour = np.flip(contour, axis=1)

            # Reduce the precision of the polygon
            polygon_mask = approximate_polygon(contour, tolerance=2.5)
            polygon_mask = polygon_mask.tolist()

            results.add_polygon(polygon_mask, class_label, frame_number)
```

Note that it is sometimes hard to see or understand what is happening inside an interpretation script.
Displaying the intermediate images with OpenCV can help you visualize what is going on.

```python
import cv2
import numpy as np


for frame_results in detections:
    frame_height = frame_results['frame_height']
    frame_width = frame_results['frame_width']
    detection = frame_results['detections']

    masks = detection['masks']
    boxes = detection['reshape_do_2d']

    for box_index, box in enumerate(boxes):
        class_label = int(box[1])
        box_class_probability = box[2]

        if box_class_probability > 0.2:
            xmin = box[3] * frame_width
            ymin = box[4] * frame_height
            xmax = box[5] * frame_width
            ymax = box[6] * frame_height

            box_width = int(xmax - xmin)
            box_height = int(ymax - ymin)

            class_mask = masks[box_index][class_label - 1]
            # Visualize the raw 33 x 33 class mask!
            cv2.imshow('class mask', class_mask)
            # Wait until the user presses a key
            cv2.waitKey(0)

            # Resize and threshold the mask, as in the interpretation script above
            resized_mask = cv2.resize(class_mask, (box_width, box_height), interpolation=cv2.INTER_CUBIC)
            boolean_mask = (resized_mask > 0.5)
            uint8_mask = boolean_mask.astype(np.uint8) * 255

            # Visualize the class mask after it has been resized and thresholded!
            cv2.imshow('class mask', uint8_mask)
            cv2.waitKey(0)
```

Note that you should _only_ use the above commands while running the [Auto Annotation Model Runner][3];
if they run on the server, a server restart will likely be required to recover.
A call to `cv2.destroyAllWindows()` or `cv2.destroyWindow('your-name-here')` might also be required, depending on your
implementation.
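
As a minimal sketch of that cleanup (assuming the `uint8_mask` and window name from the snippet above, both chosen for
illustration only):

```python
import cv2

# Show the mask, wait for a key press, then close the window explicitly so no
# stray OpenCV windows are left open between iterations.
cv2.imshow('class mask', uint8_mask)
cv2.waitKey(0)
cv2.destroyWindow('class mask')  # or cv2.destroyAllWindows() to close every window
```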

### Available interpretation scripts

CVAT comes prepackaged with several out-of-the-box interpretation scripts.
See them in the [open model zoo directory](https://github.com/opencv/cvat/tree/develop/utils/open_model_zoo).

[1]: https://github.com/opencv/open_model_zoo/blob/master/models/public/mask_rcnn_resnet50_atrous_coco/model.yml
[2]: https://github.com/opencv/cvat/tree/develop/utils/open_model_zoo/mask_rcnn_inception_resnet_v2_atrous_coco
[3]: https://github.com/opencv/cvat/tree/develop/utils/auto_annotation
