## Auto annotation

- [Description](#description)
- [Installation](#installation)
- [Usage](#usage)
- [Testing script](#testing)
- [Examples](#examples)
  - [Person-vehicle-bike-detection-crossroad-0078](#person-vehicle-bike-detection-crossroad-0078-openvino-toolkit)
  - [Landmarks-regression-retail-0009](#landmarks-regression-retail-0009-openvino-toolkit)
  - [Semantic Segmentation](#semantic-segmentation)
- [Available interpretation scripts](#available-interpretation-scripts)

### Description

The application will be enabled automatically if the OpenVINO™ component is installed.

### Testing

The testing script includes a small user interface which allows users to feed in images and see the results
rendered in the user interfaces provided by OpenCV.

See the script and the documentation in the
[auto_annotation directory](https://github.com/opencv/cvat/tree/develop/utils/auto_annotation).

When using the Auto Annotation runner, it is often helpful to drop into a REPL prompt to interact with the variables
directly. You can do this with the `interact` function from the built-in `code` module.

```python
# Import the interact function from the `code` module
from code import interact


for frame_results in detections:
frame_height = frame_results["frame_height"]
frame_width = frame_results["frame_width"]
frame_number = frame_results["frame_id"]
    # Unsure what other data members are in `frame_results`? Use the `interact` function!
interact(local=locals())
```
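
Running the testing script with this change in place drops you into the interactive prompt: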

```bash
$ python cvat/utils/auto_annotation/run_models.py --py /path/to/myfile.py --json /path/to/mapping.json --xml /path/to/inference.xml --bin /path/to/inference.bin
Python 3.6.6 (default, Sep 26 2018, 15:10:10)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.10.44.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__builtins__', 'frame_results', 'detections', 'frame_number', 'frame_height', 'interact', 'results', 'frame_width']
>>> type(frame_results)
<class 'dict'>
>>> frame_results.keys()
dict_keys(['frame_id', 'frame_height', 'frame_width', 'detections'])
```

When using the `interact` function, make sure you are running via the _testing script_, and ensure that you _remove it_
before submitting to the server! If you don't remove it, the code runners will hang during execution,
and you'll have to restart the server to fix them.
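
One way to make this mistake harder is to gate the REPL behind an environment variable that only exists on your
development machine. Below is a minimal sketch, assuming you export the (illustrative) variable
`CVAT_DEBUG_INTERACT` locally; on the server the variable is absent and the loop runs normally.

```python
import os
from code import interact

for frame_results in detections:
    # Only drop into the REPL when the debug variable is set
    if os.environ.get('CVAT_DEBUG_INTERACT'):
        interact(local=locals())
```

Even with a guard like this, removing the call entirely before submitting remains the safest option.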

Another useful development technique is visualizing the results using OpenCV. This is discussed further in the
[Semantic Segmentation](#semantic-segmentation) section.

### Examples


#### Semantic Segmentation

__Links__
- [mask_rcnn_resnet50_atrous_coco][1] (OpenVINO toolkit)
- [CVAT Implementation][2]

__label_map.json__:
```json
{
"label_map": {
"1": "person",
"2": "bicycle",
"3": "car",
}
}
```

Note that the labels above are only a subset of the labels available in the model! See the full mapping [here](https://github.com/opencv/cvat/blob/develop/utils/open_model_zoo/mask_rcnn_inception_resnet_v2_atrous_coco/mapping.json).
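
If you only map a subset of the model's classes, your interpretation script can skip detections whose class id is
not in the map. Below is a minimal sketch, assuming the map above is saved next to the script as `label_map.json`
(the path and helper name are illustrative):

```python
import json

# Load the subset of classes we care about; the keys are strings in the JSON file
with open('label_map.json') as f:
    label_map = json.load(f)['label_map']

def is_mapped(class_label):
    # Convert the integer class id back to a string key before the lookup
    return str(class_label) in label_map
```

A detection loop can then `continue` past any box for which `is_mapped(class_label)` is false.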

**Interpretation script for a semantic segmentation network**:
```python
import numpy as np
import cv2
from skimage.measure import approximate_polygon, find_contours


for frame_results in detections:
frame_height = frame_results['frame_height']
frame_width = frame_results['frame_width']
frame_number = frame_results['frame_id']
detection = frame_results['detections']

# The keys for the below two members will vary based on the model
masks = frame_results['masks']
boxes = frame_results['reshape_do_2d']

for box_index, box in enumerate(boxes):
        # Again, these indices are specific to this model
class_label = int(box[1])
box_class_probability = box[2]

if box_class_probability > 0.2:
            xmin = box[3] * frame_width
            ymin = box[4] * frame_height
            xmax = box[5] * frame_width
            ymax = box[6] * frame_height

            box_width = int(xmax - xmin)
            box_height = int(ymax - ymin)

            # use the box index and class label index to find the appropriate mask;
            # the class label must be converted to a zero-based index by subtracting `1`
            class_mask = masks[box_index][class_label - 1]

            # Class mask is a 33 x 33 matrix
            # resize it to the bounding box; note that cv2.resize takes dsize as (width, height)
            resized_mask = cv2.resize(class_mask, dsize=(box_width, box_height), interpolation=cv2.INTER_CUBIC)

            # Each pixel is a probability; select every pixel above the probability threshold, 0.5
            # Do this using the boolean `>` comparison
            boolean_mask = (resized_mask > 0.5)

# Convert the boolean values to uint8
uint8_mask = boolean_mask.astype(np.uint8) * 255

# Change the x and y coordinates into integers
xmin = int(round(xmin))
ymin = int(round(ymin))
xmax = xmin + box_width
ymax = ymin + box_height

            # Create a blank frame so that we can get the mask polygon in frame coordinates
            mask_frame = np.zeros((frame_height, frame_width), dtype=np.uint8)

            # Put the uint8_mask on the mask frame using the integer coordinates;
            # numpy arrays are indexed as [row, column], i.e. [y, x]
            mask_frame[ymin:ymax, xmin:xmax] = uint8_mask

            # find the contours; the mask frame only contains the values 0 and 255,
            # so any level between them will trace the mask boundary
            contour_level = 0.5
            contours = find_contours(mask_frame, contour_level)
            # every bounding box should only have a single contour
            contour = contours[0]
            # flip (row, column) coordinates to (x, y)
            contour = np.flip(contour, axis=1)

# reduce the precision on the polygon
polygon_mask = approximate_polygon(contour, tolerance=2.5)
polygon_mask = polygon_mask.tolist()

results.add_polygon(polygon_mask, class_label, frame_number)
```
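
The `tolerance` argument to `approximate_polygon` controls how aggressively the contour is simplified: larger values
produce polygons with fewer points. A quick, illustrative way to pick a value is to compare point counts:

```python
# Compare how many points survive at different tolerances (illustrative values)
for tolerance in (0.5, 2.5, 5.0):
    simplified = approximate_polygon(contour, tolerance=tolerance)
    print('tolerance', tolerance, '->', len(simplified), 'points')
```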

Note that it is sometimes hard to see or understand what is happening in a script.
Using OpenCV to display intermediate images can help you visualize each step.

```python
import cv2


for frame_results in detections:
frame_height = frame_results['frame_height']
frame_width = frame_results['frame_width']
detection = frame_results['detections']

masks = frame_results['masks']
boxes = frame_results['reshape_do_2d']

for box_index, box in enumerate(boxes):
class_label = int(box[1])
box_class_probability = box[2]

if box_class_probability > 0.2:
            xmin = box[3] * frame_width
            ymin = box[4] * frame_height
            xmax = box[5] * frame_width
            ymax = box[6] * frame_height

            box_width = int(xmax - xmin)
            box_height = int(ymax - ymin)

            class_mask = masks[box_index][class_label - 1]
            # Visualize the raw class mask!
            cv2.imshow('class mask', class_mask)
            # wait until the user presses a key
            cv2.waitKey(0)

            resized_mask = cv2.resize(class_mask, dsize=(box_width, box_height), interpolation=cv2.INTER_CUBIC)
            boolean_mask = (resized_mask > 0.5)
            uint8_mask = boolean_mask.astype(np.uint8) * 255

            # Visualize the mask after it has been resized and thresholded!
            cv2.imshow('class mask', uint8_mask)
            cv2.waitKey(0)
```

Note that you should _only_ use the above commands while running the [Auto Annotation Model Runner][3].
Running them on the server will hang the code runners and likely require a server restart to fix.
Calling `cv2.destroyAllWindows()` or `cv2.destroyWindow('your-name-here')` might be required depending on your
implementation.
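
For example, a display-and-clean-up pattern along these lines (the window name is arbitrary) closes each window once
you have inspected it:

```python
import cv2

cv2.imshow('class mask', uint8_mask)
cv2.waitKey(0)                   # block until a key is pressed
cv2.destroyWindow('class mask')  # close the window before the next iteration
```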

### Available interpretation scripts

CVAT comes prepackaged with several out-of-the-box interpretation scripts.
See them in the [open model zoo directory](https://github.com/opencv/cvat/tree/develop/utils/open_model_zoo).

[1]: https://github.com/opencv/open_model_zoo/blob/master/models/public/mask_rcnn_resnet50_atrous_coco/model.yml
[2]: https://github.com/opencv/cvat/tree/develop/utils/open_model_zoo/mask_rcnn_inception_resnet_v2_atrous_coco
[3]: https://github.com/opencv/cvat/tree/develop/utils/auto_annotation
