The goals / steps of this project are the following:
- Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier
- Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
- Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
- Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
- Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
- Estimate a bounding box for vehicles detected.
What? | File |
---|---|
code: main script | bin/detect_and_track_vehicles.py |
code: helper module | lib/helper_vehicle_detection.py |
code: tracking class | lib/detection.py |
code: tracking class | lib/vehicle.py |
code: tracking class | lib/position.py |
training data | etc/ml_train_img |
input test images | inp/img/test_images/* |
input project video | inp/vid/project.mp4 |
output test images | out/img/* |
output project video | out/vid/project_output.mp4 |
usage: detect_and_track_vehicles.py [-h] [--video PATH] [--startTime INT]
[--endTime INT] [--unroll] [--collect]
[--visLog INT] [--format STRING]
[--outDir PATH] [--mlDir PATH]
a tool for detecting and tracking vehicles in images and videos
optional arguments:
-h, --help show this help message and exit
  --video PATH     video from a front-facing camera in which to detect vehicles
  --startTime INT  when developing the image pipeline it can be helpful to
                   focus on the difficult parts of a video. Use this argument
                   to shift the entry point. E.g. --startTime=25 starts the
                   processing pipeline at the 25th second after video begin.
  --endTime INT    Use this argument to shift the exit point. E.g. --endTime=50
                   ends the processing pipeline at the 50th second after video
                   begin.
--unroll Use this argument to unroll the resulting video in single
frames.
--collect Use this argument to collect false positives to improve
learning.
  --visLog INT     for debugging or documentation of the pipeline you can
                   output the image at a certain processing step: 1=detections,
                   2=heatmap, 3=thresholded_heatmap, 4=result
  --format STRING  to visualize several steps of the image pipeline and plot
                   them in one single image. Use --format=collage4 for a
                   4-image collage
--outDir PATH directory for output data. must not exist at call time.
default is --outDir=output_directory_<time>
--mlDir PATH directory for machine learning training images. directory
must contain 2 subdirectories "vehicles" and "non-
vehicles". default is --mlDir=etc/ml_train_img
example call for processing a video:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4
example call for processing only the part of a video between 38 and 45 seconds:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4 --startTime 38 --endTime 45
example call for processing a video. This outputs a video of a certain step of the detection pipeline:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4 --visLog 2
example call for processing a video. This outputs a video of 4 important steps of the image pipeline:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4 --format collage4
example call for processing a video. This outputs a video as a mp4 file and for each frame of the video an image:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4 --unroll
The detection of vehicles will be performed by a Support Vector Machine Classifier.
The high-level code for the creation of such a classifier can be found in the 'createClassifier' function in lines 635 through 783 of file 'lib/helper_vehicle_detection.py'.
To get a classifier which can distinguish between vehicles and non-vehicles, I trained the classifier with approximately 8500 images of each category. Every image is a 64 x 64 pixel, 3-channel color image. The data can be found in the etc/ml_train_img directory. The vehicle training data looks like this:
And the non-vehicle training data looks like this:
A useful HOG representation of an image should generalize well over a variety of colors and different views of similar shapes, yet stay distinct enough to distinguish an object class from other classes.
To get from the RGB image to a HOG representation, I did this:
- Convert the image into a color space that I know produces good HOG representations. I had good experiences with the color spaces 'LUV' and 'YCrCb'.
- Extract one color channel.
- Use the function 'skimage.feature.hog()' to convert this color channel into a HOG representation.
- For the HOG calculation, I used 9 orientations, 8 pixels_per_cell, and 2 cells_per_block.
- For the project, I convert all 3 channels into their HOG representations and concatenate them into the comprehensive feature vector of the image.
The code for the HOG Transformation can be found in function 'get_hog_features' in lines 611 through 637 of file lib/helper_vehicle_detection.py.
This is how the original image and the HOG representation of the Y channel (in YCrCb color space) look:
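For illustration, a minimal sketch of this 3-channel HOG extraction could look like the following. It is not the project's actual 'get_hog_features'; the function name and the block_norm choice are assumptions.

```python
import cv2
import numpy as np
from skimage.feature import hog

def hog_features_all_channels(rgb_img, orientations=9,
                              pix_per_cell=8, cells_per_block=2):
    """Convert an RGB image to YCrCb and return the concatenated
    HOG feature vector of all 3 channels (illustrative sketch)."""
    ycrcb = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2YCrCb)
    features = []
    for channel in range(3):
        channel_features = hog(
            ycrcb[:, :, channel],
            orientations=orientations,
            pixels_per_cell=(pix_per_cell, pix_per_cell),
            cells_per_block=(cells_per_block, cells_per_block),
            block_norm='L2-Hys',      # assumed normalization scheme
            feature_vector=True)
        features.append(channel_features)
    return np.concatenate(features)
```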
To improve the performance of the classifier, I decided to feed in more data and use 'Histogram of Color' as a source for my feature vector.
To get from the RGB image to the Histogram of Color, I did this:
- Convert the image into a color space that produces well distinguishable histograms of color. I don't have prior experience here, so I decided to simply use the same color space (YCrCb) as for the HOG extraction.
- Extract one color channel.
- Use the function np.histogram() to convert the color data into a histogram with 32 bins.
- For the project, I convert all 3 channels into their 'Histogram of Color' representations and concatenate them into the comprehensive feature vector of the image.
The code for the Histogram of Color can be found in function 'color_hist' in lines 506 through 514 of file lib/helper_vehicle_detection.py.
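A minimal sketch of this per-channel histogram feature, assuming 8-bit images with a value range of 0..255, might look like this (illustrative, not the project's exact 'color_hist'):

```python
import numpy as np

def color_hist(img, nbins=32):
    """Concatenated 'Histogram of Color' features of all 3 channels."""
    hist_features = []
    for channel in range(3):
        counts, _ = np.histogram(img[:, :, channel],
                                 bins=nbins, range=(0, 256))
        hist_features.append(counts)
    return np.concatenate(hist_features)
```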
To improve the performance of the classifier, I decided to feed in more data and use 'Spatial Binning' as a source for my feature vector. Spatial binning downsamples the image and uses the raw pixel intensities themselves as features.
To get from the RGB image to the Spatial Binned Representation, I did this:
- Convert the image into a suitable color space. I don't have any experience with which one works best here, so I decided to simply use the same color space (YCrCb) as for the HOG extraction.
- Resize the image to 32 x 32 x 3.
- Flatten the resized color image with the function np.ravel().
- Concatenate the flattened pixel values to the comprehensive feature vector of the image.
The code for the Spatial Binning can be found in function 'bin_spatial' in lines 498 through 502 of file lib/helper_vehicle_detection.py.
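A minimal sketch of this step (illustrative, not the project's exact 'bin_spatial'):

```python
import cv2

def bin_spatial(img, size=(32, 32)):
    """Resize the (already color-converted) image and return the raw
    pixel values as a flat feature vector."""
    return cv2.resize(img, size).ravel()
```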
I split the data into an 80% training set and a 20% test set. I scaled the features to zero mean and unit variance using the StandardScaler() from the sklearn module. For the feature generation, I tried several combinations of parameters. I achieved my best accuracy of 99.6% on the test set with the following configuration:
Feature | Parameter | Value |
---|---|---|
all | color_space | 'YCrCb' |
spatial binning | spatial_size | (32, 32) |
color histogram | number of bins | 32 |
HOG | orientations | 9 |
HOG | pix_per_cell | 8 |
HOG | cell_per_block | 2 |
HOG | channels | 'ALL' |
For later usage I saved the classifier in a pickle file.
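Put together, the training step could look like the sketch below: scale the features, do the 80/20 split, fit a linear SVM, and pickle the result. Variable and file names are illustrative assumptions, not the project's 'createClassifier'.

```python
import pickle
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# X: stacked feature vectors (HOG + color histogram + spatial binning)
# y: labels (1 = vehicle, 0 = non-vehicle)
def train_classifier(X, y):
    scaler = StandardScaler().fit(X)          # zero mean, unit variance
    X_scaled = scaler.transform(X)
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=0.2, random_state=42)   # 80/20 split
    clf = LinearSVC()
    clf.fit(X_train, y_train)
    print('test accuracy:', clf.score(X_test, y_test))
    # save classifier and scaler for later use
    with open('classifier.p', 'wb') as f:
        pickle.dump({'clf': clf, 'scaler': scaler}, f)
    return clf, scaler
```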
To detect a vehicle, we only have to look at the parts of the image where a vehicle could occur. Vehicles that are farther away appear smaller in the image than vehicles that are near the camera. To address these circumstances, I did the following:
- Search for vehicles only in the image area 450 < y < 650
- Use different search window sizes: (110 x 110), (90 x 90), (64 x 64) and (50 x 50)
- Use an overlap between search windows of 75%
The code for determining the sliding windows can be found in the function 'slide_windows' in lines 420 through 459 of file lib/helper_vehicle_detection.py.
This is how the 64 x 64 search windows look:
(left: 64 x 64 without overlap, right: 64 x 64 with 75% overlap)
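A simplified sketch of how such windows could be generated (illustrative only; the project's 'slide_windows' handles this):

```python
def slide_windows(img_width=1280, y_start=450, y_stop=650,
                  sizes=(110, 90, 64, 50), overlap=0.75):
    """Generate square search windows of several sizes in the region
    450 < y < 650 with 75% overlap."""
    windows = []
    for size in sizes:
        step = int(size * (1 - overlap))
        for y in range(y_start, y_stop - size + 1, step):
            for x in range(0, img_width - size + 1, step):
                windows.append(((x, y), (x + size, y + size)))
    return windows
```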
The search windows for which the classifier detects a vehicle are called hot windows.
The code for determining the hot windows can be found in the function 'search_window' in lines 463 through 495 of file lib/helper_vehicle_detection.py.
This is an image with several hot windows:
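A minimal sketch of this search, assuming a helper extract_features() that computes the same feature vector as during training (illustrative, not the project's 'search_window'):

```python
import cv2

def search_windows(img, windows, clf, scaler, extract_features):
    """Return the windows for which the classifier predicts 'vehicle'."""
    hot_windows = []
    for (x1, y1), (x2, y2) in windows:
        # resize each patch to the 64 x 64 classifier input size
        patch = cv2.resize(img[y1:y2, x1:x2], (64, 64))
        features = scaler.transform(extract_features(patch).reshape(1, -1))
        if clf.predict(features)[0] == 1:
            hot_windows.append(((x1, y1), (x2, y2)))
    return hot_windows
```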
The areas inside the hot windows are summed up into a heat map of possible vehicle positions. Not every hot window is a unique car - usually a vehicle is detected by a bunch of hot windows of different sizes, which leads to connected hot areas. To eliminate false positives, an area needs at least 2 overlapping detections to be considered a candidate vehicle position.
The code for generating the heat map can be found in function 'add_heat' in lines 245 through 253 of file lib/helper_vehicle_detection.py.
This is how a heat map looks:
(left: hot windows, right: heat map)
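A minimal sketch of the heat map accumulation and the threshold of 2 overlapping detections (illustrative, not the project's exact 'add_heat'):

```python
def add_heat(heatmap, hot_windows):
    """Add +1 to every pixel inside each hot window."""
    for (x1, y1), (x2, y2) in hot_windows:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold=2):
    """Zero out areas with fewer than `threshold` overlapping detections."""
    heatmap[heatmap < threshold] = 0
    return heatmap
```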
Every isolated hot area is a possible vehicle position.
The code for generating the labels can be found in lines 175 through 185 of file lib/helper_vehicle_detection.py.
(left: heat map, right: thresholded areas (labels))
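A minimal sketch of turning the thresholded heat map into one bounding box per isolated hot area, using scipy.ndimage.label (the surrounding function name is illustrative):

```python
from scipy.ndimage import label

def label_bounding_boxes(thresholded_heatmap):
    """Return one bounding box per isolated hot area."""
    labels, n_candidates = label(thresholded_heatmap)
    bboxes = []
    for car_number in range(1, n_candidates + 1):
        ys, xs = (labels == car_number).nonzero()
        bboxes.append(((xs.min(), ys.min()), (xs.max(), ys.max())))
    return bboxes
```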
If 3 consecutive frames show a detection within a tolerance of 25 pixels from the anticipated position, then this detection is considered a valid vehicle position. The anticipated position is calculated from the positions of the 3 previous frames.
The code for detecting and tracking vehicles in videos can be found in line 204 of file lib/helper_vehicle_detection.py and in the class files lib/detection.py, lib/vehicle.py and lib/position.py.
(left: thresholded areas (labels), right: resulting vehicle detections)
Each detected vehicle position is reviewed in every frame. This is similar to the first detection of a vehicle - a position has to be confirmed within a radius of 25 pixels. If a tracked vehicle cannot be confirmed in 2 consecutive frames, the vehicle is removed and considered to have left the image.
The code for this per-frame review is in the same places: line 204 of file lib/helper_vehicle_detection.py and the class files lib/detection.py, lib/vehicle.py and lib/position.py.
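The following sketch illustrates the confirmation idea; the class, its attribute names, and the simple mean extrapolation are assumptions and do not mirror the actual lib/detection.py, lib/vehicle.py and lib/position.py classes:

```python
import numpy as np

class TrackedVehicle:
    CONFIRM_RADIUS = 25     # pixels
    MAX_MISSES = 2          # drop after 2 consecutive unconfirmed frames

    def __init__(self, centroid):
        self.positions = [centroid]
        self.misses = 0

    def anticipated_position(self):
        # extrapolate from the last positions (here: mean of the last 3)
        return np.array(self.positions[-3:]).mean(axis=0)

    def update(self, detections):
        """Confirm this vehicle with a nearby detection, if any."""
        anticipated = self.anticipated_position()
        for centroid in detections:
            if np.linalg.norm(np.array(centroid) - anticipated) <= self.CONFIRM_RADIUS:
                self.positions.append(centroid)
                self.misses = 0
                return True
        self.misses += 1
        # False -> the vehicle is considered to have left the image
        return self.misses < self.MAX_MISSES
```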
You can find the result of the project video here: out/vid/project_output.mp4
You can find the result of the project video with pipeline visualization here: out/vid/project_collage4_output.mp4
In my early attempts, there were a lot of false positives on the road surface near the second bridge.
I solved the problem by taking additional training samples from exactly the areas that produced most of the false hot windows. This eliminated almost all false positives.
At first, there were 2 hot zones on the white car, which were recognized as 2 separate vehicles.
I solved the problem by adding 50 x 50 search windows and by increasing the overlap to 75% for all search window sizes.
Occasionally, the detection of the white car was lost.
I increased the position tolerance of the frame-to-frame confirmation radius from 15 to 25 pixels and allowed the confirmation to fail in one frame without dropping the vehicle.
- A lot of the existing training examples are taken from the project video, so there is a strong bias towards this video. The classifier would most likely fail, or at least perform significantly worse, on other videos in other settings.
- More training data would improve robustness:
  - with different cars
  - different daytime and luminosity
  - different colors
  - different streets
  - different vegetation
  - urban / rural settings
- I think a well-trained CNN would generalize better than a linear SVM.