This is an implementation of the fish detection algorithm described by Salman, et al. (2019) [1]. The paper's reference implementation is available here.
This dataset comprises 17 videos from Kavasidis, et al. (2013) [2] and Kavasidis, et al. (2012) [3].
Available from the PeRCeiVe Lab. Use the "GT - KEY FRAMES" download link.
The videos are provided in the Flash Video (`.flv`) format, which is not widely supported. Use FFmpeg to convert the files to AVI:
```
for x in *.flv; do
    # Re-encode each .flv as a Motion-JPEG AVI, replacing only the extension
    ffmpeg -i "$x" -c:v mjpeg "${x%.flv}.avi"
done
```
This dataset (Labeled Fishes in the Wild [4]) was not used in the paper.
Available from NOAA.
The `process_video.py` script processes each frame of the input video and generates a composite image containing the following channels:

- Red: original frame (in grayscale)
- Green: extracted foreground
- Blue: optical flow (mixture of magnitude and angle)
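For orientation, here is a minimal sketch of how such a composite can be assembled with OpenCV. The background-subtraction method, the Farneback flow parameters, and the particular magnitude/angle mixing below are illustrative assumptions; see `process_video.py` for the actual implementation.

```python
# Sketch only: parameters and channel mixing are assumptions, not the
# repo's exact implementation.
import cv2
import numpy as np

cap = cv2.VideoCapture("input.avi")
bg = cv2.createBackgroundSubtractorMOG2()   # GMM-based background model
prev_gray = None
i = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    fg = bg.apply(frame)                    # foreground mask (green channel)
    flow_ch = np.zeros_like(gray)
    if prev_gray is not None:
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Mix magnitude and angle into a single 8-bit channel (blue channel)
        flow_ch = cv2.normalize(mag * ang, None, 0, 255,
                                cv2.NORM_MINMAX).astype(np.uint8)
    prev_gray = gray
    # OpenCV stores images as BGR, so merge (blue, green, red)
    composite = cv2.merge([flow_ch, fg, gray])
    cv2.imwrite(f"frame_{i:06d}.jpg", composite)
    i += 1
```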
The training process expects an `abc.txt` file alongside each input image `abc.jpg`, containing a listing of the bounding boxes of all objects. For instance,

```
0 0.5 0.5 0.10 0.25
```

represents an object of class 0, centered in the middle of the image, whose width is 10% of the image width, and whose height is 25% of the image height.
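If your labels start out in pixel coordinates, the conversion is a few lines. The helper below is purely illustrative (it is not part of this repository) and assumes boxes given as (x_min, y_min, width, height) in pixels:

```python
def to_yolo(box, img_w, img_h, cls=0):
    """Convert a pixel-space box (x_min, y_min, width, height) to the
    normalized format above. Hypothetical helper for illustration."""
    x, y, w, h = box
    cx = (x + w / 2) / img_w   # box center x, as a fraction of image width
    cy = (y + h / 2) / img_h   # box center y, as a fraction of image height
    return f"{cls} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A 64x160 pixel box at (288, 240) in a 640x640 image:
print(to_yolo((288, 240, 64, 160), 640, 640))
# -> "0 0.500000 0.500000 0.100000 0.250000"
```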
The DrawBox tool can be used to help label input images. The output may need to be converted to the above format.
An alternative is Yolo_mark.
Amazon Mechanical Turk can be used to crowdsource (for a small per-image fee) the task of labeling many frames. There is a template for bounding box labeling tasks.
Labels are returned as straightforward JSON objects and must be converted to the above format. The `utils/convert_mturk.py` tool provides this functionality.
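As a rough sketch of what such a conversion involves, the snippet below assumes the field names produced by MTurk's bounding-box task template (`annotatedResult` with pixel-space `boundingBoxes` and `inputImageProperties`); your actual result format may differ, so defer to `utils/convert_mturk.py` for the real converter:

```python
# Illustrative sketch: field names are an assumption about the MTurk
# bounding-box template output, not a documented contract.
import json

def mturk_to_yolo(answer_json, cls=0):
    answer = json.loads(answer_json)["annotatedResult"]
    img = answer["inputImageProperties"]
    iw, ih = img["width"], img["height"]
    lines = []
    for b in answer["boundingBoxes"]:
        cx = (b["left"] + b["width"] / 2) / iw
        cy = (b["top"] + b["height"] / 2) / ih
        lines.append(f'{cls} {cx:.6f} {cy:.6f} '
                     f'{b["width"] / iw:.6f} {b["height"] / ih:.6f}')
    return "\n".join(lines)
```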
The reference implementation diverges from the paper by using the YOLOv3 object detection algorithm, rather than an R-CNN. We will use YOLOv4, training our model using Alexey Bochkovskiy's fork of Darknet.
This guide explains the distinction between Darknet and YOLO, and these instructions explain the training process in more detail.
- Prepare the `.txt` files alongside the input files, which we will assume are `.jpg` files stored in `data/`.
- Clone the Darknet repository and build it with OpenCV support. From here, we will assume that the directory `darknet/` contains the Darknet code, and the `darknet` executable is in the search path.
- Create the `yolo-obj.cfg` file per the Darknet instructions. A tool is provided in this repository to help:

  ```
  python configtool.py \
      --classes 1 \
      --batch 64 \
      --subdivisions 8 \
      --no-color-adjustments \
      --size 416 960 \
      darknet/cfg/yolov4-custom.cfg \
      > yolo-obj.cfg
  ```

  Also customize the provided `obj.data` and `obj.names` according to the instructions.
- Populate `train.txt` and `test.txt` with paths to the input files, splitting the files randomly between the training and testing sets. Only include files that have corresponding bounding box info in a `.txt` file. The `generate_train_list.py` script does this:

  ```
  python generate_train_list.py --dir data/
  ```
- Download the pre-trained weights file (162 MB) to the `pretrained` directory.
- Start training:

  ```
  darknet detector train obj.data yolo-obj.cfg pretrained/yolov4.conv.137
  ```

  You can add `-gpus 0,1,2,...` to utilize multiple GPUs. The result of training is a file called `yolo-obj_best.weights`.
- To run a test detection, edit `yolo-obj.cfg`: uncomment the `batch` and `subdivisions` settings under the `# Testing` heading, and comment out those under the `# Training` heading (see the excerpt after this list). Then run:

  ```
  darknet detector test obj.data yolo-obj.cfg yolo-obj_final.weights data/20170701145052891_777000.jpg
  ```

  To lower the detection threshold, use `-thresh 0.01`.
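For reference, after that edit the top of `yolo-obj.cfg` should look roughly like this (the training values shown reflect the `--batch 64 --subdivisions 8` options passed to `configtool.py` earlier; yours may differ):

```
[net]
# Testing
batch=1
subdivisions=1
# Training
#batch=64
#subdivisions=8
```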
On WHOI's HPC, build Darknet on a GPU node:
```
srun -p gpu --pty /bin/bash
```
Load all necessary modules:
```
module load cuda10.1/{blas,cudnn,fft,toolkit}
module load cmake/3.20.2 gcc/6.5.0 python3/3.6.5
```
Create and activate the virtual environment:
```
python3 -m virtualenv .venv
. .venv/bin/activate
pip install -r requirements.txt
pip uninstall -y opencv-python
```
Darknet requires OpenCV for some image manipulation. The prebuilt `opencv-python` wheel is uninstalled above because we will build OpenCV from source with CUDA support instead.
Download OpenCV to `opencv/` and extras to `opencv/opencv_contrib/`.
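One way to fetch both is to clone from GitHub; the `4.5.2` tag below is only an example, so pin whichever release you need. The build that follows is run from inside `opencv/`:

```
git clone --depth 1 --branch 4.5.2 https://github.com/opencv/opencv.git opencv
git clone --depth 1 --branch 4.5.2 https://github.com/opencv/opencv_contrib.git opencv/opencv_contrib
```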
```
mkdir build && cd build
PYTHON_PREFIX="$(python-config --prefix)"
PYTHON_LIBRARY="$PYTHON_PREFIX/lib/lib$(python-config --libs | tr ' ' '\n' | cut -c 3- | grep python).so"
PYTHON_INCLUDE="$(python-config --includes | tr ' ' '\n' | cut -c 3- | head -n 1)"
PYTHON_PACKAGES="$(python3 -c 'import sys; print(sys.path[-1])')"
NUMPY_INCLUDE="$(python3 -c 'import numpy; print(numpy.__path__[0])')/core/include"
mkdir root
cmake .. \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_INSTALL_PREFIX="$(cd root; pwd)" \
    -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules \
    -DWITH_CUDA=ON \
    -DWITH_CUBLAS=ON \
    -DWITH_CUDNN=ON \
    -DCUDNN_LIBRARY="$(pwd)/../../cudnn/lib64/libcudnn.so" \
    -DCUDNN_INCLUDE_DIR="$(pwd)/../../cudnn/include" \
    -DCUDA_ARCH_BIN=7.0 \
    -DOPENCV_DNN_CUDA=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_TESTS=OFF \
    -DBUILD_PERF_TESTS=OFF \
    -DBUILD_opencv_java=OFF \
    -DBUILD_opencv_python2=OFF \
    -DBUILD_opencv_python3=ON \
    -DPYTHON_DEFAULT_EXECUTABLE="$(command -v python3)" \
    -DPYTHON3_INCLUDE_DIR="$PYTHON_INCLUDE" \
    -DPYTHON3_LIBRARY="$PYTHON_LIBRARY" \
    -DPYTHON3_EXECUTABLE="$(command -v python3)" \
    -DPYTHON3_NUMPY_INCLUDE_DIRS="$NUMPY_INCLUDE" \
    -DPYTHON3_PACKAGES_PATH="$PYTHON_PACKAGES" \
    -DOPENCV_SKIP_PYTHON_LOADER=ON \
    -DWITH_TIFF=OFF
cmake --build . -j "$(nproc)"
```
Compiling the CUDA source files takes an unusually long time, so the build may appear to stall.
Finally, we can install the Python module into the virtual environment. Be sure to comment `opencv-python` out of the `requirements.txt` file as well, so it is not replaced.
```
cp lib/python3/cv2.*.so "$PYTHON_PACKAGES/"
```
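With OpenCV in place, build Darknet itself against it. Run the following from the `darknet/` source directory; the `OpenCV_DIR` path assumes `opencv/` and `darknet/` are sibling directories: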
```
mkdir build_release && cd build_release
cmake .. \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCUDNN_LIBRARY=/vortexfs1/apps/cudnn-10.1/8.0.2/lib64/libcudnn.so \
    -DCUDNN_INCLUDE_DIR=/vortexfs1/apps/cudnn-10.1/8.0.2/include \
    -DENABLE_CUDNN_HALF=ON \
    -DOpenCV_DIR=$(cd ../../opencv/build; pwd)
cmake --build . -j "$(nproc)"
```
- Install OpenCV with `brew install opencv`.
- Configure `pkg-config` to be able to find OpenCV:

  ```
  export PKG_CONFIG_PATH=$(brew --prefix opencv)/lib/pkgconfig
  ln -s opencv4.pc $(brew --prefix opencv)/lib/pkgconfig/opencv.pc
  ```
- Modify the `Makefile` to set `OPENCV=1` and `AVX=1`.
- Run `make`.
- If you want text labels to appear on the prediction image, copy the `data/labels` directory from the Darknet source tree into a `data/labels` path relative to the directory from which you will run the `darknet` command.
1. Salman, A., Siddiqui, S. A., Shafait, F., Mian, A., Shortis, M. R., Khurshid, K., Ulges, A., and Schwanecke, U., Automatic fish detection in underwater videos by a deep neural network-based hybrid motion learning system, ICES Journal of Marine Science, doi:10.1093/icesjms/fsz025, 2019.
2. Kavasidis, I., Palazzo, S., Di Salvo, R., Giordano, D., and Spampinato, C., An innovative web-based collaborative platform for video annotation, Multimedia Tools and Applications, vol. 70, pp. 413--432, 2013.
3. Kavasidis, I., Palazzo, S., Di Salvo, R., Giordano, D., and Spampinato, C., A semi-automatic tool for detection and tracking ground truth generation in videos, Proceedings of the 1st International Workshop on Visual Interfaces for Ground Truth Collection in Computer Vision Applications, pp. 6:1--6:5, 2012.
4. Cutter, G., Stierhoff, K., and Zeng, J., Automated detection of rockfish in unconstrained underwater videos using Haar cascades and a new image dataset: labeled fishes in the wild, IEEE Winter Conference on Applications of Computer Vision Workshops, pp. 57--62, 2015.