Violence-Detection-Using-ConvLSTM

This project proposes a deep neural network based method to recognize violence in videos. A CNN is used to extract frame-level features from the videos; these features are then aggregated using a variant of the LSTM that uses convolutional gates (ConvLSTM). The combination of CNN and ConvLSTM captures localized spatial features, enabling analysis of the local motion taking place in the videos.

Requirements:

tensorflow 2.0
numpy 1.19.5
opencv-python 3.4.2
keras
scikit-image
Pillow (imported as PIL)

The os and time modules used in the notebooks are part of the Python standard library and need no installation.

Overview

This project uses an end-to-end trainable deep neural network model for classifying videos into violent and non-violent ones. The block diagram of the proposed model is shown in the figure below. The network consists of a series of convolutional layers followed by max pooling operations for extracting discriminative features, and a convolutional long short-term memory (ConvLSTM) for encoding the frame-level changes that characterize violent scenes in the video.

The code to train the model can be found in the Violence_Detection.ipynb notebook, and the code for testing videos can be found in the Test_Video.ipynb notebook.

[Figure: block diagram of the proposed model]
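As a rough illustration of this architecture, here is a minimal Keras sketch of a ConvLSTM video classifier. The layer counts and sizes are illustrative assumptions, not the exact configuration used in the notebooks:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(frames=20, height=64, width=64, channels=3):
    """A minimal ConvLSTM video classifier (sizes are illustrative)."""
    model = models.Sequential([
        # Convolution + pooling applied to every frame to extract spatial features
        layers.TimeDistributed(
            layers.Conv2D(32, 3, activation="relu", padding="same"),
            input_shape=(frames, height, width, channels)),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu", padding="same")),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        # ConvLSTM aggregates the per-frame feature maps over time
        # using convolutional gates
        layers.ConvLSTM2D(64, 3, padding="same", return_sequences=False),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),  # violent vs. non-violent
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```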

Video Preprocessing for Training

20 frames are extracted from each video and fed to a pretrained VGG16 model to extract spatial features.
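A minimal OpenCV sketch of this sampling step (evenly spaced sampling and a 224x224 frame size are assumptions; the notebook may select frames differently):

```python
import cv2
import numpy as np

def extract_frames(video_path, num_frames=20, size=(224, 224)):
    """Sample num_frames evenly spaced RGB frames from a video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV reads BGR
        frames.append(cv2.resize(frame, size))
    cap.release()
    return np.array(frames)
```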

VGG16 Pretrained model

The VGG16 model contains a convolutional part and a fully-connected (dense) part used for classification. Here the whole model is loaded, but its final fully-connected classification layer is not used; instead, the transfer values are saved from the layer just before it.
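In Keras this can be done by taking the output of the layer before the final classifier, as sketched below. Tapping the fc2 layer (the 4096-dimensional layer just before the softmax in Keras' VGG16) is an assumption about which layer the notebook uses:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model

# Load the full VGG16 (including the dense layers), then cut it off at
# "fc2", the 4096-dimensional layer just before the final classifier.
vgg = VGG16(weights="imagenet", include_top=True)
transfer_model = Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)

def transfer_values(frames):
    """frames: array of shape (num_frames, 224, 224, 3), RGB, 0-255."""
    return transfer_model.predict(preprocess_input(frames.astype("float32")))
```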

Caching the transfer values

Processing the images with the VGG16 model takes a long time, so caching the transfer values on disk saves a lot of time.
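A minimal sketch of such a cache, building on the helpers above (the one-.npy-file-per-video format is an assumption; the notebook may use a different cache layout):

```python
import os
import numpy as np

def cached_transfer_values(video_path, cache_dir="cache"):
    """Compute transfer values once per video, then reuse them from disk."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir,
                              os.path.basename(video_path) + ".npy")
    if os.path.exists(cache_path):
        return np.load(cache_path)
    values = transfer_values(extract_frames(video_path))  # helpers above
    np.save(cache_path, values)
    return values
```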

Classification

When all the videos have been processed through the VGG16 model and the resulting transfer values saved to a cache file, we can use those transfer values as the input to an LSTM neural network. We then train this second network on the classes from the violence dataset (Violence, No-Violence), so the network learns to classify videos based on the transfer values from the VGG16 model.
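A minimal sketch of this second network (the layer sizes are assumptions; the 20 x 4096 input shape follows from the sketches above):

```python
from tensorflow.keras import layers, models

# LSTM classifier over the cached VGG16 transfer values:
# one sequence of 20 frames x 4096 transfer values per video.
classifier = models.Sequential([
    layers.LSTM(512, input_shape=(20, 4096)),
    layers.Dense(256, activation="relu"),
    layers.Dense(2, activation="softmax"),  # Violence / No-Violence
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```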

References

Swathikiran Sudhakaran and Oswald Lanz, "Learning to Detect Violent Videos using Convolutional Long Short-Term Memory," University of Trento.

"A Review on State-of-the-Art Violence Detection Techniques," IEEE. https://ieeexplore.ieee.org/abstract/document/8782115
