Implement Yolo-LSTM (~+4-9 AP) for detection on Video with high mAP and without blinking issues #3114
Comparison of different models on a very small custom dataset of 250 training and 250 validation images taken from video: https://drive.google.com/open?id=1QzXSCkl9wqr73GHFLIdJ2IIRMgP1OnXG Validation video: https://drive.google.com/open?id=1rdxV1hYSQs6MNxBSIO9dNkAiBvb07aun Ideas are based on:
The following are implemented:
Great work! Thank you very much for sharing this result. LSTM indeed improves results. I wonder whether you have evaluated the inference time with LSTM as well? Thanks
How to train LSTM networks:
If you encounter a CUDA out-of-memory error, then reduce the value. The only condition is that the frames from the video must go sequentially in the training file.
Or you can use, for example:
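For reference, a minimal sketch of the [net] settings that control sequence training and memory use for these LSTM cfgs. The parameter names (batch, subdivisions, time_steps, sequential_subdivisions) come from AlexeyAB's darknet cfg format, but the values here are illustrative assumptions only, not the ones from the instructions above:

```
[net]
batch=64                    # images per weight update
subdivisions=8              # split each batch into chunks that fit in GPU memory
time_steps=16               # consecutive frames unrolled through the conv_lstm layers
sequential_subdivisions=8   # split each frame sequence into shorter chunks;
                            # raise this (or lower time_steps) if you hit
                            # a CUDA out-of-memory error
```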
@i-chaochen I added the inference time to the table. When I improve the inference time for LSTM networks, I will update the numbers.
Thanks for the updates!
@i-chaochen It is in milliseconds; I fixed it.
Interesting, it seems yolo_v3_spp_lstm has fewer BFLOPs (102) than yolo_v3_spp.cfg.txt (112), but it is still slower...
@i-chaochen The operations at Lines 866 to 869 in b9ea49a can be fused into one fast function: add_3_arrays_activate(float *a1, float *a2, float *a3, size_t size, ACTIVATION a, float *dst);
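For readers following along, a minimal CPU sketch of what such a fused function does, assuming darknet's scalar activate() helper from activations.h (the real version would also need a CUDA kernel for the GPU path):

```c
#include <stddef.h>
#include "activations.h"  // ACTIVATION enum and float activate(float, ACTIVATION)

// Fuse three element-wise additions and the activation into a single pass
// over memory, replacing three separate array walks plus an
// activate_array() call with one loop.
void add_3_arrays_activate(float *a1, float *a2, float *a3,
                           size_t size, ACTIVATION a, float *dst)
{
    size_t i;
    for (i = 0; i < size; ++i) {
        dst[i] = activate(a1[i] + a2[i] + a3[i], a);
    }
}
```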
Hi @AlexeyAB Could you please advise me on this?
@NickiBD For these models you must use the latest version of this repository: https://github.com/AlexeyAB/darknet
Thanks a lot for the help. I will update my repository.
@AlexeyAB Hi, how did you run yolov3-tiny on the Pixel smartphone? Could you give some tips? Thanks very much.
Hi @AlexeyAB,
@NickiBD Hi, which repository and which script do you use for this conversion?
Hi @AlexeyAB, Many thanks.
Hey @AlexeyAB, could you help me use the lstm cfg's properly? Currently, regular yolov3 does much better on a custom dataset. Files are in sequential order in the training file; some of the videos have 200 frames and others 900 frames. The file the mAP is calculated on has videos with 900 frames. Yolov3: Yolov3-tiny-pan-lstm: yolo_v3_tiny_pan_lstm.cfg.txt I don't have the graph for the following
@AlexeyAB Any idea on how to improve the performance for the issue mentioned above?
Any plan to add LSTM to yolov4? Thanks.
I don't think it's necessary, because LSTM or conv-LSTM is designed for the video scenario, where there is a sequence-to-sequence "connection" between frames, while yolo-v4 should be a general model for image object detection, as on the MS-COCO or ImageNet benchmarks. You can add it to your model if your yolo-v4 is used on video.
I am processing traffic scenes from a stationary camera, so I think LSTM could be helpful. How do I actually add it to yolo-v4?
Is there a way to train an LSTM layer on top of an already trained network?
The purpose of LSTM is to "memorize" some features between frames. If you add it at the very top/beginning of the trained CNN network, where nothing has been learned yet, the LSTM wouldn't learn or memorize anything. This paper mentions some insights about where to put the LSTM to get the optimal result. Basically, it should be after the 13th conv layer (a cfg sketch of this kind of placement follows below).
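For illustration, a hedged sketch of inserting a [conv_lstm] block mid-network in darknet cfg syntax. The [conv_lstm] section and its fields (batch_normalize, size, pad, output, peephole) exist in AlexeyAB's darknet, but the exact placement and values here are assumptions, not a recipe from this thread:

```
# ...earlier [convolutional] blocks, e.g. up to the 13th conv...

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

# recurrent block: carries features across time_steps consecutive frames
[conv_lstm]
batch_normalize=1
size=3
pad=1
output=256
peephole=0

# ...remaining [convolutional] and [yolo] blocks...
```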
@i-chaochen I think the more complex the recurrent layer, the later we should add it. In this case maybe we should create a workaround for CRNN.
Does memory consumption increase every time and eventually lead to a lack of memory?
Speaking of memory consumption, maybe you can have a look at gradient checkpointing. It can save significant memory during training; a rough sketch of the idea follows below.
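A toy C sketch of gradient checkpointing, not tied to darknet's actual code: store activations only every CKPT layers and recompute the intermediate ones during backward, so activation memory drops from O(N) to roughly O(N/CKPT + CKPT) at the cost of one extra forward pass. The layer functions here are hypothetical scalar stand-ins for real conv/conv_lstm layers:

```c
#include <string.h>

enum { N = 32, CKPT = 4, DIM = 8 };  // layers, checkpoint interval, toy width

static float W[N];  // one toy weight per layer

static void forward_layer(int i, const float *in, float *out) {
    for (int j = 0; j < DIM; ++j) out[j] = W[i] * in[j];  // y = w * x
}
static void backward_layer(int i, const float *in, float *grad) {
    (void)in;  // a real layer would also accumulate dW from `in` here
    for (int j = 0; j < DIM; ++j) grad[j] *= W[i];        // dx = w * dy
}

// Backward pass given only checkpointed activations: ckpt[s] holds the input
// of layer s*CKPT. `grad` holds dL/d(output) on entry, dL/d(input) on return.
void backward_with_checkpoints(float ckpt[N / CKPT][DIM], float grad[DIM]) {
    float acts[CKPT + 1][DIM];  // scratch: activations of ONE segment only
    for (int seg = N / CKPT - 1; seg >= 0; --seg) {
        memcpy(acts[0], ckpt[seg], sizeof acts[0]);
        for (int k = 0; k < CKPT; ++k)       // recompute the segment forward
            forward_layer(seg * CKPT + k, acts[k], acts[k + 1]);
        for (int k = CKPT - 1; k >= 0; --k)  // then backprop through it
            backward_layer(seg * CKPT + k, acts[k], grad);
    }
}
```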
@AlexeyAB
@smallerhand It is in progress.
@AlexeyAB
@AlexeyAB, hello. What are the blinking issues? Does it mean that objects can be detected in one frame, but not in the next one?
Hi Alexey, I really appreciate your work and the improvements over the previous pjreddie repo. I had a Yolov3 people detector trained on custom dataset videos using single frames; now I want to test your new Yolov4 model and conv-lstm layers. I trained the model with yolov4-custom.cfg and results improved just by doing this. I am now wondering how to add temporal information (i.e. conv-lstm layers).
@smallerhand Have you done a comparison between yolo_v3_spp_lstm.cfg and yolov4? What are the results?
@HaolyShiit Blinking issues can mean either:
@fabiozappo It is not yet possible to add LSTM to YoloV4; Alexey is actively working on it.
TO ALL PEOPLE READING THIS PAGE: in order to try those LSTM models, you have to use the "Yolo v3 optimal" repo.
@arnaud-nt2i
If you're interested in fixing the conv_lstm module, the issue is in conv_lstm_layer.c at line 1457 (Lines 1450 to 1458 in b4d03f8).
It should check for l.dh_gpu. This solves the CUDA errors but can cause NaN during training; to avoid that, I commented it out completely. I trained the small self-driving dataset with some of the cfg's you provided above and got these results: yolov3-tiny-pan_lstm.cfg.txt yolov3-tiny-pan.cfg.txt yolov3-tiny-pan_lstm_noBottleNeck.cfg.txt yolov4-tiny_smallSelfDriving.cfg.txt
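For anyone applying this by hand, a sketch of the guard being described. The exact statement at conv_lstm_layer.c:1457 is not reproduced in this thread, so the copy into the carried hidden-state delta below is an assumed example (copy_gpu() is darknet's real GPU BLAS helper):

```c
// backward pass of the conv_lstm layer (sketch): l.dh_gpu carries the
// hidden-state delta to the previous time step and is NULL at a sequence
// boundary, so an unguarded write through it triggers the CUDA errors
// mentioned above.
if (l.dh_gpu) {
    copy_gpu(l.outputs * l.batch, l.delta_gpu, 1, l.dh_gpu, 1);
}
```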
@AdamCuellar Thanks! Could you add a PR with that line commented out?
@AlexeyAB Yep, done!
Implement a Yolo-LSTM detection network that will be trained on video frames to increase mAP and solve blinking issues.
Think about: can we use a Transformer (Vaswani et al., 2017) / GPT-2 / BERT on frame sequences instead of word sequences? https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf and https://arxiv.org/pdf/1706.03762.pdf
Or can we use Transformer-XL (https://arxiv.org/abs/1901.02860v2) or Universal Transformers (https://arxiv.org/abs/1807.03819v3) for long time-sequences?