A possibility to track multi-person in the same scene #54

Open
iPsych opened this issue Mar 30, 2021 · 1 comment
Labels: feature, question

iPsych commented Mar 30, 2021

Hello,
The code works amazingly well for shuffle.webm and other single-person stimuli, but behaves very strangely when I feed it a multi-person video.
Is there any way to extend MocapNET to multi-person pose estimation, like
https://paperswithcode.com/task/multi-person-pose-estimation?

AmmarkoV self-assigned this Mar 31, 2021
AmmarkoV added the feature and question labels Mar 31, 2021
AmmarkoV (Collaborator) commented Mar 31, 2021

The weird behavior you are referring to comes from the 2D joint heatmap detection
https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/src/JointEstimator2D/jointEstimator2D.cpp#L288
where the code tries to "retrieve" each joint from the location with the strongest heatmap response.

If there are multiple people in the scene, the algorithm will try to "connect" body parts that belong to different people (whichever parts have the highest scores), producing incorrect skeletons.
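To illustrate the failure mode, here is a minimal C++ sketch of naive "strongest peak per joint" decoding. It is not the code in jointEstimator2D.cpp: the Heatmap and JointPoint types and the strongestPeak/decodeSkeleton functions are hypothetical, and the sketch assumes one row-major score map per joint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: one score map per joint, stored row-major (width * height).
struct Heatmap {
    int width;
    int height;
    std::vector<float> scores;
};

struct JointPoint {
    int x;
    int y;
    float score;
};

// Naive single-person decoding: the joint is placed at the single
// strongest response of its own heatmap, ignoring every other joint.
JointPoint strongestPeak(const Heatmap& map) {
    assert(map.scores.size() == static_cast<std::size_t>(map.width) * map.height);
    JointPoint best{-1, -1, 0.0f};
    for (int y = 0; y < map.height; ++y) {
        for (int x = 0; x < map.width; ++x) {
            const float s = map.scores[static_cast<std::size_t>(y) * map.width + x];
            if (best.x < 0 || s > best.score) {
                best = {x, y, s};
            }
        }
    }
    return best;
}

// Decodes every joint independently, so nothing ties the joints to one person.
std::vector<JointPoint> decodeSkeleton(const std::vector<Heatmap>& heatmaps) {
    std::vector<JointPoint> joints;
    joints.reserve(heatmaps.size());
    for (const Heatmap& map : heatmaps) {
        joints.push_back(strongestPeak(map));
    }
    return joints;
}
```

With two people in view, the elbow map may peak on one person and the wrist map on the other, so decodeSkeleton returns an elbow from one body and a wrist from the other. Nothing in this kind of decoding knows which peaks belong together, which is the cross-person "connection" described above.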

In the older version of MocapNET (MNET1) there used to be a mode (https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/mnet1/src/MocapNET1/MocapNETLiveWebcamDemo/mocapNETLiveDemo.cpp#L838)
where running ./MocapNETLiveWebcamDemo --rectangle X Y WIDTH HEIGHT would erase part of the image so that this part gets ignored. However, this was a silly workaround, and it was removed in the next version:

```cpp
//Some datasets have persons that appear in parts of the image, we might want to cover them using a rectangle
//We do this before adding any borders or otherwise change of the ROI of the image, however we do this
//after possible frame skips for the obviously increased performance..
if (coveringRectangle)
{
    cv::Point pt1(coveringRectangleX,coveringRectangleY);
    cv::Point pt2(coveringRectangleX+coveringRectangleWidth,coveringRectangleY+coveringRectangleHeight);
    cv::rectangle(frame,pt1,pt2,cv::Scalar(0,0,0),-1,8,0);
}
```

If you think you would find this useful, I could reinstate it.
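For reference, here is a rough, OpenCV-free sketch of what the --rectangle mode does. The CoveringRectangle struct and the parseRectangle/coverRectangle functions are hypothetical (this is not the MNET1 code); the sketch assumes an interleaved 8-bit image buffer, treats the rectangle as the half-open range [X, X+WIDTH) by [Y, Y+HEIGHT), clips it to the image, and does no input validation beyond what std::atoi provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical: region to black out, in pixel coordinates.
struct CoveringRectangle {
    int x = 0;
    int y = 0;
    int width = 0;
    int height = 0;
};

// Hypothetical parser for "--rectangle X Y WIDTH HEIGHT".
// Returns true and fills `out` if the flag and its four values are present.
// std::atoi yields 0 for non-numeric input; a real tool should validate.
bool parseRectangle(int argc, const char* const* argv, CoveringRectangle& out) {
    for (int i = 1; i + 4 < argc; ++i) {
        if (std::strcmp(argv[i], "--rectangle") == 0) {
            out.x = std::atoi(argv[i + 1]);
            out.y = std::atoi(argv[i + 2]);
            out.width = std::atoi(argv[i + 3]);
            out.height = std::atoi(argv[i + 4]);
            return true;
        }
    }
    return false;
}

// Sets every channel of the covered pixels to 0 (black), clipped to the image.
void coverRectangle(std::vector<unsigned char>& pixels, int width, int height,
                    int channels, const CoveringRectangle& r) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height * channels);
    const int x0 = std::max(0, r.x);
    const int y0 = std::max(0, r.y);
    const int x1 = std::min(width, r.x + r.width);
    const int y1 = std::min(height, r.y + r.height);
    for (int y = y0; y < y1; ++y) {
        for (int x = x0; x < x1; ++x) {
            for (int c = 0; c < channels; ++c) {
                pixels[(static_cast<std::size_t>(y) * width + x) * channels + c] = 0;
            }
        }
    }
}
```

As in the original snippet, the masking should happen before the frame reaches the 2D joint estimator, so the covered person never produces heatmap peaks.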

That being said, the second option is to use OpenPose with the -number_people_max 1 flag; this way OpenPose will pick just one skeleton, which solves the issue. OpenPose uses Part Affinity Fields (PAFs), which allow joints to be connected on the same person, and it has provisions to correctly separate people in a scene:
https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/scripts/processDatasetWithOpenpose.sh#L23
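If you already have OpenPose JSON output containing several people and want a single skeleton after the fact, one simple heuristic is to keep the person with the highest summed keypoint confidence. This is a hypothetical post-processing sketch, not necessarily what -number_people_max does internally; it only assumes the flat (x, y, confidence) triple layout of OpenPose's pose_keypoints_2d arrays, and JSON parsing is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One detected person as a flat list of (x, y, confidence) triples,
// the layout of "pose_keypoints_2d" in OpenPose's JSON output.
using Keypoints = std::vector<float>;

// Sum of the confidence values (every third entry).
float totalConfidence(const Keypoints& person) {
    assert(person.size() % 3 == 0);
    float sum = 0.0f;
    for (std::size_t i = 2; i < person.size(); i += 3) {
        sum += person[i];
    }
    return sum;
}

// Hypothetical helper: index of the person with the highest summed
// confidence, or -1 if the list is empty.
int pickMostConfidentPerson(const std::vector<Keypoints>& people) {
    int best = -1;
    float bestScore = 0.0f;
    for (std::size_t i = 0; i < people.size(); ++i) {
        const float score = totalConfidence(people[i]);
        if (best < 0 || score > bestScore) {
            best = static_cast<int>(i);
            bestScore = score;
        }
    }
    return best;
}
```

Summed confidence favors people who are large, close, and fully visible; a bounding-box area or "closest to the image center" criterion would be equally reasonable choices.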

A proper solution for the live webcam demo would be to incorporate an object-detection neural network such as Darknet/YOLO (https://github.com/AlexeyAB/darknet): run it first on the incoming OpenCV frame, retrieve the people in the image (as seen here https://www.youtube.com/watch?v=saDipJR14Lc#t=23m), and then run the MocapNET pipeline on each of the retrieved rectangles.
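The detect-then-estimate structure can be sketched as follows. PersonDetector stands in for a detector such as YOLO and PoseEstimator for the MocapNET pipeline; both are hypothetical callbacks, not the Darknet or MocapNET APIs, and the Image type is a simple grayscale buffer assumed for illustration. The one non-obvious step is shifting each crop's joint coordinates back into full-frame coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical grayscale frame, row-major, one byte per pixel.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> pixels;
};

struct Box { int x, y, width, height; };
struct Joint2D { float x, y; };

// Hypothetical stand-ins: a person detector and a single-person pose
// pipeline that returns joints in the coordinates of the image it is given.
using PersonDetector = std::function<std::vector<Box>(const Image&)>;
using PoseEstimator = std::function<std::vector<Joint2D>(const Image&)>;

// Intersects a box with the frame; width/height become <= 0 if nothing remains.
Box clipToImage(const Box& b, const Image& img) {
    const int x0 = std::max(0, b.x);
    const int y0 = std::max(0, b.y);
    const int x1 = std::min(img.width, b.x + b.width);
    const int y1 = std::min(img.height, b.y + b.height);
    return Box{x0, y0, x1 - x0, y1 - y0};
}

// Copies an (already clipped) box region into its own image.
Image crop(const Image& img, const Box& b) {
    assert(b.x >= 0 && b.y >= 0 && b.x + b.width <= img.width && b.y + b.height <= img.height);
    Image out;
    out.width = b.width;
    out.height = b.height;
    out.pixels.resize(static_cast<std::size_t>(b.width) * b.height);
    for (int y = 0; y < b.height; ++y) {
        for (int x = 0; x < b.width; ++x) {
            out.pixels[static_cast<std::size_t>(y) * b.width + x] =
                img.pixels[static_cast<std::size_t>(b.y + y) * img.width + (b.x + x)];
        }
    }
    return out;
}

// Runs the single-person estimator once per detected box and maps each
// crop's joints back into full-frame coordinates.
std::vector<std::vector<Joint2D>> estimateAllPeople(const Image& frame,
                                                    const PersonDetector& detect,
                                                    const PoseEstimator& estimate) {
    std::vector<std::vector<Joint2D>> people;
    for (Box box : detect(frame)) {
        box = clipToImage(box, frame);
        if (box.width <= 0 || box.height <= 0) continue;  // box lies outside the frame
        std::vector<Joint2D> joints = estimate(crop(frame, box));
        for (Joint2D& j : joints) {
            j.x += static_cast<float>(box.x);
            j.y += static_cast<float>(box.y);
        }
        people.push_back(joints);
    }
    return people;
}
```

Because estimate is called once per box, the cost of this loop grows linearly with the number of people in the frame.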

This would work, but it would also degrade the framerate linearly with the number of people in the scene (since the neural network has to be executed once for each of them). You would then also face the additional problem of person re-identification: producing multiple BVH output files, keeping track of which skeleton belongs to which BVH file, and updating each one correctly.
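A very small starting point for the re-identification problem is greedy IoU (intersection-over-union) matching of person boxes between consecutive frames, so that each person keeps a stable id that could key its own BVH output. This is a swapped-in, textbook technique, not something MocapNET implements; the GreedyIouTracker class is hypothetical, tracks are never expired, and it will confuse identities when people cross or occlude each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Box { float x, y, width, height; };

// Overlap area divided by union area; 0 when the boxes do not overlap.
float intersectionOverUnion(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x + a.width, b.x + b.width) - std::max(a.x, b.x));
    const float ih = std::max(0.0f, std::min(a.y + a.height, b.y + b.height) - std::max(a.y, b.y));
    const float inter = iw * ih;
    const float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

struct Track { int id; Box lastBox; };

// Hypothetical tracker: each detection takes the unused track whose last box
// overlaps it most (IoU must exceed minIou), otherwise it starts a new track.
class GreedyIouTracker {
public:
    explicit GreedyIouTracker(float minIou) : minIou_(minIou) {
        assert(minIou_ >= 0.0f && minIou_ < 1.0f);
    }

    // Returns one track id per detection, in the same order as `detections`.
    std::vector<int> update(const std::vector<Box>& detections) {
        std::vector<int> ids(detections.size(), -1);
        std::vector<bool> trackUsed(tracks_.size(), false);
        for (std::size_t d = 0; d < detections.size(); ++d) {
            int bestTrack = -1;
            float bestIou = minIou_;
            for (std::size_t t = 0; t < tracks_.size(); ++t) {
                if (trackUsed[t]) continue;
                const float iou = intersectionOverUnion(detections[d], tracks_[t].lastBox);
                if (iou > bestIou) {
                    bestIou = iou;
                    bestTrack = static_cast<int>(t);
                }
            }
            if (bestTrack >= 0) {
                trackUsed[bestTrack] = true;
                tracks_[bestTrack].lastBox = detections[d];
                ids[d] = tracks_[bestTrack].id;
            } else {
                tracks_.push_back({nextId_, detections[d]});
                trackUsed.push_back(true);
                ids[d] = nextId_++;
            }
        }
        return ids;
    }

private:
    float minIou_;
    int nextId_ = 0;
    std::vector<Track> tracks_;
};
```

Each returned id could index a separate (hypothetical) per-person BVH writer, so the same person keeps appending frames to the same file even if the detector lists people in a different order from frame to frame.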

So, adding all this complexity to the project would be overkill, and it doesn't have much novelty or research interest, which is why it has been skipped!

I think at this point the best workaround is to mask out the parts of the scene you don't want (or just use OpenPose as the 2D engine).

Hope I did a good job explaining the issue,
Looking forward to your input

Ammar
