
How to perform captioning on action proposals? #20

Closed
sgarbanti opened this issue Mar 2, 2020 · 24 comments

Comments

@sgarbanti

Hi, thank you for sharing this repository.

My goal is to use your model to generate captions on Activity-Net action proposals.

The dataset is the same, so I don't think I need to retrain the model; however, I should need to generate the region features and detections using Detectron. Right?
Is there an easy way, e.g. a script, to do it?

I saw that you kindly provide the code here:
https://github.com/LuoweiZhou/detectron-vlp

Should I download the "RGB frames extracted at 5FPS" provided by ActivityNet, segment them by my action proposals' timestamps, uniformly sample 10 frames for each segment, and then use the extract_features.py script in your detectron-vlp repository to extract region features?
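
For clarity, by "uniformly sample 10 frames" I mean something like this rough sketch (the function name and the 5 FPS assumption are mine):

```python
import numpy as np

def sample_frame_indices(t_start, t_end, fps=5, num_samples=10):
    """Uniformly pick `num_samples` frame indices covering [t_start, t_end]."""
    first = int(round(t_start * fps))
    last = max(first, int(round(t_end * fps)) - 1)
    return np.linspace(first, last, num_samples).astype(int).tolist()

# e.g., 10 indices spread over the first 17.7 s of a 5 FPS frame dump
print(sample_frame_indices(0.0, 17.7))
```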

Thanks in advance.

@LuoweiZhou

@sgarbanti You should directly run ffmpeg on videos to extract frames. See here: https://github.com/facebookresearch/ActivityNet-Entities#faqs
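
For instance, something along these lines (a minimal sketch; treat the exact flags in the FAQ as authoritative):

```python
import os
import subprocess

os.makedirs("frames", exist_ok=True)
subprocess.check_call([
    "ffmpeg", "-i", "v_D18b2IZpxk0.mp4",  # hypothetical input video
    "-r", "5",                            # resample to 5 frames per second
    "-q:v", "2",                          # high JPEG quality
    "frames/%06d.jpg",                    # numbered output frames
])
```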

@sgarbanti

sgarbanti commented Mar 2, 2020

Thank you @LuoweiZhou for your help.

Once I have downloaded the videos and extracted the 10 sampled frames for each action proposal, I need to run Detectron via the extract_features.py script, right?

Would I obtain only the region features this way, or also the h5 file with the region proposals?

@LuoweiZhou

@sgarbanti There are three parts: proposals (coordinates), features, and class probabilities. You need to modify the script to take in video input (this part will be updated later this month) and revise proc_split according to your video ids.
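
Concretely, for each segment you should end up with arrays along these lines (a sketch; the shapes and the class count are assumptions based on the default config, so double-check against your own output):

```python
import numpy as np

num_rois, feat_dim, num_classes = 100, 2048, 1601  # assumed defaults

proposals = np.zeros((num_rois, 4), dtype=np.float32)           # RPN box coordinates
features  = np.zeros((num_rois, feat_dim), dtype=np.float32)    # fc6 region features
cls_probs = np.zeros((num_rois, num_classes), dtype=np.float32) # class probabilities
```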

@sgarbanti

sgarbanti commented Mar 6, 2020

@LuoweiZhou thank you very much for your patience, and for your work too.
Following your instructions, I tried to reproduce the region features of segment 00 of video v_D18b2IZpxk0, which has timestamps 0–17.7; but the features I obtained are different from yours in the file v_D18b2IZpxk0_segment_00.npy (I also checked segments 01 and 02, but my features don't match any of them).
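
For reference, this is roughly how I compared them (a minimal sketch; the paths are from my setup):

```python
import numpy as np

mine = np.load('mine/v_D18b2IZpxk0_segment_00.npy')
ref  = np.load('gvd/v_D18b2IZpxk0_segment_00.npy')
print(mine.shape, ref.shape)     # shapes match
print(np.abs(mine - ref).max())  # but the values are far apart
```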

I downloaded the video with https://github.com/activitynet/ActivityNet/tree/master/Crawler, then I extracted the 10 uniformly sampled frames, using the segment's timestamps, as you showed here: facebookresearch/ActivityNet-Entities#1 (comment)

Finally, using the configuration and checkpoint files for GVD, I extracted the region features with your code https://github.com/LuoweiZhou/detectron-vlp, running extract_features.py on my own segment frames.

Is there anything wrong with what I did?

@LuoweiZhou

LuoweiZhou commented Apr 2, 2020

@sgarbanti Sorry for the delay; for some reason I missed your follow-up. We double-checked the code and it turned out there are two differences that might account for the discrepancy. In GVD, we use the box coordinates from the RPN rather than the final coordinates after class-wise regression (you can compare them here). I have added the corresponding script tools/extract_features_gvd_anet.py. Besides, GVD only uses float32 to store features, while VLP has float16 or float32. More details are in these two commits (c1, c2).
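
A quick way to check which storage convention a given feature file follows (a minimal sketch):

```python
import numpy as np

feat = np.load('v_D18b2IZpxk0_segment_00.npy')
print(feat.dtype)  # expect float32 for the GVD features; VLP may use float16
```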

@sgarbanti

@LuoweiZhou Thank you for the new script.
I tried it on the first 5 test segments in dic_anet.json:
v_yACg55C3IlM_segment_00, v_yACg55C3IlM_segment_01, v_yACg55C3IlM_segment_02, v_yACg55C3IlM_segment_03, v_ng14GLT_hHQ_segment_00
which are located at indices 37, 38, 39, 40, and 57, respectively, in your h5 file.
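
I looked the indices up like this (a sketch; I print the keys first since I'm not sure of the exact h5 layout, and the file name is the one from my setup):

```python
import h5py

with h5py.File('anet_detection_vg_fc6_feat_100rois.h5', 'r') as f:
    print(list(f.keys())[:5])  # confirm the dataset names before indexing
```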

But unfortunately, the results I obtained don't coincide with your data: neither the region information nor the region features match.
I also tried changing the way I extract frames, but without success; I can't reproduce your data.

Is it possible that the weights in the detectron-vlp repository for the GVD model are not exactly the same ones you used?

@LuoweiZhou

@sgarbanti I actually double-checked recently and can reproduce the features (despite slight discrepancies in value at magnitude 1e-5 or less, possibly due to device-related differences in floating-point processing). We will work together to debug this. Just to confirm: you have read the updated README and are using the correct scripts, right? Also, I'd suggest you delete the yaml/pkl files from your local machine and re-download them using the links we provided, just to be sure.

@sgarbanti

@LuoweiZhou thank you for your time. I had read the README file, and I had already tried downloading the checkpoint and configuration files again, but this didn't change anything.

I used the extract_features_gvd_anet.py script; I only had to modify a few small things, such as the use of the --list_of_ids parameter, which was not used in the code. I passed the dic_anet.json file to it, since the segments in your h5 file are ordered according to this file (I saw that by looking at the dataloader; see the sketch below).
Finally, I also corrected the structure of the outputs in the h5 file, to make them the same as the original.
However, I have not changed anything that could affect the computed values, and in any case I have not touched the region features in any way.
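
This is how I read the segment order (a sketch; the key names are from my reading of dic_anet.json, so they may differ):

```python
import json

with open('dic_anet.json') as f:
    dic = json.load(f)

# assumption from the dataloader: one entry per segment, stored in h5 order
seg_ids = [v['id'] for v in dic['videos']]
print([seg_ids[i] for i in (37, 38, 39, 40, 57)])
```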

You can see the modified script that I used here:
https://drive.google.com/file/d/1dL1rF1VzxXuE0AAU5lzXYp1MgbssIbx2/view?usp=sharing

Thank you very much.

@LuoweiZhou

@sgarbanti What I can do is check whether your feature files are correct, since the changes you made will have no impact on the *.npy files. Could you share the following files with me:
v_kmWf36zfL7o_segment_00.npy
v_QsfIM28uvHM_segment_02.npy
v_G8gTBLLf8Bo_segment_00.npy
and I will compare them with mine. Besides, can you run md5sum e2e_faster_rcnn_X-101-64x4d-FPN_2x-gvd.* and post your output here? Then I can confirm whether we're using the same checkpoint.

@sgarbanti

@LuoweiZhou I was able to extract the features only for segments v_QsfIM28uvHM_segment_02.npy and v_G8gTBLLf8Bo_segment_00.npy, because the video with ID kmWf36zfL7o is no longer available.
I also re-downloaded extract_feat_gvd_anet.sh and extract_features_gvd_anet.py from the detectron-vlp repository and, without changing anything, not even the paths, extracted the features by running extract_feat_gvd_anet.sh; the resulting region features are the same as those extracted by the modified script.
Anyway, the results of the modified script are here:
https://drive.google.com/drive/folders/1OOP9A8b0M4WC_tIqaaWIvtBGmNl3dA5a?usp=sharing

the "command.txt" file contains bash commands used to extract frames, following your instructions: facebookresearch/ActivityNet-Entities#1 (comment). Region features are in the "fc6_feat_100rois" folder.

The output of md5sum e2e_faster_rcnn_X-101-64x4d-FPN_2x-gvd.*:

e759feea6c88afbada11e5c21af75d78  e2e_faster_rcnn_X-101-64x4d-FPN_2x-gvd.pkl
8ed0b91defffa99efde997f8a940f5cb  e2e_faster_rcnn_X-101-64x4d-FPN_2x-gvd.yaml

Thanks for your help.

@LuoweiZhou

@sgarbanti The md5sums look good. Could you also place the sampled frames in the GDrive folder? Thanks.

@sgarbanti

sgarbanti commented Apr 7, 2020

@LuoweiZhou Ok, I added a folder "Frames" with the sampled frames.

EDIT: I performed the evaluation on just the four segments of the v_yACg55C3IlM video, first with your region features (h5 file included) and then with the region features extracted by me (h5 file included); the generated captions seem good, and the scores have even improved.
This could mean that even if I can't replicate your features, the features I extract are still good enough for the model.

@LuoweiZhou

@sgarbanti I just checked the frames and features, and it turns out the frames look the same as mine but the features are way off. I can't really imagine why at this point, but I will keep diagnosing later today. In the meantime, you may want to go through some of the commits to see if you have missed anything. BTW, do you need the features for new videos, or for something else?

@sgarbanti

@LuoweiZhou I checked, and my copy of the detectron-vlp repository should be consistent. I also tried replacing the convert_cityscapes_to_coco.py script in the detectron repo with the one from before the last commit of January 15th, but nothing changes.

I'm using the same videos, but I need to extract features for new segments: I'm using a temporal action proposal generator to get the event segments, and I need to extract their features to caption them with GVD.

@LuoweiZhou

@sgarbanti To eliminate the possibility that I made any unintended changes to the code, I made a copy of mine here: https://drive.google.com/file/d/1Bt7GXTV6P0pC33bEGPpHMq-Y77ZJDzh1/view?usp=sharing
Just for your info, my ffmpeg version is 2.8.15-0ubuntu0.16.04.1, my caffe2 version is as of Feb. 22, 2019, and I'm using cuda/8.0 with cudnn/8.0-v7.0.5, though I don't think these will make any difference.

@sgarbanti

@LuoweiZhou I tried re-extracting the features several times, and I also tried reinstalling and reconfiguring everything, including the conda environment and Detectron, but I always obtain the same results.
I also tried with your code, but nothing changes.
The frames should be fine, since my ffmpeg is the same as yours, but there is one thing that could perhaps influence the feature extraction: in both my code and yours, I had to replace the file detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so with the one from Caffe2, miniconda3/envs/gvd_pytorch1.1/lib/python2.7/site-packages/torch/lib/libcaffe2_detectron_ops_gpu.so,

as you suggested here:
LuoweiZhou/detectron-vlp#1 (comment)

because otherwise I'd get the error:
OSError: /home/<my_user>/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so: undefined symbol: _ZN6caffe24math3SetIiNS_11CUDAContextEEEviT_PS3_PT0_
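
(The failure is easy to reproduce outside the scripts; a plain dynamic load of the library raises the same error. A minimal sketch:)

```python
import ctypes

# a plain dlopen of the ops library surfaces the missing symbol,
# similar to what Detectron's import_detectron_ops() triggers
ctypes.CDLL('/home/<my_user>/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so')
# -> OSError: ... undefined symbol: _ZN6caffe24math3SetI...
```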

Do you think this could be the problem?

@LuoweiZhou

@sgarbanti Thanks for the feedback. I highly suspect this results from the discrepancy between our caffe2 packages. In your case, you're using the caffe2 from torch, while I compiled a stand-alone conda env for caffe2 (from before caffe2 was merged into pytorch; let's name the env c2).

I tried to reproduce your output but encountered some problems when trying to convert gvd_pytorch1.1 -> c2 to run detectron-vlp. What would help here is for you to follow the official Detectron instructions (only up to the inference part, until here) to create your c2 env. Then, again, copy your compiled libcaffe2_detectron_ops_gpu.so from c2 over to your detectron-vlp/lib. Run md5sum libcaffe2_detectron_ops_gpu.so to verify that you get the same *.so as I did:
9bb6059417cfc6ae73bbdf573ea63ad1  libcaffe2_detectron_ops_gpu.so
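
Equivalently, if md5sum isn't handy, in Python (a trivial sketch):

```python
import hashlib

with open('libcaffe2_detectron_ops_gpu.so', 'rb') as f:
    print(hashlib.md5(f.read()).hexdigest())
```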

@sgarbanti

@LuoweiZhou I followed the Detectron instructions starting from the gvd_pytorch1.1 environment; my md5 checksum for libcaffe2_detectron_ops_gpu.so is:
c3550163798adc51f41d3445f25abaeb  libcaffe2_detectron_ops_gpu.so

I get your checksum if I compute it on the libcaffe2_detectron_ops_gpu.so from the detectron-vlp repository, but I have to replace that file in order to run detectron-vlp.

@LuoweiZhou

LuoweiZhou commented Apr 14, 2020

@sgarbanti I meant a separate conda env c2 specifically for caffe2 and detectron (do not start from gvd_pytorch1.1). I have attached the env config file c2.yml.

@sgarbanti

@LuoweiZhou I tried to initialize an environment with Python 2.7 and then follow the Detectron instructions, but when I run make install in the cocoapi repository I get a syntax error (matplotlib 3.2.1 dropped Python 2 support; its setup.py uses an f-string, hence the SyntaxError):

  File "/tmp/easy_install-DRZCxy/matplotlib-3.2.1/setup.py", line 139
    raise IOError(f"Failed to download jquery-ui. Please download "
                  ^
SyntaxError: invalid syntax
Makefile:7: recipe for target 'install' failed
make: *** [install] Error 1

Trying to create an environment with your yml file, conda doesn't find your version of Caffe2:

ResolvePackageNotFound:

  - pytorch-nightly==1.0.0.dev20190222=py2.7_cuda10.0.130_cudnn7.4.2_0

If I use your yml file, postponing the installation of Caffe2, I get this version instead:

  - pytorch-nightly==1.0.0.dev20190328=py2.7_cuda10.0.130_cudnn7.4.2_0

But then, when I try to run detectron-vlp, I get:

Traceback (most recent call last):
  File "tools/extract_features_gvd_anet.py", line 53, in <module>
    c2_utils.import_detectron_ops()
  File "/home/sgarbanti/ProgettoTesi/detectron-vlp/lib/utils/c2.py", line 41, in import_detectron_ops
    detectron_ops_lib = envu.get_detectron_ops_lib()
  File "/home/sgarbanti/ProgettoTesi/detectron-vlp/lib/utils/env.py", line 71, in get_detectron_ops_lib
    ('Detectron ops lib not found; make sure that your Caffe2 '
AssertionError: Detectron ops lib not found; make sure that your Caffe2 version includes Detectron module

I can only get everything working when starting from gvd_pytorch1.1; I don't know why I get all these errors.

@LuoweiZhou

You need to google around to fix the installation bugs (e.g., cocoapi). A stand-alone c2 conda env with the correct libcaffe2_detectron_ops_gpu.so seems to be the key to reproducibility.

@sgarbanti

@LuoweiZhou I managed to create a stand-alone conda environment, but the libcaffe2_detectron_ops_gpu.so in the detectron-vlp repository, the one with your checksum, still produces this error:
OSError: /home/sgarbanti/ProgettoTesi/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so: undefined symbol: _ZN6caffe24math3SetIiNS_11CUDAContextEEEviT_PS3_PT0_

Traceback: https://pastebin.com/raw/1BDZ17DZ

So I have to replace it with the one from the pytorch installation, thus getting my region features.
I tried different versions of cudatoolkit, cudnn, and pytorch, but nothing changes.

@LuoweiZhou

@sgarbanti I've reproduced the features by directly using the caffe2 from gvd_pytorch1.1. I now suspect other causes (esp. the frames). Please email me at luozhou@umich.edu if you want to discuss further.

@sgarbanti

@LuoweiZhou I don't know; I downloaded the videos and extracted the frames in the way I described here: #20 (comment)

However, it seems that my extracted features also work well for the model, so no problem.
Thank you for your patience and your kindness.

I'm closing the issue, best regards.
