Mismatch between loaded features and snippet indexes. #4

Closed
Phoenix1327 opened this issue Apr 27, 2020 · 4 comments · Fixed by #5

Thanks for releasing the code. I think there may be a bug in how features are loaded from the h5 file.
In lines 208 and 209 of dataset.py, the features are loaded every 5 frames (self.skip_videoframes=5 for THUMOS):
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L208
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L209
The loaded sequence therefore starts at frame 0 (idx=0).

But in line 221, the snippet index starts from start_snippet=3:
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L221

Working through the calculation, the anchor region for the first timestamp in the sequence is [0.5, 5.5]:
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L241
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L242
But when you calculate the start and end regions for the ground-truth boxes, there seems to be no such shift along the temporal dimension:

https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L137
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L138
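
The arithmetic above can be reproduced with a small sketch. It assumes the index list in line 221 has the form start_snippet + skip_videoframes * i and that each anchor extends half a stride (skip_videoframes / 2) on either side of its index, which is what the [0.5, 5.5] result implies. The variable names are illustrative and not copied from dataset.py.

```python
# Illustrative sketch; names mirror the issue text, not the repository code.
skip_videoframes = 5   # THUMOS setting quoted in the issue
start_snippet = 3      # offset reported for line 221
num_snippet = 4        # hypothetical length, just for the demo

# Frames whose features are actually loaded (stride 5, starting at frame 0).
loaded_frames = [skip_videoframes * i for i in range(num_snippet)]

# Snippet indexes as described for line 221 (shifted by start_snippet).
df_snippet = [start_snippet + skip_videoframes * i for i in range(num_snippet)]

# Anchor regions centred on the snippet indexes, half a stride on each side.
anchors = [(s - skip_videoframes / 2, s + skip_videoframes / 2) for s in df_snippet]

print(loaded_frames)  # [0, 5, 10, 15]
print(df_snippet)     # [3, 8, 13, 18]
print(anchors[0])     # (0.5, 5.5)
```

The feature loaded for the first position comes from frame 0, yet its anchor is centred on frame 3.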

@frostinassiky frostinassiky added the bug Something isn't working label Apr 27, 2020
@frostinassiky frostinassiky self-assigned this Apr 27, 2020

Phoenix1327 commented Apr 27, 2020

Lines 104 and 105 may give the correct anchor_xmin and anchor_xmax (measured in seconds here), but they are not used to calculate the training labels.
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L104
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L105
BSN's code has a start_idx. I guess the reason is that the features used by BSN are already sampled at an interval of 5 frames, with the first selected frame being the 3rd frame.

frostinassiky (Owner) commented

Hey @Phoenix1327, thanks for pointing this out!

When I load the features, I use only the first frame to represent each 5-frame segment.
This is not optimal because, as you mentioned, the 3rd frame (or the average) should be more representative.

In my experience, features are very similar to those of their temporal neighbours, so the improvement might be marginal, but it is still worth discussing!

I would like to apply your suggestion and load the third frame instead. Let's keep this issue open and post the new experiment results here.


Phoenix1327 commented Apr 27, 2020

Sorry, I may not have explained my idea clearly.
In fact, I think using the first frame is fine. The sampled frames will be [0, 5, 10, ...].
The problem is that the indexes of the sampled frames are incorrect in line 221, where they are [0+3, 5+3, 10+3, ...].
I suggest changing line 221 to:
df_snippet = [skip_videoframes * i for i in range(num_snippet)]

Otherwise, consider the match scores calculated for timestamp t (t in [0, 5, 10, ...]) in line 144:
https://github.com/Frostinassiky/gtad/blob/f4677a2fd8fda0f990e0c05687b07eed24de5688/gtad_lib/dataset.py#L144
If I have not miscalculated, the anchor region for timestamp t in line 144 is [t+3-2.5, t+3+2.5]. Matching this region against the ground-truth start/end regions leads to the mismatch. I think the correct region for timestamp t is [t-2.5, t+2.5].
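
To illustrate why the shift matters, here is a minimal sketch that scores both anchor regions against a ground-truth start region. It assumes the match score is an intersection-over-anchor overlap (the thread does not show the code at line 144), and the ground-truth region [8, 12] is a made-up example rather than real annotation data.

```python
def ioa(anchor_min, anchor_max, box_min, box_max):
    """Fraction of the anchor covered by the box (assumed scoring rule)."""
    inter = max(min(anchor_max, box_max) - max(anchor_min, box_min), 0.0)
    return inter / (anchor_max - anchor_min)

half = 2.5                      # skip_videoframes / 2 for THUMOS
offset = 3                      # start_snippet from line 221
t = 10                          # frame whose feature was actually loaded
gt_start_region = (8.0, 12.0)   # hypothetical ground-truth start region

# Anchor as described for line 144: centred on t + 3.
shifted = ioa(t + offset - half, t + offset + half, *gt_start_region)
# Anchor centred on the frame that was actually loaded.
aligned = ioa(t - half, t + half, *gt_start_region)

print(round(shifted, 3))  # 0.3
print(round(aligned, 3))  # 0.8
```

Under these assumptions the feature from frame 10 receives a much lower start score than its true position warrants.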

Alternatively, you can load the third frame of each segment by changing lines 208 and 209 to:
self.flow_val[video_name][start_snippet:-1:self.skip_videoframes,...]
self.rgb_val[video_name][start_snippet:-1:self.skip_videoframes,...]
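
Which frames that slice selects can be checked on a dummy array. This is only a sketch: a NumPy array stands in for the h5 datasets, the length of 20 frames is arbitrary, and the plain stride slice used for the current loading is an assumption based on the description of lines 208 and 209.

```python
import numpy as np

skip_videoframes = 5
start_snippet = 3

# Stand-in for one video's per-frame features: row i holds the value i.
frames = np.arange(20).reshape(20, 1)

# Assumed current loading: every 5th frame, starting at frame 0.
loaded_now = frames[::skip_videoframes, ...]
# Suggested loading: every 5th frame, starting at frame start_snippet.
loaded_offset = frames[start_snippet:-1:skip_videoframes, ...]

print(loaded_now.ravel().tolist())     # [0, 5, 10, 15]
print(loaded_offset.ravel().tolist())  # [3, 8, 13, 18]
```

With the offset slice, the loaded frames agree with the [0+3, 5+3, 10+3, ...] indexes, so line 221 could stay as it is. Note that the -1 stop always excludes the final frame of the video.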

@frostinassiky frostinassiky linked a pull request Apr 29, 2020 that will close this issue

frostinassiky commented Apr 29, 2020

Hey @Phoenix1327,
Thanks again for your constructive feedback!
I have updated the code based on your suggestion, and the model performance (mAP at tIoU 0.5) increases from 0.427 to 0.430.
Please feel free to re-open the issue if any mismatch remains.
