Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whether endIndices algorithm in BoxParser should be improved a little when ctts box is available? #1797

Closed
ybai001 opened this issue Oct 10, 2024 · 2 comments
Assignees
Labels

Comments

@ybai001
Copy link
Contributor

ybai001 commented Oct 10, 2024

The question is about the endIndices calculation algorithm in BoxParser , the corresponding code is here: https://github.com/androidx/media/blob/release-1.5.0-alpha01/libraries/extractor/src/main/java/androidx/media3/extractor/mp4/BoxParser.java#L723

The potential issue is several video frames may be dropped in video editing/transcoding use case.

Let me give you an example.

The test stream is a video with ctts box. It contains 2034 frames. I set a breakpoint at https://github.com/androidx/media/blob/release-1.5.0-alpha01/libraries/extractor/src/main/java/androidx/media3/extractor/mp4/BoxParser.java#L729. And then I get below debug info.

editMediaTime = 143
editDuration = 3049002
so editMediaTime + editDuration = 3049145 (corresponding to line 726)
startIndices[0] = 0 (This is correct)
endIndices[0] = 2032
timestamps =
| index | 0 | 1 | 2 | 3 | 4 | 5 | ...... | 2031 | 2032 | 2033 |
| value | 1509 | 4508 | 3008 | 6008 | 7499 | 9008 | ...... | 3045972 | 3050471 | 3048971 |

Here "endIndices[0] = 2032" because the frame (index = 2032) is a P frame so that its timestamp is LARGER than the one of frame (B frame, index = 2033). But actually frame (index = 2033) should be kept since its timestamp is less than editMediaTime + editDuration.

As the comment in this file (https://github.com/androidx/media/blob/release-1.5.0-alpha01/libraries/extractor/src/main/java/androidx/media3/extractor/mp4/BoxParser.java#L713), I understand the author knows "the result of this binary search might be slightly incorrect (due to out-of-order timestamps), the loop below that walks forward to find the next sync frame will result in a correct start index. The start index would also be correct if we walk backwards to the previous sync frame". This is OK for playback use case. But AndroidX Media added transformer module recently. For editing and transcoding use case, the accurate END index becomes important.

Do you think this is a potential issue and could you help to improve it? Thanks.

@rohitjoins rohitjoins self-assigned this Oct 11, 2024
copybara-service bot pushed a commit that referenced this issue Oct 15, 2024
Updated logic to walk forward in the timestamps array to include all frames within the valid edit duration, accounting for out-of-order frames. This ensures that no frames with timestamps less than `editMediaTime` + `editDuration` are incorrectly excluded.

Issue: #1797
PiperOrigin-RevId: 686075680
@rohitjoins
Copy link
Contributor

Hi @ybai001,

Thank you for raising this issue. We have improved the endIndices calculation to include timestamps which follows the index and is less than equal to editMediaTime + editDuration.

Please feel free to let us know in case you think it requires further improvement.

@ybai001
Copy link
Contributor Author

ybai001 commented Oct 16, 2024

@rohitjoins , thanks for the fixing. I tested it on my side and it works. I'll let you know if there is any other issue. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants