Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make filter_graph_->nb_threads be 1, and make number of decoding threads customizable #63

Merged
merged 3 commits into from
May 7, 2020

Conversation

innerlee
Copy link
Contributor

@innerlee innerlee commented May 6, 2020

The makes the time spent on this function deterministic, and hence faster.
Testing on 32 videos shows 1.3x speedup:
image

Testing code:
image

@innerlee innerlee mentioned this pull request May 6, 2020
@huww98
Copy link

huww98 commented May 6, 2020

We also work on Kinetics dataset and found an issue similar to #62. I suggest adding an option to totally disable FFmpeg multi-threading. My patch is similar, so I will not open a new PR. see huww98@76333af

In our use case, we see not only 1.24x speedup, but CPU usage also drops from 245% to 110%. That's a total of 2.7x more efficient.

Here is my test code. In short, I open each file, then only read first 16 frames.

from pathlib import Path
import time
from decord import VideoReader

BASE = Path('/tmp/answering_questions')

def main():
    files = list(BASE.glob('**/*.mp4'))
    frames = list(range(16))

    print(f"Reading {len(frames)} frames from {len(files)} files")
    for i in range(3):
        print(f'pass {i + 1}')
        t1 = time.perf_counter()
        for f in files:
            v = VideoReader(str(f))
            v.get_batch(frames)
        t2 = time.perf_counter()
        print(f'Time: {t2-t1}')

if __name__ == "__main__":
    main()

The CPU usage data is reported by /usr/bin/time -v python test.py

@zhreshold
Copy link
Member

Good catch! However, I think a better option is to expose the multithreading config to python side with arguments to turn ON/OFF the multithreading.
@innerlee Do you want to add that option to VideoReader?

@yjxiong @bryanyzhu FYI, there's possibly a good margin for speed up in multi-worker loading if we turn of the internal mt.

@innerlee
Copy link
Contributor Author

innerlee commented May 7, 2020

@huww98 I tested your changes. If we

Clearly in the single worker case, we should leave the decoding thread num to auto.
Note the above test is not in parallel. I'm not sure if disabling this would lead to performance gain in multi-worker loading. More tests should be done.

@zhreshold Yeah its nice to implement that option. I'm not familiar with c++, so if this change is trivial for you, could you help implementing it? I would like to continue tracing other performance issues.

Signed-off-by: lizz <lizz@sensetime.com>
@innerlee
Copy link
Contributor Author

innerlee commented May 7, 2020

@zhreshold I implemented the customizable number of decoding thread. And leave the filter graph thread to the fixed number 1.

@innerlee innerlee changed the title Make filter_graph_->nb_threads be 1 Make filter_graph_->nb_threads be 1, and make number of decoding threads customizable May 7, 2020
Signed-off-by: lizz <lizz@sensetime.com>
@huww98
Copy link

huww98 commented May 7, 2020

@innerlee Thanks. My test is performed in WSL2, maybe threading overhead is larger in WSL.

However, in our use case, we do not care about single video decode time. Since we always use multiple workers to increase throughput, I think multi-thread in FFmpeg can only add more overhead. Disabling FFmpeg threading can always benefit us.

@innerlee
Copy link
Contributor Author

innerlee commented May 7, 2020

yes, with this pr, you can set num_threads=1 in the python constructor of VideoReader. Could you try it and check if the speed is similar to your changes?

@yjxiong
Copy link

yjxiong commented May 7, 2020

Late to the party. Great to know this is being figured out.

The filter-graph threading seems not needed in the use case. The codec threading seems helpful when the use case just has one worker. For multiple workers, it is better to just use no ffmpeg-level threading.

Maybe we can add this guidance in the usage doc.

Copy link
Member

@zhreshold zhreshold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@zhreshold
Copy link
Member

I will be merging this now, once we figure out the impact of multi_worker we can add guidance in the README.

@zhreshold zhreshold merged commit 2daa715 into dmlc:master May 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants