VideoReader bug when decoding last frames on CUDA #135

Closed
hexfaker opened this issue Feb 19, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@hexfaker

Hello.

I found potentially buggy decoding behavior on CUDA. When VideoReader.get_batch() is called multiple times with indices from the end of the video, decord sometimes fails to decode the frames. I created a gist with a Dockerfile and minimal code to reproduce the issue (the test_video_get_batch_last_frames_gpu function). It does not fail every time: it fails in about 90% of runs on one of the machines I tested and in about 50% on another.
Both machines have NVIDIA driver 455 and a 1080 Ti GPU. I also tested older versions (back to 0.4.0) and CUDA 11.1, and it reproduces everywhere.

Also, I accidentally found that one of your tests fails in the setup I provided, so I added it to my gist as well (the test_bytes_io function).
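
Since the failure is intermittent, it can help to measure how often it happens. The gist itself is not reproduced in this thread, so the following is only a minimal sketch of that kind of check, not the original test: the helper names (tail_indices, count_tail_failures) and the parameters are hypothetical, and the reader is passed in as a factory so that the harness only relies on len() and get_batch().

```python
from typing import Callable, List


def tail_indices(num_frames: int, tail: int) -> List[int]:
    """Return the indices of the last `tail` frames (fewer if the video is short)."""
    start = max(0, num_frames - tail)
    return list(range(start, num_frames))


def count_tail_failures(make_reader: Callable[[], object],
                        runs: int = 20,
                        calls_per_run: int = 3,
                        tail: int = 5) -> int:
    """Open a fresh reader `runs` times, read the last frames repeatedly,
    and return how many runs raised an exception.

    Assumption: the reader exposes __len__ and get_batch(indices),
    as decord.VideoReader does.
    """
    failures = 0
    for _ in range(runs):
        try:
            vr = make_reader()
            indices = tail_indices(len(vr), tail)
            for _ in range(calls_per_run):
                vr.get_batch(indices)
        except Exception as exc:  # decord raises DECORDError; caught broadly here
            failures += 1
            print(f"run failed: {exc}")
    return failures


# Possible usage with decord on a GPU build (not executed here):
#   from decord import VideoReader, gpu
#   n = count_tail_failures(
#       lambda: VideoReader("examples/flipping_a_pancake.mkv", ctx=gpu(0)))
#   print(f"{n} failing runs")
```

Note that this only counts runs that raise an error; it would not detect a run that hangs.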

@hexfaker
Author

Oh, sorry, I forgot to add the text of the error. The error in test_video_get_batch_last_frames_gpu looks like this:

decord._ffi.base.DECORDError: [13:43:35] /decord/src/video/video_reader.cc:438: [/decord/examples/flipping_a_pancake.mkv]Unable to handle EOF, exit...

It may also be helpful to know that decord versions 0.4.0 to 0.4.2 just freeze in the same test.

The error in test_bytes_io looks like

File "/decord/tests/python/unittests/failing_test.py", line 25, in test_bytes_io
    assert np.allclose(vr[10].asnumpy(), vr2[10].asnumpy())
AssertionError

@ahkarami

Dear @innerlee & @zhreshold,
I have also faced the same issue when using decord==0.5.2 (the CPU build, installed via pip). Could you please address this issue?
Best

@zhreshold
Member

@ahkarami sorry, I can't reproduce the error on CPU.

@hexfaker I tried the gist, but it fails to compile due to a missing libnvcuvid.so. I noticed that you symlinked the file, but the build error persists for some reason. However, I am able to reproduce the error locally:

  • For the failed test case, I checked the output from CPU and GPU: the decoded frames have marginal pixel differences, which I reckon is normal. With the assertion changed from assert np.allclose(vr[10].asnumpy(), vr2[10].asnumpy()) to assert np.mean(np.abs(vr[10].asnumpy().astype('float') - vr2[10].asnumpy().astype('float'))) < 2 # average pixel diff < 2, the tests pass:
platform linux -- Python 3.7.9, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/ubuntu/decord/tests/python/unittests
collected 9 items

test_video_reader.py .........
============================================================= warnings summary ==============================================================
test_video_reader.py::test_video_corrupted_get_batch
  /home/ubuntu/anaconda3/envs/debug-decord/lib/python3.7/site-packages/nose/importer.py:12: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    from imp import find_module, load_module, acquire_lock, release_lock

-- Docs: https://docs.pytest.org/en/stable/warnings.html
======================================================= 9 passed, 1 warning in 1.36s ========================================================
  • I do notice an intermittent error:
================================================================= FAILURES ==================================================================
_______________________________________________________ test_video_reader_read_random _______________________________________________________

    def test_video_reader_read_random():
        vr = _get_default_test_video()
        lst = list(range(len(vr)))
        random.shuffle(lst)
        num = min(len(lst), 10)
        rand_lst = lst[:num]
        for i in rand_lst:
>           frame = vr[i]

test_video_reader.py:47:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../python/decord/video_reader.py:102: in __getitem__
    return self.next()
../../../python/decord/video_reader.py:114: in next
    arr = _CAPI_VideoReaderNextFrame(self._handle)
../../../python/decord/_ffi/_ctypes/function.py:175: in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ret = -1

    def check_call(ret):
        """Check the return value of C API call

        This function will raise exception when error occurs.
        Wrap every API call with this function

        Parameters
        ----------
        ret : int
            return value from API calls
        """
        if ret != 0:
            err_str = py_str(_LIB.DECORDGetLastError())
            if not _ENABLE_STACK_TRACE:
                if 'Stack trace' in err_str:
                    err_str = err_str.split('Stack trace')[0].strip()
            if 'recovered from nearest frames' in err_str:
                if 'Stack trace' in err_str:
                    err_str = err_str.split('Stack trace')[0].strip()
                raise DECORDLimitReachedError(err_str)
>           raise DECORDError(err_str)
E           decord._ffi.base.DECORDError: [01:13:25] /home/ubuntu/decord/src/video/video_reader.cc:438: [/home/ubuntu/decord/examples/flipping_a_pancake.mkv]Unable to handle EOF, exit...

../../../python/decord/_ffi/base.py:78: DECORDError
----------------------------------------------------------- Captured stderr call ------------------------------------------------------------
[01:13:25] /home/ubuntu/decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla T4
[01:13:25] /home/ubuntu/decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 450.51, so using our own stream.

This is probably because the retry logic is not suited to asynchronous GPU decoding. I will dig into it and post an update here later.
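
The relaxed assertion for test_bytes_io mentioned above can be illustrated without decord. This is a small sketch on synthetic arrays standing in for two decoded frames; the helper name mean_abs_pixel_diff and the frame values are made up for illustration, and only the formula and the threshold of 2 come from the comment above.

```python
import numpy as np


def mean_abs_pixel_diff(a: np.ndarray, b: np.ndarray) -> float:
    """Average absolute per-pixel difference.

    The arrays are cast to float first: subtracting uint8 arrays directly
    would wrap around (100 - 101 becomes 255 in uint8 arithmetic).
    """
    return float(np.mean(np.abs(a.astype("float") - b.astype("float"))))


# Synthetic stand-ins for the same frame decoded by two different paths.
frame_a = np.full((4, 4, 3), 100, dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[::2] += 1  # half of the rows differ by one intensity level

# A strict element-wise comparison rejects even a one-level difference...
print(np.allclose(frame_a, frame_b))

# ...while the averaged check tolerates small decoder differences.
diff = mean_abs_pixel_diff(frame_a, frame_b)
print(diff, diff < 2)
```

The averaged check trades strictness for robustness: it accepts small rounding differences between decoders, but a few badly corrupted pixels could also pass unnoticed, so the threshold is a judgment call.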

@zhreshold added the bug (Something isn't working) label on Feb 22, 2021
@thepowerfuldeez

I also have this error. It happens when I use a torch DataLoader and invoke VideoReader inside the torch Dataset.

@zhreshold
Member

Fixed in #140. Let me know if the error persists after this PR.
