Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async cancellation swallowed by BRPOPLPUSH #2028

Open
aaliddell opened this issue Feb 28, 2022 · 3 comments
Open

Async cancellation swallowed by BRPOPLPUSH #2028

aaliddell opened this issue Feb 28, 2022 · 3 comments

Comments

@aaliddell
Copy link

Should issues be submitted here or on aioredis package now that they've merged?

Using the BRPOPLPUSH command on an asyncio pool, if a cancellation occurs at exactly the right time it is swallowed and never bubbled up to calling code. This leads to tasks that never exit and violates the convention that CancelledError should never be swallowed. The same behaviour is seen with BRPOP, so I assume this is related to blocking calls. I can't recreate with non-blocking calls, but it may just be the race window is smaller there.

I can't see an obvious place for this swallowing of the cancellation in the code, but you are catching BaseException which will catch CancelledError (unlike catching on Exception), so care needs to be taken to ensure it is always re-raised after any cleanup is done. In cases where you are catching BaseException it appears you are trying to disconnect, so perhaps there is a deadlock between the disconnect and the command submit?

Demo

Run the following inside a loop (wrap it all in asyncio.run or just run raw in IPython cell). It will not fail every time, as it is timing dependent. You will have to adjust the 0.003 second delay up or down depending on your system speed until it begins to fail. I get about 25% failure with the code below.

import asyncio
from redis import asyncio as aioredis

pool = aioredis.from_url('redis://')

async def runner():
    print('start')
    try:
        async with pool as conn:
            print('have conn')
            
            await conn.get('key')
            
            print('done get')
            
            await conn.brpoplpush('key', 'other key')
            
            print('never expect to get here due to blocking')
    finally:
        print('bye')

task = asyncio.create_task(runner())
print('created')
await asyncio.sleep(0.003)  # <- Try to land cancel at exact right point
task.cancel()
print('cancelled')

await asyncio.sleep(1) # <- Give a generous time to complete cancel
assert task.cancelled() # <- Expect this to never fail
print('ok')

The above code should never fail the assertion, as we have cancelled the task and the CancelledError should propagate out within the time we've given it (waiting longer doesn't help, the task never stops). If the 0.003 second delay is decreased, it never fails. Similarly, if the 0.003 second delay is increased, it also never fails. So there is a short window during the start of the conn.brpoplpush command that the cancellation is swallowed and the task never exits.

On a failure, the following will be printed:

created
start
have conn
done get
cancelled

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_3521979/4075451794.py in <module>
     28 
     29 await asyncio.sleep(1)
---> 30 assert task.cancelled()
     31 print('ok')

AssertionError: 

On a good run, you will get the following if the cancel was too early:

created
start
have conn
cancelled
bye
ok

Or if the cancel was too late to trigger:

created
start
have conn
done get
cancelled
bye
ok

Info

Version: 4.2.0rc1 and also 2.0.1 of aioredis

Platform: Linux, Python 3.8.10 and 3.10.2 confirmed

@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2023

This issue is marked stale. It will be closed in 30 days if it is not updated.

@github-actions github-actions bot added the Stale label Mar 1, 2023
@jakob-keller
Copy link
Contributor

bump

@github-actions github-actions bot removed the Stale label Mar 2, 2023
@dvora-h
Copy link
Collaborator

dvora-h commented Feb 11, 2024

We solved few bags with async canceled errors from the time this issue opened, can someone check if this is still relevant or solved by the fixes we did?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants