Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Redis disconnect handling #954

Merged
merged 1 commit into from
Nov 22, 2018
Merged

Conversation

popravich
Copy link
Contributor

This fixes celery/celery#3898 issue.

As far as I understand the celery hang is happening under the following scenario:

  • redis connection is created in main thread;
  • then, process forks a worker and it receives copies of all open sockets
  • when the worker is done (max_tasks_per_child) it cleans up its resources (closes connections)
  • redis-py client closes its socket in connection,
    however before close() it calls shutdown() which shuts down all copies of socket shared
    between all the processes
    So, again, if I understand all correctly, main process is still able to read that socket and poll always
    returns it as ready however it is in unoperational state.

@wolph
Copy link

wolph commented Nov 21, 2018

While this code does look like a nice improvement, I would vote against merging this in until the underlying hiredis bug is solved.

Since this exact same code works reliably without hiredis we can conclude that either hiredis or the hiredis wrapper code is the actual culprit.

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any possible way to test this proposed change?

@seocam
Copy link

seocam commented Nov 21, 2018

Is there any possible way to test this proposed change?

I suppose you are talking about a test to avoid regression but I have this running on my servers for about an hour and it's all good. Before patching 2 or 3 minutes were enough to cause the problem.

Anyway having a regression test for that would be great.

@popravich
Copy link
Contributor Author

Is there any possible way to test this proposed change?

For manual test I used docker scripts posted in related celery issue,
for automated testing — I'm not familiar with kombu that much(
Do you have any concurrency tests (multi-threaded or multi-processing)?

@popravich
Copy link
Contributor Author

There are several issues in redis-py related to sockets shutdown in forked applications:
redis/redis-py#863
redis/redis-py#784

The reason why uninstalling hiredis package magically helps is another bug in redis-py:
PythonParser (which is used if hiredis is not importable) closes connection's socket before
Connection calls shutdown:
https://github.com/andymccurdy/redis-py/blob/master/redis/connection.py#L277-L285
I don't see any rational reason why parser should close non-owned socket, so, if PythonParser
will get fixed someday — uninstalling hiredis will stop fixing this bug :sarcams:

@wolph
Copy link

wolph commented Nov 22, 2018

Excellent work @popravich! Thank you for chasing the bug down :)

I don't see any rational reason why parser should close non-owned socket, so, if PythonParser
will get fixed someday — uninstalling hiredis will stop fixing this bug :sarcams:

I fully agree. The big problem with this bug is/was how it manifested. It's not something you can handle from Python, it actually appears to be deadlocking in a bit of C code somewhere which is an issue. It completely stalls the Python interpreter which is something regular Python code should never be able to do.

@auvipy auvipy added this to the 4.3 milestone Nov 22, 2018
@codecov
Copy link

codecov bot commented Nov 22, 2018

Codecov Report

Merging #954 into master will decrease coverage by 0.07%.
The diff coverage is 25%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #954      +/-   ##
==========================================
- Coverage   88.66%   88.58%   -0.08%     
==========================================
  Files          63       63              
  Lines        6512     6519       +7     
  Branches      777      778       +1     
==========================================
+ Hits         5774     5775       +1     
- Misses        656      661       +5     
- Partials       82       83       +1
Impacted Files Coverage Δ
kombu/transport/redis.py 90% <25%> (-0.79%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5fca408...efb7a44. Read the comment docs.

@auvipy auvipy merged commit 9b0694f into celery:master Nov 22, 2018
@popravich popravich deleted the redis_hang_fix branch November 22, 2018 08:46
lithammer added a commit to lithammer/kombu that referenced this pull request Feb 21, 2019
This reverts celery#954, and bumps the required redis-py dependency to 3.2.0
to include this fix:

redis/redis-py@4e1e748

Fixes celery#1006
auvipy pushed a commit that referenced this pull request Feb 22, 2019
This reverts #954, and bumps the required redis-py dependency to 3.2.0
to include this fix:

redis/redis-py@4e1e748

Fixes #1006
netbsd-srcmastr referenced this pull request in NetBSD/pkgsrc Mar 6, 2019
4.4.0:
- Restore bz2 import checks in compression module.
  The checks were removed in celery/kombu-938 <https://github.com/celery/kombu/pull/938>_ due to assumption that it only affected Jython.
  However, bz2 support can be missing in Pythons built without bz2 support.
- Fix regression that occurred in 4.3.0
  when parsing  Redis Sentinel master URI containing password.
- Handle the case when only one Redis Sentinel node is provided.
- Support SSL URL parameters correctly for rediss:// URIs.
- Revert celery/kombu-954 <https://github.com/celery/kombu/pull/954>_.
  Instead bump the required redis-py dependency to 3.2.0
  to include this fix redis/redis-py@4e1e748 <https://github.com/andymccurdy/redis-py/commit/4e1e74809235edc19e03edb79c97c80a3e4e9eca>_.
- Added support for broadcasting using a regular expression pattern
  or a glob pattern to multiple Pidboxes.
@liniribeiro
Copy link

Hello, This problem start to happen to me again after you revert 706f9f0

@auvipy
Copy link
Member

auvipy commented Mar 26, 2020

try celery=>4.4.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Celery pool with Redis broker freeze silently or crash & freeze when hiredis package is installed
5 participants