Fix Redis disconnect handling #954
Conversation
While this code does look like a nice improvement, I would vote against merging this in until the underlying hiredis bug is solved. Since this exact same code works reliably without hiredis, we can conclude that either hiredis or the hiredis wrapper code is the actual culprit.
Is there any possible way to test this proposed change?
I suppose you are talking about a test to avoid regressions, but I have had this running on my servers for about an hour and it's all good. Before patching, 2 or 3 minutes were enough to cause the problem. Anyway, having a regression test for that would be great.
For manual testing I used the Docker scripts posted in the related Celery issue.
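The actual Docker scripts live in the linked Celery issue; purely as an illustration of the kind of manual reproduction described here (assumed setup: a local Redis broker and a made-up repro app, not the scripts from the issue), one can recycle worker children aggressively and keep sending tasks:

```python
# repro.py - hypothetical reproduction sketch, not the scripts from the issue.
# Assumes a Redis server on localhost and celery/kombu installed.
from celery import Celery

app = Celery(
    "repro",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

# Recycle each worker child after a single task so the connection cleanup
# path (the scenario described later in this PR) runs constantly.
app.conf.worker_max_tasks_per_child = 1


@app.task
def ping():
    return "pong"


if __name__ == "__main__":
    # Run a worker in another shell, e.g.: celery -A repro worker -c 2
    # Before the fix, a few minutes of this loop were enough to hang the worker.
    while True:
        ping.delay().get(timeout=10)
```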
There are several issues in redis-py related to sockets. The reason why uninstalling
Excellent work @popravich! Thank you for chasing the bug down :)
I fully agree. The big problem with this bug is/was how it manifested. It's not something you can handle from Python; it actually appears to deadlock in a bit of C code somewhere, which is an issue. It completely stalls the Python interpreter, which is something regular Python code should never be able to do.
Codecov Report
@@            Coverage Diff             @@
##           master     #954      +/-   ##
==========================================
- Coverage   88.66%   88.58%   -0.08%
==========================================
  Files          63       63
  Lines        6512     6519       +7
  Branches      777      778       +1
==========================================
+ Hits         5774     5775       +1
- Misses        656      661       +5
- Partials       82       83       +1
Continue to review full report at Codecov.
This reverts #954, and bumps the required redis-py dependency to 3.2.0 to include this fix: redis/redis-py@4e1e748 Fixes #1006
4.4.0:
- Restore bz2 import checks in compression module. The checks were removed in celery/kombu#938 <https://github.com/celery/kombu/pull/938>_ due to the assumption that it only affected Jython. However, bz2 support can be missing in Pythons built without bz2 support.
- Fix regression that occurred in 4.3.0 when parsing a Redis Sentinel master URI containing a password.
- Handle the case when only one Redis Sentinel node is provided.
- Support SSL URL parameters correctly for rediss:// URIs.
- Revert celery/kombu#954 <https://github.com/celery/kombu/pull/954>_. Instead, bump the required redis-py dependency to 3.2.0 to include this fix: redis/redis-py@4e1e748 <https://github.com/andymccurdy/redis-py/commit/4e1e74809235edc19e03edb79c97c80a3e4e9eca>_.
- Added support for broadcasting using a regular expression pattern or a glob pattern to multiple Pidboxes.
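As a rough illustration of the SSL URL parameter support mentioned in the changelog above (the hostname and password below are made up, and the accepted parameter names and values follow the Celery/Kombu documentation for the 4.4 line, so treat them as an assumption):

```python
# Illustrative only: broker host and password are made up, and the
# ssl_cert_reqs value is taken from the Celery/Kombu 4.4 docs (assumption).
from celery import Celery

app = Celery(
    "app",
    broker="rediss://:secret@redis.example.com:6380/0?ssl_cert_reqs=CERT_REQUIRED",
)
```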
Hello, this problem started happening to me again after you reverted 706f9f0.
Try celery >= 4.4.2.
This fixes issue celery/celery#3898.
As far as I understand, the celery hang happens under the following scenario: when a worker child process is recycled (e.g. after reaching max_tasks_per_child) it cleans up its resources (closes connections); however, before close() it calls shutdown(), which shuts down all copies of the socket shared between all the processes. So, again, if I understand it all correctly, the main process is still able to read that socket and poll always reports it as ready, however the socket is in an inoperable state.
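This is not code from the PR, just a minimal Unix-only sketch of the shutdown() vs close() distinction the description relies on: shutdown() tears down the connection for every copy of the socket, while close() only releases the calling process's descriptor, so the surviving copy keeps polling as "readable" while being unusable.

```python
# Minimal sketch (not from the PR): shutdown() affects all copies of a socket
# shared across fork(), while close() only drops the local file descriptor.
# Unix-only because of os.fork().
import os
import select
import socket

parent_sock, child_sock = socket.socketpair()

pid = os.fork()
if pid == 0:
    # Child process "cleans up": shutdown() breaks the connection for the
    # parent's copy as well, whereas close() alone would only drop this fd.
    child_sock.shutdown(socket.SHUT_RDWR)
    child_sock.close()
    os._exit(0)

os.waitpid(pid, 0)

# Parent: select/poll now reports the socket as readable, but every read
# returns EOF (b"") - the "ready but inoperable" state described above.
readable, _, _ = select.select([parent_sock], [], [], 1.0)
print("reported readable:", bool(readable))        # True
print("recv() returns:", parent_sock.recv(1024))   # b''
```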