
Table empty or key no longer exists #1063

Closed
sreedharbukya opened this issue Jun 19, 2019 · 108 comments · Fixed by #1394 or #1404

Comments

@sreedharbukya

We keep hitting an issue where the Redis key gets evicted every time. I read the old issue that was linked and have confirmed that my Redis instance has not been hacked; in fact, we are using a secured Redis.

OperationalError("\nCannot route message for exchange 'reply.celery.pidbox': Table empty or key no longer exists.\nProbably the key ('_kombu.binding.reply.celery.pidbox') has been removed from the Redis database.\n",)

kombu==4.5.0
celery==4.3.0
redis==3.2.1

Is this some issue with redis?

@tbolis

tbolis commented Jun 19, 2019

Same issue here. Is there any workaround for this? The Celery workers freeze after the error and we need to restart them.

@ra-coder

ra-coder commented Sep 8, 2019

What do you mean by 'configure Redis correctly'?

I have the same problem in a Flask app with the following config.py:

import os

# redis
REDIS_URL = os.environ['REDIS_URL']

# flask-caching
CACHE_TYPE = 'redis'
CACHE_KEY_PREFIX = 'glue_flask_cache_'
CACHE_REDIS_URL = REDIS_URL

@danleyb2

danleyb2 commented Oct 1, 2019

I faced this same issue on the first queue whenever I started a second (or more) queue.

Fixed it by downgrading from kombu==4.6.5 to kombu==4.5.0.

It had nothing to do with Redis itself; the key _kombu.binding.reply.celery.pidbox is simply never created, which you can see if you watch redis-cli monitor.
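
For anyone who wants to check their own broker, here is a minimal redis-py sketch (assuming a local Redis broker on db 0; adjust the connection details to your setup) that just inspects the binding key:

import redis

r = redis.Redis(host="localhost", port=6379, db=0)
key = "_kombu.binding.reply.celery.pidbox"
print(r.exists(key))    # 0 means the binding key is missing, as described above
print(r.smembers(key))  # the reply queues currently bound to the exchange (empty if missing)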

@LuRsT

LuRsT commented Oct 1, 2019

I found the same issue, @danleyb2. Did you figure out what the problem was with the current version?

Update: Downgrading to v4.5.0 solved the issue. Thanks @danleyb2

@auvipy
Member

auvipy commented Oct 1, 2019

This is present in celery integration Redis tests as well!

@LuRsT

LuRsT commented Oct 1, 2019

I noticed, @auvipy. Any plans to fix it? Do you need any help?

@auvipy
Member

auvipy commented Oct 1, 2019

Yes, if you have time!

@auvipy auvipy added this to the 4.6.0 milestone Oct 2, 2019
newacropolis-uk pushed a commit to NewAcropolis/api that referenced this issue Oct 5, 2019
@StingyJack

StingyJack commented Oct 7, 2019

I was having this problem with kombu 4.5.0 when using Celery as a service in a docker-compose pod that included a Redis server image and a few app images. When I used up -d <serviceName> and started the services individually, starting with Redis, the error would show up in the logs repeatedly. When I used up -d without a service name, the problem seemed to go away.

Edit: the version I named is likely incorrect. Our project's setup.py was missing a comma between version ranges, so it was using whatever version satisfied the concatenation of the min and max versions, which at times would have been the affected package version.

@killthekitten

Looks like the reason is #1087. The bug showed up last week, after 4.6.4 -> 4.6.5 migration.

@killthekitten

@auvipy could you point to the failing integration test, please? I couldn't reproduce the bug locally, so I just pinned the version to 4.6.4 blindly.

@boomxy

boomxy commented Oct 16, 2019

Looks like the reason is #1087. The bug showed up last week, after 4.6.4 -> 4.6.5 migration.

Thank you, 4.6.4 works!

@jorijinnall

Had the same issue.
I fixed it by downgrading kombu from 4.6.5 to 4.6.3.
I still had the bug in version 4.6.4.

@travishen

travishen commented Oct 22, 2019

same issue here

celery==4.3.0
redis==3.2.1
kombu==4.6.3  # downgraded for a flower issue: https://github.com/mher/flower/issues/909

I found the error starts to occur when a worker is recreated (e.g. k8s pod scaling), and it affects all the other workers.
The worker has additional settings: concurrency (prefork) and max-memory-per-child.

@kravietz

kombu==4.6.3 fixed it for me -- had the same issue with Celery worker crashing.

mlissner added a commit to freelawproject/courtlistener that referenced this issue Oct 25, 2019
Django Extensions was failing on our version of Django and had to be
updated.

Kombu is a different, darker story. The reason it needs to be changed is
that we didn't previously have it pinned, and instead relied on celery to
get us the right version. However, for no apparent reason, celery recently
started requiring a more recent version of kombu. That version, it turns
out, crashes with this bug:

celery/kombu#1063

So...we had to specify which version of kombu we wanted, and that's what
we do here. Ugh.
@auvipy
Member

auvipy commented Oct 25, 2019

what about kombu==4.6.4?

said-moj added a commit to ministryofjustice/laa-legal-adviser-api that referenced this issue Oct 29, 2019
As suggested in celery/kombu#1063
This should fix the issue with the Redis key getting evicted every now and then
and should mean we stop receiving the following error:
Control command error: OperationalError(u"\nCannot route message for exchange u'reply.celery.pidbox': Table empty or key no longer exists.\nProbably the key (u'_kombu.binding.reply.celery.pidbox') has been removed from the Redis database.\n",)
@chris-griffin

chris-griffin commented Oct 29, 2019

Downgrading from 4.6.5 to 4.6.4 worked for us @auvipy when using celery 4.4.0rc3 (with celery/celery@8e34a67 cherry picked on top to address a different issue)

opennode-jenkins pushed a commit to waldur/waldur-mastermind that referenced this issue Oct 30, 2019
As suggested in celery/kombu#1063
This should fix the issue with the Redis key getting evicted every now and then
and should mean we stop receiving the following error:
Control command error: OperationalError(u"\nCannot route message for exchange u'reply.celery.pidbox': Table empty or key no longer exists.\nProbably the key (u'_kombu.binding.reply.celery.pidbox') has been removed from the Redis database.\n",)
@auvipy auvipy self-assigned this Oct 31, 2019
nijel added a commit to WeblateOrg/docker that referenced this issue Oct 31, 2019
See celery/kombu#1063

Signed-off-by: Michal Čihař <michal@cihar.com>
@matusvalo
Member

matusvalo commented Sep 29, 2021

I think I am able to fix the issue. The problem is caused by multiple workers sharing the same oid (which is used to create keys in _kombu.binding.reply.celery.pidbox). As a result, one worker sometimes removes the key even when another worker is still using it. The oid value must be unique per worker, otherwise it will cause this issue. The fix is simple: the following method should be "uncached":

kombu/kombu/pidbox.py

Lines 407 to 413 in 5ef5e22

    @cached_property
    def oid(self):
        try:
            return self._tls.OID
        except AttributeError:
            oid = self._tls.OID = oid_from(self)
            return oid

The fix is to rewrite the property as follows:

    @property
    def oid(self):
        return oid_from(self)

This change alone seems to fix the issue. I have executed multiple runs of the aforementioned reproducer and was not able to reproduce the crash anymore. I will provide a PR with the fix, but I would like to ask everyone to test it.

Note: I tried just using @property instead of @cached_property, but that alone did not help. The issue was fixed only after additionally removing the self._tls cache attribute.
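
As a standalone illustration of why the caching matters (this is not kombu code, just a sketch of the underlying effect): a value computed once in the parent before forking is inherited unchanged by every prefork child, while a value recomputed on each access differs per process:

# Illustration only (uses POSIX os.fork): a value cached before forking is shared
# by all children, mirroring how a cached oid ends up shared across prefork workers.
import os
import uuid

def compute_oid():
    # Deterministic per process, loosely like oid_from() deriving a value from the pid.
    return uuid.uuid5(uuid.NAMESPACE_DNS, str(os.getpid()))

CACHED = compute_oid()          # computed once in the parent, like a cached oid

for _ in range(2):
    if os.fork() == 0:          # child "worker" process
        # The cached value is the parent's; the recomputed one is unique to this child.
        print(f"pid={os.getpid()} cached={CACHED} fresh={compute_oid()}")
        os._exit(0)

for _ in range(2):
    os.wait()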

@Dogrtt

Dogrtt commented Sep 29, 2021

... multiple workers ...

I've got one Docker container with a single Celery worker running in --pool=solo mode. I faced the same issue several minutes ago, with Celery 5.1.2 and Kombu 5.1.0. I used redis-cli ping for a healthcheck, but I removed it in the last MR; I will check whether the issue is still there and write back.

@matusvalo
Member

I've got one Docker container with a single Celery worker running in --pool=solo mode. I faced the same issue several minutes ago.

Hmm, I tried the example above with a single worker and was not able to reproduce the issue even with unpatched kombu:

celery -A tasks worker -E --loglevel=INFO --autoscale=1

@matusvalo
Member

A potential fix has been created in PR #1394. Please test it. For now I am marking it as a draft; it is best to have multiple users confirm this fix.

@Dogrtt

Dogrtt commented Oct 13, 2021

Hi @matusvalo, I added your fix to my container's kombu v.5.1.0 manually, but I'm still facing this issue:

kombu.exceptions.OperationalError:
Cannot route message for exchange 'reply.celery.pidbox': Table empty or key no longer exists.
Probably the key ('_kombu.binding.reply.celery.pidbox') has been removed from the Redis database.

The problem is that on my local PC everything is fine, but when I run the Celery container on a VM this issue appears.
So I can't say that the solution is not working, but for some reason it is still happening for me.

@matusvalo
Member

@Dogrtt are you able to prepare a reproducible case? If yes, please post it here and we will reopen the issue.

@auvipy
Member

auvipy commented Oct 15, 2021

Hi @matusvalo, I added your fix to my container's kombu v.5.1.0 manually, but I'm still facing this issue:

kombu.exceptions.OperationalError:
Cannot route message for exchange 'reply.celery.pidbox': Table empty or key no longer exists.
Probably the key ('_kombu.binding.reply.celery.pidbox') has been removed from the Redis database.

The problem is that on my local PC everything is fine, but when I run the Celery container on a VM this issue appears. So I can't say that the solution is not working, but for some reason it is still happening for me.

You should use kombu directly from the main branch.

@omoumniabdou

omoumniabdou commented Oct 15, 2021

@Dogrtt Same for me.

celery[redis]==4.4.6
kombu==4.6.11
redis==3.5.3 (server 5.0.3)

I patched the app but we still have the issue:

from celery import Celery
from celery.app.control import Control
from kombu.common import oid_from
from kombu.pidbox import Mailbox


class FixedMailbox(Mailbox):
    """
    Patch kombu 4.6.11 with a PR from kombu 5.
    See https://github.com/celery/kombu/pull/1394
    """

    @property
    def oid(self):
        return oid_from(self)


class FixedControl(Control):
    Mailbox = FixedMailbox


app = Celery("pipeline", control=FixedControl)

@adililhan

adililhan commented Oct 16, 2021

@matusvalo Hi, I managed to (accidentally) reproduce this error. I added time.sleep(3) to the ping function: https://github.com/celery/celery/blob/master/celery/worker/control.py#L322

Like this:

@inspect_command(default_timeout=0.2)
def ping(state, **kwargs):
    time.sleep(3)
    return ok('pong')

After that, I tried to send a ping request to the Celery instance:

celery.control.inspect().ping()

I got the same error.

"Control command error: OperationalError(\"\\nCannot route message for exchange 'reply.celery.pidbox': Table empty or key no longer exists.\\nProbably the key ('_kombu.binding.reply.celery.pidbox') has been removed from the Redis database.\\n\")"

So, this error occurs if the ping function doesn't return a response within a reasonable time frame. I hope this sample will help you understand the issue.
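
For completeness, here is a sketch of the client side of this reproducer (assuming a local Redis broker, an app module named tasks, and a running worker whose ping handler has been patched with the time.sleep(3) above):

# Client-side reproducer sketch; assumes a local Redis broker and a running worker
# for the 'tasks' app whose ping handler sleeps for 3 seconds as shown above.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

if __name__ == "__main__":
    # With a reply timeout shorter than the worker's artificial delay, ping() returns
    # (with no replies) before the worker answers; its temporary reply queue is then
    # unbound, and the worker hits the 'Cannot route message' error when replying.
    print(app.control.inspect(timeout=1.0).ping())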

@matusvalo
Member

matusvalo commented Oct 17, 2021

I can confirm the bug. I checked it also before #1394 and it was still occurring, so it was not introduced by that fix 🎉. Hence, this bug has a different root cause than the bug fixed by #1394. I have also checked that this bug still occurs when concurrency is set to 1.

@matusvalo matusvalo reopened this Oct 17, 2021
@matusvalo
Member

matusvalo commented Oct 21, 2021

OK, I did the investigation and here are the results:

  1. Set keys in Redis work in such a way that when you remove the last member from a set, the set key itself is removed from Redis.
  2. The _kombu.binding.reply.celery.pidbox key in Redis is of type set and contains the queues bound to the virtual exchange.
  3. The celery.control.inspect().ping() method is not synchronous. That means it does not wait for a response from the Celery workers; if it does not get a response "immediately", it returns None.
  4. The celery.control.inspect().ping() method first creates a new reply queue and adds it to _kombu.binding.reply.celery.pidbox. After the method's logic has executed, the queue is deleted and hence removed from the set.

Hence, in 99.99% of cases we do not see anything, because the workers are fast enough to write the response before celery.control.inspect().ping() returns. But by artificially introducing a delay with sleep() in the worker's ping handler, we hit the corner case where the worker responds after celery.control.inspect().ping() has returned. Due to point 4, the reply queue no longer exists and has been removed from the _kombu.binding.reply.celery.pidbox set. Moreover, if that queue was the only one present, the set itself has been deleted from Redis (point 1) by the time the worker tries to write the reply to the queue via the virtual exchange. And this is what causes the exception we are seeing.
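
Point 1 is easy to verify directly with redis-py against a throwaway key (illustration only, not kombu code):

# Removing the last member of a Redis set deletes the key itself.
import redis

r = redis.Redis()
r.sadd("_kombu.binding.demo", "only-reply-queue")
print(r.exists("_kombu.binding.demo"))   # 1 -> the set key exists
r.srem("_kombu.binding.demo", "only-reply-queue")
print(r.exists("_kombu.binding.demo"))   # 0 -> the whole key is gone, not just the member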

@matusvalo
Member

So we should unify the logic of kombu's Redis transport for the cases when

  1. the virtual exchange key exists in Redis but does not contain the queue, and hence we cannot route the message to the queue;
  2. the virtual exchange key does not exist in Redis, and we likewise cannot route the message to the queue.

Or at least we need to change the exception message, because it is misleading. Instead of

Cannot route message for exchange 'reply.celery.pidbox': Table empty or key no longer exists.\nProbably the key ('_kombu.binding.reply.celery.pidbox') has been removed from the Redis database.\n

we should have something like

Cannot route message. No queues bound to exchange.

@matusvalo
Member

A PR fixing the issue has been created. Basically, I have just removed the raising of the exception. It should fix the issue, and I consider that the exception does not have any benefit, because:

  1. the table can be removed at any time, e.g. when publishing against an empty exchange;
  2. the amqp protocol, which we simulate, by default destroys all unroutable messages;
  3. currently a message whose routing key is not in the table is destroyed anyway when the table is not empty.

See the integration tests of the PR for details.

@matusvalo
Member

Can someone check and verify fix #1404 to confirm whether it fixes the issue for you?

@omoumniabdou

omoumniabdou commented Nov 2, 2021

Hello @matusvalo,
I can confirm that the fix #1404 works for us. We had the issue every few hours on our pipeline (2 servers with a dozen workers each) since we changed our Redis server (from version 5.0.1 to 5.0.3):

celery[redis]==4.4.6
kombu==4.6.11
redis==3.5.3 (server 5.0.3)

For those who do not want to upgrade, you can patch kombu.transport.redis:

import kombu.transport.redis
from kombu.exceptions import InconsistencyError
from kombu.transport import TRANSPORT_ALIASES


class FixedChannel(kombu.transport.redis.Channel):
    def get_table(self, exchange):
        try:
            return super().get_table(exchange)
        except InconsistencyError:  # pragma: no cover
            # The table does not exist, since all queues bound to the exchange
            # were deleted. Just return an empty list.
            return []


class FixedTransport(kombu.transport.redis.Transport):
    Channel = FixedChannel

# Hack to override redis transport impl
TRANSPORT_ALIASES["redis"] = "$PATH_THIS_FILE:FixedTransport"

@matusvalo matusvalo unpinned this issue Nov 3, 2021
@auvipy auvipy added this to the 5.2 milestone Nov 3, 2021
@lpsinger

lpsinger commented Dec 8, 2021

Check your redis.conf.
Specifically maxmemory-policy.
If it's set to noeviction or does not have a value, we may have a problem in Celery.

We're seeing these errors with kombu 5.2.1 and celery 5.1.2. We do have maxmemory-policy set to volatile-lru. Do we need to change it to noeviction in order to fix this?
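
For anyone comparing configurations, here is a quick way to read (or change) the live setting from redis-py (a sketch; adjust the connection details to your broker, and note that the volatile-* policies only evict keys that have a TTL set):

# Check the eviction policy of the broker Redis instance (adjust host/port as needed).
import redis

r = redis.Redis(host="localhost", port=6379)
print(r.config_get("maxmemory-policy"))        # e.g. {'maxmemory-policy': 'volatile-lru'}
# To rule out eviction entirely, the policy can be switched at runtime:
# r.config_set("maxmemory-policy", "noeviction")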

@matusvalo
Member

We're seeing these errors with kombu 5.2.1 and celery 5.1.2.

Which errors do you see? Can you provide the traceback? The whole exception was removed in #1404.

@lpsinger

lpsinger commented Dec 9, 2021

We're seeing these errors with kombu 5.2.1 and celery 5.1.2.

Which errors do you see? Can you provide the traceback? The whole exception was removed in #1404.

See https://git.ligo.org/emfollow/gwcelery/-/issues/397.

@matusvalo
Member

Are you sure that you are using kombu v5.2.1? As I said, the exception below was removed in kombu 5.2.0; see #1404.

  File "/home/emfollow-playground/.local/lib/python3.8/site-packages/kombu/transport/redis.py", line 884, in get_table
    raise InconsistencyError(NO_ROUTE_ERROR.format(exchange, key))
kombu.exceptions.InconsistencyError: 
Cannot route message for exchange 'reply.celery.pidbox': Table empty or key no longer exists.
Probably the key ('_kombu.binding.reply.celery.pidbox') has been removed from the Redis database.

@lpsinger

lpsinger commented Dec 9, 2021

No, we were using kombu 5.1.0. I just updated.

@matusvalo
Member

No, we were using kombu 5.1.0. I just updated.

Please update kombu to the latest version, where the issue is fixed.

@matejruzicka

Hello @matusvalo, I can confirm that the fix #1404 works for us. We had the issue every few hours on our pipeline (2 servers with a dozen workers each) since we changed our Redis server (from version 5.0.1 to 5.0.3):

celery[redis]==4.4.6 kombu==4.6.11 redis==3.5.3 (server 5.0.3)

For those who do not want to upgrade, you can patch kombu.transport.redis :

import kombu.transport.redis
from kombu.exceptions import InconsistencyError
from kombu.transport import TRANSPORT_ALIASES


class FixedChannel(kombu.transport.redis.Channel):
    def get_table(self, exchange):
        try:
            return super().get_table(exchange)
        except InconsistencyError:  # pragma: no cover
            # The table does not exist, since all queues bound to the exchange
            # were deleted. Just return an empty list.
            return []


class FixedTransport(kombu.transport.redis.Transport):
    Channel = FixedChannel

# Hack to override redis transport impl
TRANSPORT_ALIASES["redis"] = "$PATH_THIS_FILE:FixedTransport"

I am so sorry, but do you think you could advise me on how exactly to use this patch? What file do I add it to? Or do I put it in a new file? Where should the file be located? What should it be called? Should the file be imported from somewhere within the app? Should this part "$PATH_THIS_FILE:FixedTransport" be replaced with something? What does PATH_THIS_FILE stand for?
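
For what it's worth, one way to wire the patch up is to put it in its own module and point the alias at that module's dotted import path; that dotted path is what the $PATH_THIS_FILE placeholder stands for. The sketch below uses a hypothetical module name (myproject/redis_patch.py), and the module just has to be imported before the Celery app opens its broker connection:

# myproject/redis_patch.py  (hypothetical module name, used only for illustration)
import kombu.transport.redis
from kombu.exceptions import InconsistencyError
from kombu.transport import TRANSPORT_ALIASES


class FixedChannel(kombu.transport.redis.Channel):
    def get_table(self, exchange):
        try:
            return super().get_table(exchange)
        except InconsistencyError:
            # The binding set was deleted because its last queue was removed; treat it as empty.
            return []


class FixedTransport(kombu.transport.redis.Transport):
    Channel = FixedChannel


# kombu resolves alias values of the form 'dotted.module.path:ClassName',
# so this points the 'redis' broker scheme at the patched transport above.
TRANSPORT_ALIASES["redis"] = "myproject.redis_patch:FixedTransport"

Then add an import of that module (e.g. import myproject.redis_patch) somewhere that runs before the Celery app creates its broker connection, so the alias override is registered in time.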

adonis0302 pushed a commit to adonis0302/Education_Platform_Backend that referenced this issue Sep 30, 2023