until_and_while_executing not entering perform method on initial run #824
Comments
This is the weirdest issue I have come across so far. I had this happen once in a blank project (the example app in this repository). I fixed it and haven't been able to replicate it since. Other people did run into this, but I didn't hear from them again. |
I was digging deeper into this issue. The following occurs:
The main problem here is that the Sidekiq worker pushes the job status with a big delay. It takes up to 10 seconds until a job appears as "processing" (see the Sidekiq source here & here). This leads to a situation where the job "disappears" for that period. By the way, this can affect jobs even during normal runtime: if a worker picks one up and the reaper runs within the 10-second delay, we are in the same situation. I wondered whether this is a design error on Sidekiq's side, but I guess they use the heartbeat method to reduce pressure on the DB and allow very high throughput. A possible solution I see here: the heartbeat timing is hardcoded in Sidekiq, so we could keep (supposedly) orphaned locks in memory for at least 10 seconds and run a re-check after that time. If they are still orphans, they can be removed. What do you think? |
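To make the proposal above more concrete, here is a minimal sketch of the "keep suspected orphans in memory and re-check later" idea. It is plain Ruby and does not use any Sidekiq or sidekiq-unique-jobs API. The class name `OrphanGracePeriod`, the method `confirmed`, and the digest strings are hypothetical, and the 10-second grace period is simply the delay mentioned above, taken as an assumption.

```ruby
# Hypothetical sketch: not part of Sidekiq or sidekiq-unique-jobs.
class OrphanGracePeriod
  # grace: seconds a digest must stay a suspected orphan before it may be removed.
  def initialize(grace: 10)
    @grace = grace
    @first_seen = {} # digest => time it was first suspected to be an orphan
  end

  # candidates: digests that look orphaned in the current reaper pass.
  # Returns only the digests that have looked orphaned for at least @grace seconds.
  def confirmed(candidates, now: Time.now)
    # Forget digests that are no longer suspected, e.g. because the job
    # has meanwhile shown up as "processing".
    @first_seen.select! { |digest, _| candidates.include?(digest) }

    candidates.select do |digest|
      seen_at = (@first_seen[digest] ||= now)
      now - seen_at >= @grace
    end
  end
end

tracker = OrphanGracePeriod.new(grace: 10)
t0 = Time.at(0)

# First pass: both digests look orphaned, but neither has waited long enough.
first_pass = tracker.confirmed(["digest-a", "digest-b"], now: t0)

# Second pass, 11 seconds later: "digest-b" is now visible as processing,
# "digest-a" is still a suspect and has outlived the grace period.
second_pass = tracker.confirmed(["digest-a"], now: t0 + 11)
```

In this sketch a lock is only reported for removal on a later pass, so a job that becomes visible within the grace period is never touched. The trade-off is that real orphans live at least one grace period longer.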
I prepared a simple dockerized repository to reproduce the error: https://github.com/tangopium/sidekiq-delay-issue |
@tangopium I can confirm the issue. I have a task with the same options as in your example.
|
I double-checked the config and found these lines missing:
and it looks like the lock expires as it should |
@DmitryRibalka thanks for sharing your experience. The error you faced doesn't look like the one I described and is probably caused by the missing configuration you've mentioned. @mhenrixon, any update regarding my question? Did you have time to look into it? |
Thank you, @tangopium! This makes it a little more challenging. So we need a queue for these situations with a timestamp to compare with. Fantastic research, and it makes sense. I will have a look at optimizing a few places as well. I am primarily using Lua already for that reason. I haven't looked yet because I feel beat about rechecking this issue, but you gave me hope. |
If I have a worker configured to be locked with `until_and_while_executing` and I schedule a job while Sidekiq is not running and then start Sidekiq, the worker seems to start and finish (in the console), but never enters the "perform" method. I noticed that if I turn off the reaper, this issue doesn't happen. I'm starting Sidekiq immediately after I schedule the job (so it's not eliminated because of the TTL). The logs look like the following:
This does not happen with, for example, `until_executed`.
In some rare cases the worker runs through:
It looks to me like a race condition.
Sidenote: I went down to Sidekiq 6.5.12 and Sidekiq Unique Jobs 7.1.12 and was able to reproduce this bug.