
sidekiq_unique record in Redis is not cleaned when foreman process is killed #112

Closed
elrakita opened this issue Sep 19, 2015 · 16 comments

@elrakita

Hi all,

I have noticed the following issue in development mode. When I kill foreman with "Ctrl-C" while a worker is being processed, the "unique" record related to that worker is not deleted from Redis.

So, this worker is considered not unique the next time I start foreman. I have not found anything provided in your gem to solve this problem, so I have added the appropriate cleanup to my Sidekiq initializer.
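For illustration, here is a minimal sketch of what such initializer cleanup could look like; the uniquejobs:* key pattern is taken from the Redis output later in this thread, and the development-only guard and callback placement are assumptions rather than anything the gem documents:

# config/initializers/sidekiq.rb (sketch, non-production only)
Sidekiq.configure_server do |config|
  config.on(:startup) do
    next if Rails.env.production?

    Sidekiq.redis do |conn|
      # Drop any uniqueness locks left behind by a killed process.
      conn.keys('uniquejobs:*').each { |key| conn.del(key) }
    end
  end
end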

Did I miss anything? Do you provide a way to clean up the sidekiq_unique record in Redis?

Thanks,
Elena.

@mhenrixon
Owner

The trick is that, since the job is gone due to Sidekiq crashing/stopping/being killed in the middle of processing, there isn't any good way for me to remove the lock. Possibly a page with locks and arguments could be added as a tab in the Sidekiq web UI. If the job is still scheduled, it should be possible to remove the lock as well when deleting the job in the web client.

Basically, Sidekiq::Queue.new('your queue').delete_all should do the trick, for instance. In testing it is good practice to flush Redis before each test or suite, but I never considered development mode, to be honest.
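For example, a test-suite setup along these lines (just a sketch, assuming RSpec and the default Sidekiq Redis connection):

# spec/spec_helper.rb (sketch)
RSpec.configure do |config|
  config.before(:each) do
    # Wipe queues and uniqueness locks between examples.
    Sidekiq.redis(&:flushdb)
  end
end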

@elrakita
Author

Hi @marclennox, thanks for answering and I get your point about locks.

Actually, I tried Sidekiq::Queue.new('your queue').delete_all but I received undefined method `delete_all' for #<Sidekiq::Queue:0x007fdd294dbd00>.

Moreover, Sidekiq::Queue.new('your queue').size returns zero, so the queue is considered empty. That is quite logical, because the worker is already being processed, not sitting in the queue.

One more point: I wonder what happens in production when the application is deployed with Docker and the container is stopped/removed while some worker is in progress. I suppose that the unique record won't be deleted in that case and we will have the same issue. And of course we cannot flush Redis in production.

@mhenrixon
Owner

I think you might have meant to mention @mhenrixon and not @marclennox :)

For emptying the queue it might actually be .clear and not .delete_all. I'll see about adding some type of unique keys overview or some such. It might also be possible to have some type of background job that checks for missing jids, but I'm not sure how best to handle that. Have any suggestions?
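Roughly what I have in mind, as a sketch only: it assumes (unverified) that each uniquejobs:* key stores the jid of the job holding it, and it uses the standard Sidekiq::Workers / Sidekiq::Queue / Sidekiq::ScheduledSet / Sidekiq::RetrySet APIs to collect the jids that still exist:

# Sketch: remove locks whose jid no longer exists anywhere in Sidekiq.
live_jids = []
Sidekiq::Workers.new.each { |_process, _thread, work| live_jids << work['payload']['jid'] }
Sidekiq::Queue.all.each { |queue| queue.each { |job| live_jids << job.jid } }
Sidekiq::ScheduledSet.new.each { |job| live_jids << job.jid }
Sidekiq::RetrySet.new.each { |job| live_jids << job.jid }

Sidekiq.redis do |conn|
  conn.keys('uniquejobs:*').each do |key|
    # Assumes the stored value is the owning jid -- not verified for every version.
    conn.del(key) unless live_jids.include?(conn.get(key))
  end
end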

@marclennox
Contributor

Yeah sorry guys I haven’t used unique-jobs in a while so probably can’t be of much help here.


@elrakita
Author

@mhenrixon, sure I wanted to mention you :)

In my opinion, a background job that checks for missing jids sounds like a "long polling" approach: we would be checking the condition constantly, but the only places we actually need to remove unique payload hashes are startup/shutdown callbacks.

Meanwhile, I thought about a different solution. What if we added cleanup in a Sidekiq startup callback? Such a callback could look like this:

Sidekiq.configure_server do |config|
  config.on(:startup) { clear_old_unique_jobs }
end

Then this code would be executed only once on each startup. What do you think?

@phuongnd08
Contributor

We need a workaround. This bug in sidekiq-unique-jobs occasionally prevents jobs from being added to Sidekiq in our production environment, which is very serious.

@mhenrixon
Owner

Check version 4.0.0 and report back if it's still an issue.

@elrakita
Author

elrakita commented Oct 6, 2015

Hi all,

I've tried to verify this issue on version 4.0.0 and I have noticed a critical bug.

Here is a snippet of my Gemfile:

gem 'rails', '4.2.4'
gem 'sidekiq'
gem 'sidekiq-unique-jobs', '4.0.0'

Here is my HardWorker:

class HardWorker
  include Sidekiq::Worker
  sidekiq_options :backtrace => 5, :unique => true

  def perform(name, count, salt)
    raise name if name == 'crash'
    logger.info Time.now
    sleep count
  end
end

I call the following in the Rails console (it returns => "e673a0b0b3be7bb61b147f11"):

HardWorker.perform_async("asdf", 15)

I see that my worker fails. Please let me know what is wrong. The log is given below.

Thanks,
Elena.

ID-ovkrdiju8 WARN: {"class"=>"HardWorker", "args"=>["asdf", 15], "retry"=>true, "queue"=>"default", "backtrace"=>5, "unique"=>true, "jid"=>"e673a0b0b3be7bb61b147f11", "created_at"=>1444154512.150679, "enqueued_at"=>1444154512.151179, "error_message"=>"uninitialized constant SidekiqUniqueJobs::RunLockFailed", "error_class"=>"NameError", "failed_at"=>1444154512.161897, "retry_count"=>0, "error_backtrace"=>["/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:49:in `rescue in while_executing'", "/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:46:in `while_executing'", "/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:19:in `call'", "/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'", "/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/active_record.rb:6:in `call'"]}
21:01:52 sidekiq.1 | 2015-10-06T18:01:52.165Z 46364 TID-ovkrdiju8 WARN: NameError: uninitialized constant SidekiqUniqueJobs::RunLockFailed
21:01:52 sidekiq.1 | 2015-10-06T18:01:52.166Z 46364 TID-ovkrdiju8 WARN: /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:49:in `rescue in while_executing'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:46:in `while_executing'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:19:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/active_record.rb:6:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/retry_jobs.rb:74:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/logging.rb:11:in `block in call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/logging.rb:30:in `with_context'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/logging.rb:7:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:132:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:132:in `invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/processor.rb:51:in `block in process'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/processor.rb:98:in `stats'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/processor.rb:50:in `process'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/calls.rb:28:in `public_send'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/calls.rb:28:in `dispatch'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/call/async.rb:7:in `dispatch'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/cell.rb:50:in `block in dispatch'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/cell.rb:76:in `block in task'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/actor.rb:339:in `block in task'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/task.rb:44:in `block in initialize'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/task/fibered.rb:14:in `block in create'
21:02:38 sidekiq.1 | 2015-10-06T18:02:38.709Z 46364 TID-ovkrl26yk HardWorker JID-e673a0b0b3be7bb61b147f11 INFO: start
21:02:38 sidekiq.1 | 2015-10-06T18:02:38.713Z 46364 TID-ovkrl26yk HardWorker JID-e673a0b0b3be7bb61b147f11 INFO: fail: 0.004 sec

@mhenrixon
Owner

Sorry about that, it should be good in 4.0.2.

@phuongnd08
Contributor

4.0.{0,1,2} is no longer working for me. Jobs are being duplicated all over the place, repeatedly.

@mhenrixon
Owner

Already wrote you about that in the other issue. Check the readme like suggested :)

@elrakita
Author

elrakita commented Oct 8, 2015

Hi all,

I have tried to verify the issue one more time on 4.0.2. I ran my HardWorker with sleep = 15 seconds and pressed "Ctrl+C" while the worker was executing. I did this twice: one time it worked fine and the other time it did not.

Here is the log of the successful shutdown (the unique record was not present in Redis afterwards):

07:22:40 sidekiq.1 | 2015-10-08T04:22:40.809Z 48011 TID-ouufr2xjk HardWorker JID-3dcbd45eaba841cd6f9c9bf4 INFO: start
07:22:40 sidekiq.1 | 2015-10-08T04:22:40.809Z 48011 TID-ouufr2xjk HardWorker JID-3dcbd45eaba841cd6f9c9bf4 INFO: 2015-10-08 07:22:40 +0300
^CSIGINT received
07:22:51 system | sending SIGTERM to all processes
07:22:51 web.1 | terminated by SIGTERM
07:22:56 sidekiq.1 | exited with code 0

And here is the log of the unsuccessful shutdown (the unique record was still present in Redis afterwards):

07:23:39 sidekiq.1 | 2015-10-08T04:23:39.279Z 48014 TID-ouvuwhjes HardWorker JID-07c1f5c362e070cf634fec2c INFO: start
07:23:39 sidekiq.1 | 2015-10-08T04:23:39.279Z 48014 TID-ouvuwhjes HardWorker JID-07c1f5c362e070cf634fec2c INFO: 2015-10-08 07:23:39 +0300
^CSIGINT received
07:23:44 system | sending SIGTERM to all processes
07:23:44 web.1 | terminated by SIGTERM
07:23:49 system | sending SIGKILL to all processes
07:23:49 | Exiting
07:23:49 sidekiq.1 | 2015-10-08T04:23:44.591Z 48014 TID-ouvuq8934 INFO: Shutting down
07:23:49 sidekiq.1 | 2015-10-08T04:23:45.494Z 48014 TID-ouvus5jto INFO: Terminating 24 quiet workers
07:23:49 sidekiq.1 | 2015-10-08T04:23:45.498Z 48014 TID-ouvus5jto INFO: Pausing up to 8 seconds to allow workers to finish...
07:23:49 sidekiq.1 | terminated by SIGKILL

Here is also my Redis output for the second log:

127.0.0.1:6379> keys unique*

1) "uniquejobs:748ff6fe808e4f19f12d94a2454e1a48"

Based on this, I suppose that the solution provided for unique_jobs in version 4.0.2 is not very stable. Could you please take a look?

Thanks,
Elena

@mhenrixon
Owner

Not sure there is a whole lot I can do about the kill signal. What happens is that foreman gives each process (Sidekiq included) 5 seconds(?) to shut down gracefully before sending the kill signal. The problem is that, because of the kill signal, there is no way to handle any cleanup that I know of. This is not a sidekiq-unique-jobs issue but a Linux process signal issue; it just happens to look like a unique-jobs issue since that is what makes it visible. Maybe @mperham can suggest something?

@mperham

mperham commented Oct 8, 2015

If Foreman only gives 5 seconds to exit, you can't use a shutdown timeout of 8 seconds. Use -t 3.
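For reference, either of the following should set that (illustrative; the Procfile line assumes bundler):

# Procfile
sidekiq: bundle exec sidekiq -t 3

# or in config/sidekiq.yml
:timeout: 3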

@phuongnd08
Contributor

Won't the lock expire after a while? At least then it would be auto-healing?

@elrakita
Author

elrakita commented Oct 8, 2015

@phuongnd08, yes, the lock is auto-healing by default (at least it was in version 3). @mhenrixon, could you please remind us what the default expiration time of the lock is?
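For anyone else landing here: in the 3.x series the lock expiration could be tuned per worker with an option along the lines of the sketch below (option name recalled from the 3.x README; it may have changed in 4.x, so verify against the version you have installed):

class HardWorker
  include Sidekiq::Worker
  # Option name as documented for 3.x; verify for your installed version.
  sidekiq_options unique: true, unique_job_expiration: 120 * 60 # 2 hours
end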
