
sidekiq_unique record in Redis is not cleaned when foreman process is killed #112

Closed
elrakita opened this issue Sep 19, 2015 · 16 comments

@elrakita

Hi all,

I have noticed the following issue in development mode. When I kill foreman with "Ctrl-C" while a worker is being processed, the "unique" record related to that worker is not deleted from Redis.

So, this worker is considered not unique the next time I start foreman. I have not found anything provided in your gem to solve this problem, so I have added the appropriate cleanup to my Sidekiq initializer.
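For illustration, here is a minimal sketch of what such initializer cleanup could look like; the uniquejobs:* key pattern is taken from the Redis output later in this thread, and the development-only guard and callback placement are assumptions rather than anything the gem documents:

# config/initializers/sidekiq.rb (sketch, non-production only)
Sidekiq.configure_server do |config|
  config.on(:startup) do
    next if Rails.env.production?

    Sidekiq.redis do |conn|
      # Drop any uniqueness locks left behind by a killed process.
      conn.keys('uniquejobs:*').each { |key| conn.del(key) }
    end
  end
end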

Did I miss anything? Do you provide a way to clean up the sidekiq_unique record in Redis?

Thanks,
Elena.

@mhenrixon
Owner

The trick is that, since the job is gone due to Sidekiq crashing/stopping/being killed in the middle of processing, there isn't any good way for me to remove the lock. Possibly a page with locks and arguments could be added as a tab in the Sidekiq web UI. If the job is still scheduled, it should be possible to remove the lock as well when deleting the job in the web client.

Basically, Sidekiq::Queue.new('your queue').delete_all should do the trick, for instance. In testing it is good practice to flush Redis before each test or suite, but I never considered development mode, to be honest.
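For example, a test-suite setup along these lines (just a sketch, assuming RSpec and the default Sidekiq Redis connection):

# spec/spec_helper.rb (sketch)
RSpec.configure do |config|
  config.before(:each) do
    # Wipe queues and uniqueness locks between examples.
    Sidekiq.redis(&:flushdb)
  end
end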

@elrakita
Author

Hi @marclennox, thanks for answering and I get your point about locks.

Actually, I tried Sidekiq::Queue.new('your queue').delete_all but I received undefined method `delete_all' for #<Sidekiq::Queue:0x007fdd294dbd00>.

Moreover, Sidekiq::Queue.new('your queue').size returns zero, so the queue is considered empty. That is quite logical, because the worker is already being processed, not sitting in the queue.

One more point: I wonder what happens in production when the application is deployed with Docker and the container is stopped/removed while some worker is in progress. I suppose that the unique record won't be deleted in that case and we will have the same issue. And of course we cannot flush Redis in production.

@mhenrixon
Owner

I think you might have meant to mention @mhenrixon and not @marclennox :)

For emptying the queue it might actually be .clear and not .delete_all. I'll see about adding some type of unique keys overview or some such. It might also be possible to have some type of background job that checks for missing jids, but I'm not sure how best to handle that. Have any suggestions?
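Roughly what I have in mind, as a sketch only: it assumes (unverified) that each uniquejobs:* key stores the jid of the job holding it, and it uses the standard Sidekiq::Workers / Sidekiq::Queue / Sidekiq::ScheduledSet / Sidekiq::RetrySet APIs to collect the jids that still exist:

# Sketch: remove locks whose jid no longer exists anywhere in Sidekiq.
live_jids = []
Sidekiq::Workers.new.each { |_process, _thread, work| live_jids << work['payload']['jid'] }
Sidekiq::Queue.all.each { |queue| queue.each { |job| live_jids << job.jid } }
Sidekiq::ScheduledSet.new.each { |job| live_jids << job.jid }
Sidekiq::RetrySet.new.each { |job| live_jids << job.jid }

Sidekiq.redis do |conn|
  conn.keys('uniquejobs:*').each do |key|
    # Assumes the stored value is the owning jid -- not verified for every version.
    conn.del(key) unless live_jids.include?(conn.get(key))
  end
end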

@marclennox
Contributor

Yeah sorry guys I haven’t used unique-jobs in a while so probably can’t be of much help here.


@elrakita
Author

@mhenrixon, sure I wanted to mention you :)

In my opinion, a background job that checks for missing jids sounds like a "long polling" approach: we would be checking the condition constantly, but the only places we actually need to remove unique payload hashes are startup/shutdown callbacks.

Meanwhile, I thought about a different solution. What if we added cleanup in a Sidekiq startup callback? Such a callback could look like this:

Sidekiq.configure_server do |config|
  config.on(:startup) { clear_old_unique_jobs }
end

Then this code would be executed only once on each startup. What do you think?

@phuongnd08
Contributor

We need a workaround. This bug in sidekiq-unique-jobs occasionally prevents jobs from being added to Sidekiq in our production environment, which is very serious.

@mhenrixon
Owner

Check version 4.0.0 and report back if it's still an issue.

@elrakita
Author

elrakita commented Oct 6, 2015

Hi all,

I've tried to verify this issue on version 4.0.0 and I have noticed a critical bug.

Here is a snippet of my Gemfile:

gem 'rails', '4.2.4'
gem 'sidekiq'
gem 'sidekiq-unique-jobs', '4.0.0'

Here is my HardWorker:

class HardWorker
  include Sidekiq::Worker
  sidekiq_options :backtrace => 5, :unique => true

  def perform(name, count, salt)
    raise name if name == 'crash'
    logger.info Time.now
    sleep count
  end
end

I call the following in the Rails console (it returns => "e673a0b0b3be7bb61b147f11"):

HardWorker.perform_async("asdf", 15)

I see that my worker fails. Please let me know what is wrong. The log is given below.

Thanks,
Elena.

ID-ovkrdiju8 WARN: {"class"=>"HardWorker", "args"=>["asdf", 15], "retry"=>true, "queue"=>"default", "backtrace"=>5, "unique"=>true, "jid"=>"e673a0b0b3be7bb61b147f11", "created_at"=>1444154512.150679, "enqueued_at"=>1444154512.151179, "error_message"=>"uninitialized constant SidekiqUniqueJobs::RunLockFailed", "error_class"=>"NameError", "failed_at"=>1444154512.161897, "retry_count"=>0, "error_backtrace"=>["/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:49:in `rescue in while_executing'", "/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:46:in `while_executing'", "/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:19:in `call'", "/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'", "/Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/active_record.rb:6:in `call'"]}
21:01:52 sidekiq.1 | 2015-10-06T18:01:52.165Z 46364 TID-ovkrdiju8 WARN: NameError: uninitialized constant SidekiqUniqueJobs::RunLockFailed
21:01:52 sidekiq.1 | 2015-10-06T18:01:52.166Z 46364 TID-ovkrdiju8 WARN: /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:49:in `rescue in while_executing'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:46:in `while_executing'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-unique-jobs-4.0.0/lib/sidekiq_unique_jobs/server/middleware.rb:19:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/active_record.rb:6:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/retry_jobs.rb:74:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/logging.rb:11:in `block in call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/logging.rb:30:in `with_context'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/server/logging.rb:7:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:129:in `block in invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:132:in `call'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/middleware/chain.rb:132:in `invoke'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/processor.rb:51:in `block in process'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/processor.rb:98:in `stats'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/sidekiq-3.5.0/lib/sidekiq/processor.rb:50:in `process'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/calls.rb:28:in `public_send'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/calls.rb:28:in `dispatch'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/call/async.rb:7:in `dispatch'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/cell.rb:50:in `block in dispatch'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/cell.rb:76:in `block in task'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/actor.rb:339:in `block in task'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/task.rb:44:in `block in initialize'
21:01:52 sidekiq.1 | /Users/lena/.rvm/gems/ruby-2.2.1@sidekiq2/gems/celluloid-0.17.2/lib/celluloid/task/fibered.rb:14:in `block in create'
21:02:38 sidekiq.1 | 2015-10-06T18:02:38.709Z 46364 TID-ovkrl26yk HardWorker JID-e673a0b0b3be7bb61b147f11 INFO: start
21:02:38 sidekiq.1 | 2015-10-06T18:02:38.713Z 46364 TID-ovkrl26yk HardWorker JID-e673a0b0b3be7bb61b147f11 INFO: fail: 0.004 sec

@mhenrixon
Owner

Sorry about that, it should be good in 4.0.2.

@phuongnd08
Contributor

4.0.{0,1,2} is no longer working for me. Jobs are being duplicated all over the place, repeatedly.

@mhenrixon
Owner

Already wrote you about that in the other issue. Check the readme like suggested :)

@elrakita
Author

elrakita commented Oct 8, 2015

Hi all,

I have tried to verify the issue one more time on 4.0.2. I ran my HardWorker with sleep = 15 seconds and pressed "Ctrl+C" while the worker was executing. I did this twice: one time it worked fine and the other time it did not.

Here is the log of the successful shutdown (the unique record was not present in Redis afterwards):

07:22:40 sidekiq.1 | 2015-10-08T04:22:40.809Z 48011 TID-ouufr2xjk HardWorker JID-3dcbd45eaba841cd6f9c9bf4 INFO: start
07:22:40 sidekiq.1 | 2015-10-08T04:22:40.809Z 48011 TID-ouufr2xjk HardWorker JID-3dcbd45eaba841cd6f9c9bf4 INFO: 2015-10-08 07:22:40 +0300
^CSIGINT received
07:22:51 system | sending SIGTERM to all processes
07:22:51 web.1 | terminated by SIGTERM
07:22:56 sidekiq.1 | exited with code 0

And here is the log of the unsuccessful shutdown (the unique record was still present in Redis afterwards):

07:23:39 sidekiq.1 | 2015-10-08T04:23:39.279Z 48014 TID-ouvuwhjes HardWorker JID-07c1f5c362e070cf634fec2c INFO: start
07:23:39 sidekiq.1 | 2015-10-08T04:23:39.279Z 48014 TID-ouvuwhjes HardWorker JID-07c1f5c362e070cf634fec2c INFO: 2015-10-08 07:23:39 +0300
^CSIGINT received
07:23:44 system | sending SIGTERM to all processes
07:23:44 web.1 | terminated by SIGTERM
07:23:49 system | sending SIGKILL to all processes
07:23:49 | Exiting
07:23:49 sidekiq.1 | 2015-10-08T04:23:44.591Z 48014 TID-ouvuq8934 INFO: Shutting down
07:23:49 sidekiq.1 | 2015-10-08T04:23:45.494Z 48014 TID-ouvus5jto INFO: Terminating 24 quiet workers
07:23:49 sidekiq.1 | 2015-10-08T04:23:45.498Z 48014 TID-ouvus5jto INFO: Pausing up to 8 seconds to allow workers to finish...
07:23:49 sidekiq.1 | terminated by SIGKILL

Here is also my Redis output for the second log:

127.0.0.1:6379> keys unique*

1) "uniquejobs:748ff6fe808e4f19f12d94a2454e1a48"

Based on this, I suppose that the solution provided for unique_jobs in version 4.0.2 is not very stable. Could you please take a look?

Thanks,
Elena

@mhenrixon
Owner

Not sure there is a whole lot I can do about the kill signal. What happens is that foreman gives each process (Sidekiq included) 5 seconds(?) to shut down gracefully before sending the kill signal. The problem is that, because of the kill signal, there is no way to handle any cleanup that I know of. This is not a sidekiq-unique-jobs issue but a Linux process signal issue; it just happens to look like a unique-jobs issue since that is what makes it visible. Maybe @mperham can suggest something?

@mperham

mperham commented Oct 8, 2015

If Foreman only gives 5 seconds to exit, you can't use a shutdown timeout of 8 seconds. Use -t 3.
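For reference, either of the following should set that (illustrative; the Procfile line assumes bundler):

# Procfile
sidekiq: bundle exec sidekiq -t 3

# or in config/sidekiq.yml
:timeout: 3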

@phuongnd08
Contributor

Won't the lock expire after a while? At least then it would be auto-healing?

@elrakita
Author

elrakita commented Oct 8, 2015

@phuongnd08, yes, the lock is auto-healing by default (at least it was in version 3). @mhenrixon, could you please remind us what the default expiration time of the lock is?
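For anyone else landing here: in the 3.x series the lock expiration could be tuned per worker with an option along the lines of the sketch below (option name recalled from the 3.x README; it may have changed in 4.x, so verify against the version you have installed):

class HardWorker
  include Sidekiq::Worker
  # Option name as documented for 3.x; verify for your installed version.
  sidekiq_options unique: true, unique_job_expiration: 120 * 60 # 2 hours
end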
