Infinite lock using until_and_while_executing after sidekiq restart #361

Closed
mainameiz opened this issue Dec 29, 2018 · 3 comments

@mainameiz

Describe the bug
until_and_while_executing does not work across Sidekiq restarts.

Expected behavior
I expect the job to be retried when Sidekiq starts again after a normal restart.

Current behavior
There are lock keys left over in Redis that prevent the job from being retried:

[8] pry(main)> Sidekiq.redis { |r| puts r.keys('uniquejob*').sort; }; nil
uniquejobs:f46bc25dd7800206da7159bd516aa7e1:AVAILABLE
uniquejobs:f46bc25dd7800206da7159bd516aa7e1:RUN:EXISTS
uniquejobs:f46bc25dd7800206da7159bd516aa7e1:RUN:GRABBED
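
Deleting these keys by hand lets the job be enqueued again (a manual workaround; this assumes the digest shown above is no longer owned by a live job):

Sidekiq.redis do |redis|
  # Remove the stale lock keys for this digest only.
  keys = redis.keys('uniquejobs:f46bc25dd7800206da7159bd516aa7e1*')
  redis.del(*keys) unless keys.empty?
end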

Worker class

class TestWorker
  include Sidekiq::Worker

  # neither of these options works
  sidekiq_options queue: :default, unique: :until_and_while_executing, retry: true
  # sidekiq_options queue: :default, unique: :until_and_while_executing, retry: false

  # this callback is not called
  sidekiq_retries_exhausted do |msg, _ex|
    Rails.logger.info "sidekiq_retries_exhausted: #{object_id} msg['unique_digest']: #{msg['unique_digest'].inspect}"
    SidekiqUniqueJobs::Digests.del(digest: msg['unique_digest']) if msg['unique_digest']
  end

  def perform
    Rails.logger.info "started: #{object_id}, sleeping 40s"
    sleep 40
    Rails.logger.info "finished: #{object_id}"
  end
end

Additional context

Jobs are enqueued using perform_async.

config/initializers/sidekiq.rb:

Sidekiq.configure_server do |config|
  # ...

  config.death_handlers << ->(job, _ex) do
    Rails.logger.info "death_handlers: #{object_id} job['unique_digest']: #{job['unique_digest'].inspect}"
    SidekiqUniqueJobs::Digests.del(digest: job['unique_digest']) if job['unique_digest']
  end
end
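
To check whether either handler actually ran, I compare the gem's digest list against the raw lock keys from a console:

# Digests the gem knows about vs. raw lock keys left in Redis.
SidekiqUniqueJobs::Digests.all
Sidekiq.redis { |r| r.keys('uniquejobs:*') }
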
@blarralde

Confirming that jobs sometimes get stuck in a locked state even after the latest update.

@snovity

snovity commented Jan 15, 2019

Was just affected by this: jobs wouldn't enqueue even though there was no matching job in the queue or running. I had to remove all until_and_while_executing options. I was using it on periodic jobs, so the unique key was reused. I'm worried there is a bigger issue with locks not being released in general; when unique digests don't repeat, the problem just isn't visible.

@elliotb

elliotb commented Jan 29, 2019

We're also seeing this issue after upgrading from 6.0.4 to 6.0.8. Jobs (with until_and_while_executing) will enqueue for the first time, but then queueing them again with perform_async returns nil even after they have finished executing.

It seems that when this occurs, SidekiqUniqueJobs::Digests.all returns an empty array, but Sidekiq.redis { |r| r.keys } returns many keys prefixed with uniquejobs:. Similarly the Sidekiq Web extension will list 0 digests.

We can temporarily allow jobs to be re-queued by running Sidekiq.redis { |r| r.flushall } to completely clear our queues and unique job locks, but the issue recurs once the initial jobs complete.
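
A less destructive alternative (an untested sketch, assuming all lock keys live under the uniquejobs: prefix) is to delete only the lock keys and leave the queues intact:

Sidekiq.redis do |redis|
  # Delete only the unique-lock keys; queues and retries are untouched.
  redis.scan_each(match: 'uniquejobs:*') { |key| redis.del(key) }
end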

Edit:

It appears to work correctly in v6.0.8 if we remove the lock_expiration options from the workers.
From an initial investigation, it appears that the unlocking code here:

    if expiration then
takes different paths depending on whether expiration is enabled or not. In the case that it isn't, it also removes "legacy" keys, which seem to be the ones causing the issue here.
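
Roughly, as a Ruby paraphrase of what that branch appears to do (illustrative only; the real logic lives in a Lua script):

# Paraphrase, not the gem's actual code.
if expiration
  # with lock_expiration set, keys are left to expire on their own
  conn.expire(exists_key, expiration)
else
  # without it, keys are deleted outright, including the "legacy" keys
  conn.del(exists_key, grabbed_key, available_key)
end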
