
Expiration for all locks #393

Closed

JacobEvelyn opened this issue May 14, 2019 · 9 comments

@JacobEvelyn

Is your feature request related to a problem? Please describe.

I find it confusing that lock expiration doesn't allow me to expire locks after some long period of time (for instance, 2 days) when using strategies other than until_expired. I'm worried that if something goes wrong and a lock isn't released (or a job is suddenly killed mid-execution and it has a runtime lock), those locks will linger forever and the same job can never again be enqueued.

Based on past issues it looks like this has happened before (for example: #277), so I think this concern is well-founded.

Describe the solution you'd like

I'd like lock_expiration to simply be the Redis TTL that all locks use, without any fanciness. (For backwards compatibility, this can also be called something else if that's clearer.) Honestly I think it even makes sense to have this behavior enabled by default for all jobs, since a lock that's been present for many days is probably the sign of a bug and should be released.
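
For illustration, here is a minimal sketch of the requested semantics, assuming lock_expiration were applied as a plain Redis TTL on every lock key regardless of lock strategy (the job class name is made up, and 2.days.to_i relies on ActiveSupport):

class SyncSubscriptionsJob
  include Sidekiq::Worker

  # Desired behavior: the lock key itself gets a 2-day TTL,
  # so even an orphaned lock disappears on its own.
  sidekiq_options lock: :until_executed, lock_expiration: 2.days.to_i
end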

Describe alternatives you've considered

We're using Sidekiq Pro's super_fetch reliability feature which addresses the same issue from a different angle, but I think it's still valid to want locks to disappear after some long amount of time.

@mhenrixon
Owner

The problems described are due to bugs in the scripts. This will not be a problem in the near future, as there will be a cleanup running on Sidekiq.on(:heartbeat). The locking mechanism is also getting some serious love in #387, which will further reduce this issue.

I'll hold off on your suggestion until that work is complete, but I will leave this issue open until we see whether the problem is solved in a better way or not.
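
For orientation, registering work on that Sidekiq lifecycle event looks roughly like this (a sketch only; reap_orphaned_locks is a hypothetical stand-in for the gem's internal cleanup, not an actual API):

Sidekiq.configure_server do |config|
  # :heartbeat fires periodically on the server process, so a reaper
  # hooked here can regularly sweep stale lock keys.
  config.on(:heartbeat) do
    SidekiqUniqueJobs.reap_orphaned_locks # hypothetical method name
  end
end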

@mhenrixon
Owner

This should be fixed now that #387 got merged. No release yet; I still have a few things to work through first.

@amitsaxena

@mhenrixon this has been happening a lot to us in production recently and is breaking our crons (which use https://github.com/ondrejbartas/sidekiq-cron). It would be nice to be able to disregard a lock after a configurable amount of time. What's the recommended way to achieve this, instead of manually deleting the locks? If there isn't one, I'll have to write a cron that scans Redis for lock-related keys and clears out the old ones. This has been causing us a lot of pain recently.

@mhenrixon
Owner

@amitsaxena though it might not be a global setting, lock_expiration is used for this purpose. In v7 there is also the orphan reaper that cleans out stale locks.
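
For reference, the v7 reaper is configured globally, along these lines (option names as documented for v7; verify against your installed version):

SidekiqUniqueJobs.configure do |config|
  config.reaper          = :ruby  # Ruby-based reaper; :lua is also documented
  config.reaper_count    = 1_000  # max number of locks inspected per run
  config.reaper_interval = 600    # seconds between runs
end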

@amitsaxena

amitsaxena commented Nov 21, 2019

Ahh! That seems to be missing from the master README. Anyway, @mhenrixon, I tried it and it doesn't seem to work as expected. We have this in the job:
sidekiq_options queue: :subscription, retry: false, lock: :until_executed, lock_expiration: 1.minute.to_i, on_conflict: :raise

So I am expecting that if a lock has been in Redis for more than 1 minute and I schedule a new job, the job should get added to Sidekiq. That doesn't happen; an exception is raised instead. I also tried enqueuing the job and then deleting it directly from Redis so that the lock stays behind (and is orphaned), but the new job still doesn't get added. We are using v6.0.15.

@aharpervc

> @amitsaxena though it might not be a global setting, lock_expiration is used for this purpose. In v7 there is also the orphan reaper that cleans out stale locks.

I'm a little confused; can you clarify how to use lock_expiration in v6? Aside from this feature, is there any way to find and expire orphaned locks in v6 at all?

@amitsaxena

@aharpervc I couldn't find anything that works (in v6), so I had to write a script that looks for lock-related Redis keys and deletes them.

@aharpervc

> @aharpervc I couldn't find anything that works (in v6), so I had to write a script that looks for lock-related Redis keys and deletes them.

Sounds useful, are you able to share it?

@amitsaxena

amitsaxena commented Sep 5, 2020

Something along the lines of:

class CleanStaleLocks
  include Sidekiq::Worker

  def perform
    # Iterate over every unique-job digest the gem knows about
    SidekiqUniqueJobs::Digests.all.each do |digest|
      delete(digest) if stale?(digest)
    end
  end

  private

  # $redis is assumed to be a connected Redis client
  def delete(digest)
    $redis.del("#{digest}:EXISTS")
    $redis.del("#{digest}:GRABBED")
    # Check for the correct key name in redis and use below
    $redis.srem("project_name:sidekiq:unique:keys", digest)
  end

  def stale?(digest)
    jid = $redis.get("#{digest}:EXISTS")
    # The lock is legitimate if its job is still sitting in the queue
    return false if Sidekiq::Queue.new("queue_name_with_unique_jobs").find_job(jid)

    grabbed = $redis.hgetall("#{digest}:GRABBED")
    # HGETALL returns an empty hash (not nil) when the key is missing
    raise "Expected redis key not found" if grabbed.empty?

    # A missing jid entry coerces to epoch, so orphaned locks also count as stale
    locked_at = Time.at(grabbed[jid].to_f)
    # Treat the lock as stale if it is older than 1 hour (needs ActiveSupport)
    (Time.now - locked_at) > 1.hour.to_i
  end
end

You will need to tweak the code depending on which queues could have unique jobs in your case. I'd then run it as a cron job every hour (or more frequently), e.g. with sidekiq-cron as sketched below.
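
Since this thread already uses sidekiq-cron, scheduling the worker could look like this (a sketch; the schedule name and hourly interval are arbitrary choices):

# e.g. in config/initializers/sidekiq.rb
Sidekiq::Cron::Job.create(
  name:  "clean_stale_locks - hourly",
  cron:  "0 * * * *",
  class: "CleanStaleLocks"
)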
