Unique job keys never disappear #161

Closed
kpheasey opened this issue Feb 4, 2016 · 14 comments

Comments

@kpheasey

kpheasey commented Feb 4, 2016

I'm using unique jobs with Sidekiq Pro (reliability enabled) and ActiveJob, and have set up my initializer according to the README.

Most of my jobs are created with Sidekiq; however, a large portion comes from ActiveJob.

The unique job keys are not removed from Redis.

Notice the large memory usage in the screenshot below; there are no jobs in the queue.

[Screenshot: Redis memory usage, 2016-02-04 9:35 AM]

Looking through the keys in Redis, I see a lot of:

my_namespace:uniquejobs:c6af61b6571bc422a78f6f3d043a2635
my_namespace:uniquejobs:38cfda2fd8f924da936175dde391dd3d
my_namespace:uniquejobs:4bff3c9a5b97670209a509c6d5dd95a5
my_namespace:uniquejobs:38cfda2fd8f924da936175dde391dd3d
@kpheasey
Author

kpheasey commented Feb 4, 2016

After looking further, I discovered that it's the uniquejobs hash in Redis that takes up the majority of the memory usage.

@mhenrixon
Owner

> For a quick fix, I've implemented a

A what now? :) Did you see you can clear the jobs by command line or console?

@kpheasey
Author

kpheasey commented Feb 4, 2016

Sorry, didn't finish my thought.

I believe the problem is because of Sidekiq Pro reliability and an autoscaling environment.

Reliability puts jobs in private queues. When the job is executed, the unique job is unlocked. However, when autoscaling brings down servers, the Sidekiq process disappears and the private queue is not processed. To get around that, there is a second process that finds old private queues and puts the jobs back into the main queue with RPOPLPUSH. Finally, when the job is executed by another Sidekiq process, the unique job is not unlocked.

I believe there may need to be another unique: :until_* lock type, something like :until_reliably_queued. This would unlock the unique job when the job has been moved to a private queue. However, this seems very environment- and application-specific, so I will build it on my own.

Closing the issue. Let me know if you want me to make a pull request.

@kpheasey kpheasey closed this as completed Feb 4, 2016
@warmwaffles

@kpheasey I'm interested in the solution you came up with. We are experiencing similar issues.

@mhenrixon
Owner

While waiting for a fix, you can use the console or command-line app to clear the keys.
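For reference, a minimal sketch of what clearing those keys looks like with plain redis-rb, assuming the my_namespace:uniquejobs:* pattern from the report above; the gem's own console/command-line helpers do the same thing, so treat this only as an illustration:

```ruby
# Sketch: delete stale sidekiq-unique-jobs lock keys directly from Redis.
# The key pattern and namespace are taken from the report above; adjust to taste.
require "redis"

redis = Redis.new # use the same connection settings as Sidekiq

cursor = "0"
loop do
  cursor, keys = redis.scan(cursor, match: "my_namespace:uniquejobs:*", count: 1000)
  redis.del(*keys) unless keys.empty?
  break if cursor == "0"
end

# The report also mentions a large "uniquejobs" hash; it can be dropped too:
redis.del("my_namespace:uniquejobs")
```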

@warmwaffles

Thanks @mhenrixon, if a fix is found, can you post here as well?

@mhenrixon
Owner

Sure thing! The only solution that I can see is if I could get access to the Pro source code to see how to hook into it, but I guess @mperham wouldn't just hand that out, so my hands are a little tied on that matter.

@ropiku

ropiku commented Feb 10, 2016

@mhenrixon If you bought Pro, then you have access to it: bundle show sidekiq-pro.

@mperham

mperham commented Feb 10, 2016

The unique jobs implementation in Sidekiq Enterprise requires a TTL to ensure data is expired quickly for exactly this reason: sidekiq_options unique_for: 10.minutes

https://github.com/mperham/sidekiq/wiki/Ent-Unique-Jobs#use

I'm not sure I understand the issue well enough to advise you. There is nothing to hook into with reliable enqueuing, since it uses a single atomic Redis command, RPOPLPUSH.
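To illustrate the Enterprise option mperham mentions, a worker would look roughly like this (CalculationWorker is a made-up name; the unique_for option is from the wiki page linked above, and 10.minutes assumes ActiveSupport is loaded):

```ruby
# Sidekiq Enterprise unique jobs: every lock carries a TTL, so a lock that is
# never explicitly released still expires on its own after unique_for elapses.
class CalculationWorker
  include Sidekiq::Worker
  sidekiq_options unique_for: 10.minutes

  def perform(record_id)
    # ... recalculate data for record_id ...
  end
end
```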

@mhenrixon
Owner

@ropiku I am not buying an Enterprise license to maintain an open source gem. That economy doesn't make sense to me, but if you buy a license for us, by all means go ahead :)

Thanks @mperham!

@warmwaffles @kpheasey would sidekiq_options unique_for: 10.minutes help you at all?

@ropiku

ropiku commented Feb 10, 2016

Sorry, I mistook you for kpheasey, who said he is running Pro. Will give this a try.

@warmwaffles

> would sidekiq_options unique_for: 10.minutes help you at all?

Unfortunately, we only have Pro, so I don't think uniqueness comes with that gem.

@kpheasey
Author

@warmwaffles I haven't gotten a chance to implement the solution yet. I believe it's possible to override a method, Sidekiq::Pro::ReliableFetch.retrieve_work() or Sidekiq::Pro::ReliableFetch::Retriever, to unlock the job at that point in time, similar to the overrides here: https://github.com/mhenrixon/sidekiq-unique-jobs/blob/9f184aacebe2d9395eef3c0ca84f89f07972c2e1/lib/sidekiq_unique_jobs/sidekiq_unique_ext.rb
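A very rough sketch of the override described above, assuming retrieve_work returns a unit of work carrying the raw job payload; Sidekiq Pro is closed source, so the class, the method signature, and the unlock helper shown here are all assumptions that would need to be checked against the installed versions:

```ruby
# Hypothetical sketch: release the unique lock as soon as reliable fetch hands
# the job to a worker, instead of waiting for the job to finish.
require "json"

module UnlockOnReliableFetch
  def retrieve_work
    work = super
    if work
      # Assumption: work.job is the raw JSON payload of the fetched job.
      payload = JSON.parse(work.job)
      # Assumption: a helper exists (or is written) that removes the
      # uniquejobs lock for this payload's digest.
      SidekiqUniqueJobs.unlock(payload)
    end
    work
  end
end

if defined?(Sidekiq::Pro::ReliableFetch)
  Sidekiq::Pro::ReliableFetch.prepend(UnlockOnReliableFetch)
end
```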

@kpheasey
Author

@warmwaffles I don't currently have the resources to look into implementing a new lock type that would unlock jobs when they have been fetched into a private queue.

We have a lot of pre-calculated data that relies on outside sources which are constantly changing. Previously we created a job to re-calculate data when the outside data changed.

Our solution involved creating a small, non-unique job to mark the calculations as expired. Then a second, unique job re-calculates them. This ensures that we correctly expire the cache and know what calculations are needed, even if the unique job is not unlocked correctly, because the state is saved to the database.
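A sketch of that two-job split with made-up class and model names (ExpireCalculationJob, RecalculateJob, Calculation); the unique: :until_executed option follows the sidekiq-unique-jobs style of the time and may differ by version:

```ruby
# Small, non-unique job: only flags the cached calculation as stale,
# then kicks off the unique recalculation job.
class ExpireCalculationJob
  include Sidekiq::Worker

  def perform(record_id)
    Calculation.where(record_id: record_id).update_all(expired: true)
    RecalculateJob.perform_async(record_id)
  end
end

# Unique job: recalculates whatever is currently flagged as expired.
# Even if this job's lock is never released, the expired flag in the
# database still records which calculations need work.
class RecalculateJob
  include Sidekiq::Worker
  sidekiq_options unique: :until_executed

  def perform(record_id)
    Calculation.where(record_id: record_id, expired: true).find_each(&:recalculate!)
  end
end
```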

Since the problem only happens after the environment has been scaled down, we clear the unique job locks when we merge the stale private queues. Here's a gist with our rake task for doing so: https://gist.github.com/kpheasey/9c9255c4ce20beeabde0
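The gist has the real task; below is only a hedged outline of the approach it describes (drain each stale private queue back onto the main queue with RPOPLPUSH, then drop the leftover lock keys). The private-queue key pattern and the destination queue are assumptions and depend on the Sidekiq Pro version:

```ruby
# Rough outline, not the actual gist: requeue jobs from stale private queues
# and clear any unique-job lock keys left behind.
namespace :sidekiq do
  task recover_private_queues: :environment do
    Sidekiq.redis do |redis|
      # Assumption: Sidekiq Pro reliability keeps per-process private queues
      # matching this pattern; verify against your Redis before running.
      redis.keys("queue:*_private").each do |private_queue|
        destination = "queue:default" # assumption: everything returns to the default queue
        # Atomically move jobs back, one at a time, until the private queue is empty.
        loop do
          break unless redis.rpoplpush(private_queue, destination)
        end
      end

      # Drop the orphaned unique-job lock keys (see the earlier snippet).
      redis.keys("uniquejobs:*").each { |key| redis.del(key) }
    end
  end
end
```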
