Memory bloat / dangling keys / reaper not cleaning orphans #637
Is it possible that, because of this bloat, the newly generated random uids were occasionally the same as some of the existing orphaned keys? We had roughly 6.5 million of these orphaned keys when I was removing them. It doesn't seem very likely that this would account for 4% of cases, but after the upgrade to 7.1.5 the number of jobs lost was around 0.3%.
We've disabled locks on the worker that was problematic and we're not having issues with missed jobs anymore. However, we must now come up with an alternative strategy until this gets fixed (any clues as to whether it will? 🙂), or permanently. Not to mention having to fix/recalculate nearly all of our historical records in different parts of the system, which will take days... 😬
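As a sketch of what such an alternative strategy could look like (a hypothetical illustration, not part of sidekiq-unique-jobs): an application-level uniqueness guard acquired before enqueueing. With redis-rb the acquire step would be a single atomic `redis.set(key, "1", nx: true, ex: ttl)`; in the sketch below a plain Hash of expiry timestamps stands in for Redis so the example is self-contained, and the key name is made up.

```ruby
# Hypothetical application-level uniqueness guard (not the gem's locking).
# In production the acquire step would be one atomic Redis command:
#   redis.set(key, "1", nx: true, ex: ttl)
# Here a Hash of key => expiry time stands in for Redis.
class SimpleLock
  def initialize
    @store = {}
  end

  # Returns true if the lock was acquired (key absent or expired),
  # false if another holder's lock is still live.
  def acquire(key, ttl, now: Time.now)
    expires_at = @store[key]
    return false if expires_at && expires_at > now
    @store[key] = now + ttl
    true
  end
end

lock = SimpleLock.new
lock.acquire("uniquejobs:my_worker:42", 60)  # => true, lock taken
lock.acquire("uniquejobs:my_worker:42", 60)  # => false, still held
```

The in-memory version is single-process only; the point of `NX`/`EX` in the real command is that Redis enforces the same check-and-set atomically across all worker processes.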
Sorry @sharq1, life got in the way this summer: #601 (comment). I'll be dealing with these topics in the next few days.
I'm sorry to hear that... Hope the bad luck is over for you.
@sharq1 are you on Sidekiq 6.2.2? If so, they renamed the
Yes, however we switched from 6.2.1 to 6.2.2 at around the same time as we disabled locks for some of the jobs (workers), but for some we still have locks. Will the
Yes, but if you, for example, delete things via the Sidekiq::API classes (used by sidekiq/web), then related locks won't be deleted.
v7.1.6 should work with the latest Sidekiq version.
Hi @sharq1, I will have a deeper look into this later today.
So we pretty much have the same issue on our end. If one of our Sidekiq jobs crashes (OOM in our case) then the
Versions:
@bdarcet I strongly recommend you upgrade to 7.1.8, because 7.1.5 is not compatible with the Sidekiq version you are using.
There was some work being done in #724 which will help greatly for you as well. |
When #725 is merged it would be great to get some feedback on how it performs for you, @fotinakis. Any insights into this would be greatly appreciated, preferably before I release it, as we changed how the expired keys are stored.
Amazing, thanks @mhenrixon! I'm going to be able to run a large-scale test today, and will report the results to you here.
Test results: unfortunately, saw similar results for memory leaks under load with

Setup:
`sidekiq_options lock: :until_expired, lock_ttl: 30.minutes`

Result:
-- Calling the reaper directly reports nothing to delete:

`POOL2.with { |r| puts SidekiqUniqueJobs::Orphans::Reaper.call(r) }`

2022-07-12T20:42:09.000Z pid=14081 tid=12wt INFO: Nothing to delete; exiting.
2022-07-12T20:42:09.001Z pid=14081 tid=12wt INFO: Nothing to delete; exiting.

(One complexity to note: we do use Sidekiq sharding to support multiple Redis DBs, but I don't think that causes any additional issues here.)

-- Any ideas? (I don't know enough about how sidekiq-unique-jobs's internals work, but happy to help debug or test again.)
Quick followup: it looks like the keys with
So the only orphans of concern above are the
Awesome! I'll get on those lingering keys then!
Great, thanks @mhenrixon! Do you have any thoughts on an ETA for fixing this? We're using it at such a scale that I'm somewhat blocked on turning on a big system until this particular issue is resolved or we work around it. We've also just joined as a GitHub sponsor! 💯 Really appreciate your work on this fantastic open source project.
@fotinakis I'll get started first thing on Tuesday. This week I have off to spend some quality time with my wife. Our kids are in Sweden with my parents and yesterday was our fifth wedding anniversary! Sincerely appreciate the support and the kind words 🙏. Gets me very excited about fixing problems ❤️
@fotinakis Monday I'll get started! Just had a look at the calendar and my birthday isn't until Tuesday, so I have the whole of Monday to look at those dangling keys.
Hey @mhenrixon, appreciate it. No rush from our end; we have implemented a workaround in the meantime. 👍 Will be good to fix that orphan cleanup whenever you get to it.
@fotinakis I have a PR (#726) that should hopefully fix the problem. Would appreciate help battle-testing the change. I am a little unsure about the potential effect of concurrency here.
Please check out v7.1.26 (https://github.com/mhenrixon/sidekiq-unique-jobs/releases/tag/v7.1.26) as it has quite a few fixes for `until_and_while_executing` and also `until_expired`.
Version v7.1.27 should be even better, especially if you are using redis-namespace. That said, redis-namespace is not recommended: it was discovered that it is not up to date with redis-rb's keyword arguments, and who knows what else is failing silently with it.
@fotinakis similar throughput as you and similar issues. We're on 7.1.27. What were your workarounds? I woke up to a nightmare :(
Describe the bug

The `uniquejobs:...` keys do not get removed from Redis, so over a few months they took up almost all of our Redis memory (500 MB). See how it grew over 9 weeks. I have just manually removed all keys matching `uniquejobs:*` and it released nearly all the memory that was being used.

Expected behavior

No `uniquejobs` keys are left dangling in Redis once there are no enqueued jobs.

Current behavior

Lots of dangling keys are left in Redis.
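The manual cleanup described above (removing all keys matching `uniquejobs:*`) can be sketched as follows. This is a hypothetical illustration, not the gem's reaper: with redis-rb one would iterate `redis.scan_each(match: "uniquejobs:*") { |key| redis.del(key) }` rather than calling the blocking `KEYS` command. Here a plain Hash stands in for Redis so the glob-matching logic runs stand-alone, and the sample keys are made up.

```ruby
# Sketch: delete all keys matching a Redis-style glob pattern.
# With redis-rb this would be the non-blocking:
#   redis.scan_each(match: "uniquejobs:*") { |key| redis.del(key) }
# A plain Hash stands in for Redis here so the example is self-contained.

def delete_matching(store, pattern)
  # File.fnmatch's '*' behaves like the Redis glob for these ':'-separated keys
  doomed = store.keys.select { |key| File.fnmatch(pattern, key) }
  doomed.each { |key| store.delete(key) }
  doomed.size # number of keys removed
end

# Made-up sample keys for illustration only
store = {
  "uniquejobs:2aab05e0aca8bc0b79d858c1"        => "locked",
  "uniquejobs:2aab05e0aca8bc0b79d858c1:QUEUED" => "1",
  "queue:default"                              => "[]",
}

puts delete_matching(store, "uniquejobs:*")  # => 2
puts store.keys.inspect                      # => ["queue:default"]
```

`SCAN`-based iteration is preferred over `KEYS` in production because it walks the keyspace incrementally instead of blocking Redis while it scans millions of keys.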
Additional context
We were on version 7.0.1 for a while and switched to 7.1.5 just days ago.
Here's our Sidekiq initializer:
Some dangling keys found when no jobs were enqueued:
Possibly related issue
We've also been having issues with the `until_and_while_executing` lock: on 7.0.1 we discovered that around 4% of our jobs did not perform because of this. On 7.1.5 it looks better, but some jobs are still missed. I've added additional logging to determine the cause, but given all these issues I'm afraid we'll have to stop or pause using the gem until it's stable. However, I really appreciate your hard work, and please let me know if I can be of any help with determining the cause of the issues.