Memory leak #282
Thanks for the report; it is going to take some time to verify this. One thing that sounds suspicious is that you mention the leak is in the worker, but only when adding jobs. What if you only add jobs: does it still leak memory in that case? Also, regarding this line:

Does it leak memory even with that line commented out? (In the past we had problems with that function. I did a quick analysis and it looks fine, though.)

Checking whether or not it leaks should be easy using the incremental snapshots. When opening them, we can see that the 1 MB result strings are never collected, and their distance in the promise reaction chain gradually increases (see screenshot).
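For reference, snapshots like these can also be captured from Node itself rather than through DevTools. A minimal sketch, using Node's built-in `v8` module (the filename pattern and interval are arbitrary choices for illustration):

```ts
import { writeHeapSnapshot } from 'v8';

let n = 0;
setInterval(() => {
  // Serializes the current heap to disk (this blocks the event loop while
  // it runs) and returns the filename it wrote.
  const file = writeHeapSnapshot(`snapshot-${n++}.heapsnapshot`);
  console.log(`wrote ${file}`);
}, 30_000);
```

The resulting files can be loaded side by side in Chrome's Memory tab; the "distance" mentioned above is the length of the shortest retainer path from the GC root, so a growing distance suggests objects accumulating ever deeper in a retained chain.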
My wording was probably poor. I meant that it is only leaking when the worker itself is adding jobs. If the master were to execute the same code, no leak would occur.
Commenting out the call to

Ok, then the leak is in that function, as I suspected. I will review it more deeply to see how this can happen.

I did a small fix here that I think could fix this issue: #284

Seems like the problem isn't in

If it is removed by the GC, then it is not a leak by definition.
I agree it's not a "leak" per se, but there is still memory accumulating somewhere that shouldn't be. I looked into the heapdump a little more and found that Jobs are kept alive because they are used here. Between the start and the end of the test, Jobs are only created, and none are reclaimed (I also forced a GC run to be sure), probably because of Promises (the promises are not reclaimed either). I don't really know why, though.
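As an aside on forcing a GC run: a minimal sketch, assuming Node is started with the `--expose-gc` flag (e.g. `node --expose-gc worker.js`), which makes a manual collection available as a global:

```ts
// `gc` is only defined when Node runs with --expose-gc.
const gc = (globalThis as { gc?: () => void }).gc;

if (gc) {
  gc(); // force a full collection before taking the next snapshot
} else {
  console.warn('start node with --expose-gc to enable manual GC runs');
}
```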
In my case, I had a worker that was acting as an aggregator.
Is it possible to simplify the test further? For example, there is a jobOne and a jobAll, etc. If it can be simplified to the absolute minimum that reproduces the leak, it would be easier to pinpoint where it comes from.
So I think I found the reason behind this leak: it happens when listening to these events, which come from here: Line 138 in 4262837

I wrote a simple fix that uses indexes to find the original promise instead. It seems to fix the issue:

```diff
--- a/src/classes/worker.ts (revision 347a4551be8e8b4c3e222d011fef5eb9afcc4711)
+++ b/src/classes/worker.ts (date 1602595125126)
@@ -134,9 +134,12 @@
        * Get the first promise that completes
        * Explanation https://stackoverflow.com/a/42898229/1848640
        */
-      const [completed] = await Promise.race(
-        [...processing.keys()].map(p => p.then(() => [p])),
+      const promises = [...processing.keys()];
+      const completedIdx = await Promise.race(
+        promises.map((p, idx) => p.then(() => idx)),
       );
+
+      const completed = promises[completedIdx];
       const token = processing.get(completed);
       processing.delete(completed);
```
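For readability, here is the same logic after the patch, reconstructed from the diff above into a self-contained helper (`awaitFirstCompleted` and the `Map<Promise, token>` signature are illustrative stand-ins, not the worker's actual names). The key point is that the race now resolves to a plain array index, so whatever `Promise.race` retains internally is a small number rather than an array closing over the winning job promise and its result:

```ts
// Hypothetical stand-in for the worker's processing map: promise -> token.
async function awaitFirstCompleted(
  processing: Map<Promise<unknown>, string>,
): Promise<string | undefined> {
  const promises = [...processing.keys()];

  // Each contender resolves to its index; the race result is just a number.
  const completedIdx = await Promise.race(
    promises.map((p, idx) => p.then(() => idx)),
  );

  // Recover the winning promise from the index and drop it from the map.
  const completed = promises[completedIdx];
  const token = processing.get(completed);
  processing.delete(completed);
  return token;
}
```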
I found this issue related to

Yeah, that may be it. Fortunately, there is a safeRace implementation that does not have the leak. It seems it is not so trivial to just replace Promise.race with something else that does the same.
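For illustration, a hand-rolled race often takes roughly this shape: resolve an outer promise manually, so the losers' values flow into an already-settled `resolve` and can be dropped. This is only a sketch of the general idea, not the safeRace implementation referenced above, and whether it sidesteps V8's internal retention depends on engine details:

```ts
function handRolledRace<T>(contenders: Iterable<Promise<T>>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    for (const contender of contenders) {
      // After the first settlement, later resolve/reject calls are no-ops.
      contender.then(resolve, reject);
    }
  });
}
```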
I think my fix will prevent most OOMs, as it keeps only a single number (with a small footprint) and doesn't require many changes. On the other hand, if you prefer fixing the real issue behind this, using a better

Looks great actually, and I think it is easier to understand than the previous implementation.
…10-20)

### Bug Fixes

* **job:** remove listeners before resolving promise ([563ce92](563ce92))
* **worker:** continue processing if handleFailed fails. fixes [#286](#286) ([4ef1cbc](4ef1cbc))
* **worker:** fix memory leak on Promise.race ([#282](#282)) ([a78ab2b](a78ab2b))
* **worker:** setname on worker blocking connection ([#291](#291)) ([50a87fc](50a87fc))
* remove async for loop in child pool fixes [#229](#229) ([d77505e](d77505e))

### Features

* **sandbox:** kill child workers gracefully ([#243](#243)) ([4262837](4262837))
Thank you. I confirmed that the fix in the PR solves the issue, so I'm closing this.
I have noticed that jobs are leaking on the worker side when a worker queues jobs on the same queue that the worker processes.

For clarity, if the worker object is constructed with `new Worker('oom-test')` and tries to do `new Queue('oom-test').add(...)`, these jobs will leak. However, if the worker queues jobs using a different queue (e.g. `new Queue('oom-test-one').add(...)`), then it no longer leaks.

In order to reproduce, see the example project attached:
1. Run `npm install`.
2. Run `node master.js` and, in another terminal, `node worker.js`. Inspect the snapshots using Chrome and verify they are not leaking.
3. Run `node master.js` and, in another terminal, `node worker-leak.js`. By inspecting the snapshots using Chrome, you can notice that the 1 MB strings are never collected and have an increasing distance.

bullmq-leak.zip
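For readers without the zip, here is a hypothetical minimal version of the leaking worker-leak.js, reconstructed from the description above (the actual scripts are in bullmq-leak.zip; the job name and payload here are made up):

```ts
import { Queue, Worker } from 'bullmq';

// Same queue name as the worker below: per the report, this same-queue
// combination is what accumulates memory, while a differently named
// queue (e.g. 'oom-test-one') does not.
const queue = new Queue('oom-test');

const worker = new Worker('oom-test', async job => {
  // The worker adds jobs back onto the very queue it is processing.
  await queue.add('child', { from: job.id });
  return 'x'.repeat(1024 * 1024); // ~1 MB result string, as seen in the snapshots
});
```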