Skip to content
This repository has been archived by the owner on Aug 7, 2023. It is now read-only.

Handle worker crashes #1073

Merged
merged 4 commits into from
Jan 17, 2018
Merged

Handle worker crashes #1073

merged 4 commits into from
Jan 17, 2018

Conversation

IanVS
Copy link
Member

@IanVS IanVS commented Jan 12, 2018

Refs #925 #927 #930 #1059

This is an attempt to mitigate some of the problems being experienced in the referenced issues, whereby linter-eslint crashes/hangs with a high memory usage.

I was unable to determine the exact cause of workers dying, but this PR does a few things to try to minimize the crashes and memory usage, as well as allowing linter-eslint to recover from a crashed worker. In one of my projects, I was able to reliably reproduce the crashes, usually within a few seconds of opening a file. Using this branch, linter-eslint has now become usable for me again, albeit during a crash it still takes a few seconds to recover and start working again, so I don't consider this a complete solution.

I think the main challenge we are facing is that the Task api from Atom is not intended to be used as a long-running task, which is what we need because starting up a lint job can take a second or two. Maybe we can find another, more reliable approach in the future, but hopefully this PR will at least make linter-eslint usable again for those folks currently stuck on the old version.

@IanVS IanVS requested a review from Arcanemagus January 12, 2018 14:48
Copy link
Member

@Arcanemagus Arcanemagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this change, just a few cleanups I noticed.

src/main.js Outdated
resolve()
}
// Initialize the worker during an idle time
window.requestIdleCallback(initializeESLintWorker)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be added to the idelCallbacks Set (The call up on L185 also should be added). Otherwise we risk initializing a worker when the package is deactivated.

src/main.js Outdated
this.worker.terminate()
this.worker = null
}
const initializeESLintWorker = () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this code is in multiple places it should be moved to a separate function of its own.

@@ -323,6 +344,13 @@ module.exports = {
await waitOnIdle()
}

// Sometimes the worker dies and becomes disconnected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is duplicated maybe this comment as well as the check should be moved into the called function, and it renamed to something like checkESLintWorker?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on it, and in the end decided against passing yet another argument to sendJob, but can do so if you'd prefer that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually rather like the idea of just sticking this logic in sendJob, that way the logic for maintaining the worker stays there leaving the callers to just worry about how the job is used.

I'm not sure why moving the logic there would require an additional argument though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to kill and restart a worker simply from a reference to that worker? You can see that my restartESLintWorker() method sets this.worker, which is then passed to sendJob. So I assume we would need to send a reference to this.restartESLintWorker to sendJob as well. Or am I missing another approach that you have in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking this bit:

if (this.worker && !this.worker.childProcess.connected) {
  await this.initializeWorker()
}

As well as the code in initializeWorker could be moved into sendJob (or called from there).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I guess I'm not seeing a clear vision of what you have in mind. Do you want to take a crack at a commit on this PR?

Copy link
Contributor

@skylize skylize Jan 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is at all an improvement. A lot of side effects and misdirection, but I'll share the idea anyways. Maybe it will spark something.

const advancedWorker = () => ({
  initialize: () =>
    this.initializeEslintWorker().then(() => this.worker),
  restart: () =>
    this.restartEslintWorker.then(() => this.worker),
  task: this.worker
})

async function sendJob( worker, config) {
  if (!worker.task || !this.worker.childProcess.connected) {
    worker.task = await worker.initialize()
  }
  // ....
}
```

src/worker.js Outdated
// We catch all worker errors so that we can create a separate error emitter
// for each emitKey, rather than adding multiple listeners for `task:error`
try {
const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, but I'd leave this out of the try/catch and just use emitKey in the catch block.

@@ -43,12 +43,17 @@ export async function sendJob(worker, config) {
config.emitKey = cryptoRandomString(10)

return new Promise((resolve, reject) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have a promise that can never resolve?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, resolve was hidden in folded code. 🙈

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the reject on L57, and the resolve on L62 are for?

src/main.js Outdated
idleCallbacks.forEach(callbackID => window.cancelIdleCallback(callbackID))
idleCallbacks.clear()
helpers.killWorker()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is after the cancellation of the idle callbacks to prevent the idle callback that requests a worker start from firing off before it would be able to kill it.

src/helpers.js Outdated
@@ -83,7 +103,7 @@ function validatePoint(textBuffer, line, col) {
}
}

export async function getDebugInfo(worker) {
export async function getDebugInfo() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot. Gets a ton of the worker logic and state management of the worker out the the main object, greatly simplifying things. Definitely going the right direction IMHO.

@Arcanemagus Arcanemagus force-pushed the handle-worker-crashes branch from 9491546 to 6850987 Compare January 16, 2018 19:29
Copy link
Member

@Arcanemagus Arcanemagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this as is, the refactor work I was thinking of turned out to be large enough that it should be a separate PR.

IanVS added 4 commits January 16, 2018 14:45
This helps to avoid an explosion in the number of listeners once 
errors start to happen.
There are some times when the worker dies completely and becomes
disconnected.  When that happens, we cannot send another job until
we terminate the failed worker and start up a new one.
Shitty commit title, but this does a few things:

 - Breaks out the worker initialization into a separate method
 - Adds the idle callback for the worker initialization to `idleCallbacks`
 - Does away with `restartEslintWorker` and just adds the safety check
of terminating an existing `this.worker` within `initializeWorker`
 - Destructures `jobConfig` outside of the try/catch, to use within
the error handler.
@IanVS IanVS force-pushed the handle-worker-crashes branch from 4941417 to b4b26a1 Compare January 16, 2018 19:45
@IanVS IanVS merged commit 64f4c80 into master Jan 17, 2018
@IanVS IanVS deleted the handle-worker-crashes branch January 17, 2018 11:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants