Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: revise server architecture #1884

Merged
merged 1 commit into from
Nov 28, 2022

Conversation

Kha
Copy link
Member

@Kha Kha commented Nov 26, 2022

Replace complex debouncing logic in watchdog with simple IO.sleep in worker didChange handler, replace redundant header change logic in watchdog with special exit code from worker.

This should not yet introduce noticeable changes in behavior, but apart from drastically simplifying the code, it does open up future opportunities such as not restarting the worker until the new header is actually valid, which might help with implementing import completion.

Replace complex debouncing logic in watchdog with single `IO.sleep` in
worker `didChange` handler, replace redundant header change logic in
watchdog with special exit code from worker.
@Kha Kha requested a review from gebner November 26, 2022 13:42
@gebner gebner merged commit a38bc0e into leanprover:master Nov 28, 2022
github-merge-queue bot pushed a commit that referenced this pull request Aug 7, 2024
This PR resolves two language server bugs that especially affect Windows
users:
1. Editing the header could result in the watchdog not correctly
restarting the file worker (#3786, #3787), which would lead to the file
seemingly being processed forever.
- The cause of this issue was a race condition in the watchdog that was
accidentally introduced as far back as #1884: In specific circumstances,
the watchdog will attempt forwarding a message to the file worker after
the process has exited due to a changed header, but before the file
worker exiting has been noticed by the watchdog (which will then restart
the file worker). In this case, the watchdog would mark the file worker
as having crashed and not look at its exit code to restart the file
worker, but instead treat it like a crashed file worker that will only
be restarted when editing the file again. Not inspecting the exit code
of the file worker when it crashed from forwarding a message from the
file worker is necessary since we do not restart the file worker until
another notification from the client arrives, and so we would read the
same crash exit code over and over again in the main loop of the
watchdog if we did not remove it from our list of file workers that we
listen to.
- This PR resolves this issue by distinguishing between "crashes when
forwarding messages to the file worker" and "crashes when forwarding
messages from the file worker". In the former case, we still inspect the
exit code of the file worker and potentially restart it if the imports
changed, whereas in the latter case, we stop inspecting the exit code of
the file worker. This is correct because the latter case is exactly the
one where we need to stop inspecting the exit code but where a crash
cannot occur as a result of a changed header, whereas the former case is
exactly the one where we still need to inspect the exit code after a
crash to ensure that we restart the file worker in case it exited
because the header changed.
- At some point in the future, it would be nice to revamp the
concurrency model of the watchdog entirely now that we have all those
fancy concurrency primitives that were not available four years ago when
the watchdog was first written.

2. On an especially slow Windows machine, we found that starting the
language server would sometimes not succeed at all because reading from
the stdin pipe in the watchdog produced an EINVAL error, which was in
turn caused by an NT "pipe empty" error.
- After lots of debugging, @Kha found that Lake accidentally passes its
stdin to Git because it does not explicitly set the `stdin` field to
`null` when spawning the process.
- Changing this fixes the issue, which suggests that Git may mutate the
pipe we pass to it to be non-blocking, which then causes a "pipe empty"
error in the watchdog when we also attempt to read from that same pipe.
- I'm still very uncertain why we only saw this issue on one
particularly slow machine and not across the whole eco system.

This PR also resolves an issue where we would not correctly emit
messages that we received while the file worker is being restarted to
the corresponding file worker after the restart.

Closes #3786, closes #3787.

---------

Co-authored-by: Sebastian Ullrich <sebasti@nullri.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants