While investigating #2692, @DavisVaughan discovered this thread discussing synchronisation issues within tower-lsp, the LSP server implementation used in Ark: ebkalderon/tower-lsp#284.
The summary is that while message ordering is preserved by tower-lsp at all times, because handlers are async, that ordering is easily compromised whenever a handler yields with `.await`. And we call `.await` a lot, for instance at handler entry to log a message to our LSP output. This causes two main issues:
1. **Stale state.** A state-changing notification like `did_change()` might be invoked and grab the document lock before a refactoring/analysis request has had a chance to read state. When control is yielded back to that request, the world has suddenly changed, and the cursor positions or ranges the request is working with have become stale. We think this explains crashes like #2692 (Ark: LSP crash while looking for completions or creating new document context).
2. **Unordered responses.** The LSP server should try its best to preserve the ordering of messages, and it's even required to do so when the messages might change the document. Because handlers run concurrently as async tasks, this ordering is not guaranteed, causing the server and client states to drift apart.
To fix (1), we should synchronise our handlers on entry with a read-write lock. To fix (2), we should synchronise them on exit with some sort of queue.
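To make the exit-side queue idea concrete, here is a minimal sketch in Python's asyncio for brevity (tokio's cooperative scheduling behaves analogously); the `ReorderBuffer` name and the sequence-number scheme are illustrative assumptions, not Ark's actual design:

```python
import asyncio

class ReorderBuffer:
    """Exit-side queue: emit responses strictly in request order,
    even when async handlers finish out of order."""
    def __init__(self):
        self.next_seq = 0
        self.pending = {}   # seq -> response held until its turn comes
        self.emitted = []

    def push(self, seq, response):
        self.pending[seq] = response
        # Flush every response whose turn has now come.
        while self.next_seq in self.pending:
            self.emitted.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

async def handler(seq, delay, buf):
    await asyncio.sleep(delay)   # handler yields here; others may overtake it
    buf.push(seq, f"response-{seq}")

async def main():
    buf = ReorderBuffer()
    # Request 0 is slow and request 1 is fast, so 1 *completes* first...
    await asyncio.gather(handler(0, 0.02, buf), handler(1, 0.0, buf))
    # ...but the buffer still emits responses in request order.
    return buf.emitted

print(asyncio.run(main()))  # ['response-0', 'response-1']
```

Handler 1 finishes first, but its response is held back until handler 0's has been emitted, so the client always sees responses in request order.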
While most requests can be handled asynchronously, a minority are "real time", in that they block the user's input (e.g. `selectionRanges`). For such requests, you want to handle them "on the main loop" as soon as possible, just as you would for state-update notifications.
Based on our new understanding of all this, I did another analysis of how we could possibly be getting supposedly "out of order" did_change() requests - which is what forced me to put that pending changes queue in place. I think it can be described as the following series of events:
1. Client sends did-change request A
2. Client sends did-change request B
3. tower-lsp forwards did-change request A
4. tower-lsp forwards did-change request B

Everything is fine up to this point.

5. `did_change(A)`:
   - Immediately calls `backend_log()` before locking the doc
   - `await`s the `backend_log()`
   - The `await` allows switching to `did_change(B)` immediately, and `did_change(A)` is paused
6. `did_change(B)`:
   - Immediately calls `backend_log()` before locking the doc
   - `await`s the `backend_log()`
   - Assume we stay in `did_change(B)` this time instead of switching back to `did_change(A)`
   - Takes the lock on the doc
   - Tries to process the change, but it looks out of order! Saves B as a pending change.
7. `did_change(A)` finally gets control back
8. We process change A, then the saved pending change from B
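The interleaving above can be sketched with Python's asyncio (the cooperative scheduling is analogous to tokio's). The `Doc` class, version numbers, and pending-queue details here are illustrative assumptions, not Ark's actual code; the `asyncio.sleep` stands in for the `backend_log().await` that yields control before the lock is taken:

```python
import asyncio

class Doc:
    def __init__(self):
        self.version = 0
        self.pending = {}   # version -> change that arrived out of order
        self.applied = []
        self.lock = asyncio.Lock()

    def apply(self, change_id, version):
        self.applied.append(change_id)
        self.version = version

async def did_change(doc, change_id, version, log_cost):
    # The handler awaits its log call *before* locking the doc;
    # this sleep stands in for that await and is where control switches.
    await asyncio.sleep(log_cost)
    async with doc.lock:
        if version != doc.version + 1:
            doc.pending[version] = change_id   # looks out of order: save it
        else:
            doc.apply(change_id, version)
            # Drain any pending changes that are now in order.
            while doc.version + 1 in doc.pending:
                v = doc.version + 1
                doc.apply(doc.pending.pop(v), v)

async def main():
    doc = Doc()
    # A (version 1) has a slow log await, B (version 2) a fast one,
    # so B grabs the doc lock first even though it was sent second.
    await asyncio.gather(did_change(doc, "A", 1, 0.01),
                         did_change(doc, "B", 2, 0))
    return doc.applied

print(asyncio.run(main()))  # ['A', 'B']: B was parked as pending until A landed
```

B reaches the lock first, sees a version it cannot apply yet, and is parked in the pending queue; when A finally lands, the queue is drained and both changes apply in the correct order.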
It's worth noting that if we had locked the doc before calling `backend_log()`, which seems like a totally reasonable thing to do, this scenario would deadlock (something we have seen many times in other scenarios): when control switches to B and B tries to take the lock on the doc, it can't acquire it until A releases it, but A can't release it until it gets control back. We are super careful about this in the diagnostics code, for example.
This comment from the author of rust-analyzer is also relevant: https://reddit.com/r/rust/comments/tbsiv5/towerlsp_0160_lightweight_framework_for_building/i0exnbi/
Worth noting that the ruff project recently decided (astral-sh/ruff#10158) to switch from tower-lsp to lsp-server (https://github.com/rust-lang/rust-analyzer/tree/master/lib/lsp-server), the framework created for rust-analyzer. This seems like a lot of work but allows greater control over the scheduling of handlers and avoids the tokio runtime.