Consider ditching concurrent handler execution #284
I have found some interesting information regarding the Language Server Protocol's consistency requirements that could help inform our future design. First, the specification describes the correct server behavior: text document change notifications must be processed in the order in which they arrive.
This is obvious and makes intuitive sense. We must process text document change notifications strictly in the order in which they are received to ensure state consistency on the server side. We do indeed process these events in order today.
Hey! Thanks for working on this project. It's super helpful. I wanted to add some example use cases to this. One thing we want to implement is a custom request that can be used to interrogate the LSP client. For example, suppose we want to ask the client to fetch all workspace symbols across all languages. Then we end up with a multi-step request flow between client and server. Relevant to this discussion, within handling the initial request (id=0), a few things happen before the server can respond.
So basically, there's a point in time where there are three pending futures on the server, among them the original request from the client (id=0), and these need to be handled in reverse order. I have this implemented, and it was blocking until I made the following change, which is probably what you would expect to happen:

```diff
diff --git a/src/transport.rs b/src/transport.rs
index 0b4a647..69c4802 100644
--- a/src/transport.rs
+++ b/src/transport.rs
@@ -95,11 +95,12 @@ where
         let mut framed_stdin = FramedRead::new(self.stdin, LanguageServerCodec::default());
         let framed_stdout = FramedWrite::new(self.stdout, LanguageServerCodec::default());

-        let responses = receiver.buffered(4).filter_map(future::ready);
+        let responses = receiver.buffer_unordered(4).filter_map(future::ready);

         let interleave = self.interleave.fuse();
```

Does this make sense? Or am I using custom requests wrong? Because without the above change, I'm struggling to see how sending a server->client request could ever make progress?
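For readers unfamiliar with the distinction, here is a minimal standalone sketch (assuming the `futures` and `tokio` crates; this is not tower-lsp's actual transport code) of how the two combinators differ. `buffered` polls up to N futures concurrently but yields results strictly in input order, while `buffer_unordered` yields each result as soon as it's ready:

```rust
use futures::{stream, StreamExt};
use std::time::Duration;

// Helper future that resolves to `id` after `ms` milliseconds.
async fn delayed(id: u32, ms: u64) -> u32 {
    tokio::time::sleep(Duration::from_millis(ms)).await;
    id
}

#[tokio::main]
async fn main() {
    // Two futures: the first takes longer and finishes last.
    let futs = || stream::iter(vec![delayed(1, 50), delayed(2, 0)]);

    // `buffered` runs both concurrently but holds back `2` until `1` is done.
    let ordered: Vec<u32> = futs().buffered(2).collect().await;
    assert_eq!(ordered, vec![1, 2]);

    // `buffer_unordered` yields `2` immediately, then `1`.
    let unordered: Vec<u32> = futs().buffer_unordered(2).collect().await;
    assert_eq!(unordered, vec![2, 1]);
}
```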
I agree with @samscott89 that having concurrency should probably be possible. However, it would be nice for some language servers to have no concurrency at all, because that might make the whole thing a bit simpler. Most language servers will not be able to do a lot concurrently anyway. Calculating diagnostics (typically the most expensive operation) usually cannot be done at the same time as calculating completions, because they access the same data structures, although caches can often be used to avoid recalculating everything. So IMO, unless you have a fancier architecture like @samscott89's, you probably want no concurrency at all, and this should probably be the default. However, I don't see why users should not also be able to opt into concurrency.
Thanks for the suggestions and helpful discussion! However, there is unfortunately trouble associated with the use of `buffer_unordered`: it yields handler results as they complete, so responses would no longer be sent in roughly the order the requests were received, which the specification asks servers to preserve.

One possible solution that could solve the problem while preserving the specification invariants might be, as described in the issue description, to have two separate channels for server-to-client and client-to-server communication, with each channel processing its messages sequentially and in order. This still doesn't solve the data integrity issues caused by allowing servers to have concurrent handlers (which is the original subject of this issue), but I think it would at least fix the blocking problems @samscott89 described, where client-to-server and server-to-client requests accidentally block on each other.

I do like @davidhalter's suggestion of making the buffered concurrency configurable by the user and possibly disabled by default, though. Perhaps that may be the best route forward: come out of the box with concurrent processing of client-to-server and server-to-client messages, but with each direction processing its handlers sequentially in the order in which they were received.
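To illustrate the two-channel idea, here is a purely hypothetical sketch (the channel names and message types are invented for illustration, not tower-lsp API): each direction gets its own queue, drained strictly in order, while the two directions make progress independently of each other:

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // One queue per direction; `String` stands in for real message types.
    let (c2s_tx, mut c2s_rx) = mpsc::channel::<String>(32);
    let (s2c_tx, mut s2c_rx) = mpsc::channel::<String>(32);

    // Client-to-server messages are handled sequentially, in arrival order.
    let c2s = tokio::spawn(async move {
        while let Some(msg) = c2s_rx.recv().await {
            println!("handling client->server message: {msg}");
        }
    });

    // Server-to-client traffic (e.g. responses to server-initiated requests)
    // is drained by a separate task, so it can't be starved by the other queue.
    let s2c = tokio::spawn(async move {
        while let Some(msg) = s2c_rx.recv().await {
            println!("handling server->client message: {msg}");
        }
    });

    // A router task would decode each incoming frame and forward it here.
    c2s_tx.send("textDocument/didChange".into()).await.unwrap();
    s2c_tx.send("response to request id=1".into()).await.unwrap();

    drop((c2s_tx, s2c_tx)); // close both queues so the workers exit
    let _ = tokio::join!(c2s, s2c);
}
```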
This is not how I understand the spec. The spec says that a server may use a parallel execution strategy and return responses in a different order than the requests were received, so long as the reordering does not affect the correctness of the responses.
I understand that this means that definition/rename requests may not be reordered, but I would say that even there, there are exceptions: for example, a local definition lookup could in theory be reordered with a rename in a different part of the code. That is still correct behavior according to the specification. What do you think?
I agree with @davidhalter's reading of things. But even so, guaranteeing correctness under arbitrary reordering sounds like a brutal invariant to attempt to support generically! So even if technically it should be allowed, I could see a good argument for not supporting it in a higher-level framework like `tower-lsp`.

As far as I can tell, your proposed solution would achieve that: with that in place, each direction would handle its messages one at a time, in the order they arrived.
Sounds like it should be non-concurrent by default, with opt-in concurrency? I can't imagine there are many extensions that would really benefit from concurrency anyway.
That's certainly a possibility, @Timmmm. Though it would be a shame to disallow scenarios such as the one @samscott89 described in his server. I do feel like this use case should be better supported by `tower-lsp`.

I think I've devised a vague idea for how to proceed, though. If you check the work-in-progress support-custom-server-requests branch opened as part of #256, you may notice that the internal JSON-RPC implementation was completely reworked to support defining custom requests on LSP servers. I can see a possible future where the interface for registering JSON-RPC methods also requires the user to specify a concurrency mode, which essentially tells the router whether that handler may run concurrently or must act as a sequential barrier.
In the example that @samscott89 gave above, one would explicitly mark the custom request handler as safe to run concurrently, restoring the behavior he needs. How does this approach sound to all of you? I'd appreciate it if you could try to poke holes in this design.
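As a purely hypothetical sketch of what such a registration interface might look like (none of these names exist in tower-lsp; this is only the shape of the idea):

```rust
// Hypothetical concurrency mode attached to each registered JSON-RPC method.
enum ConcurrencyMode {
    /// May run alongside adjacent handlers that are also `Concurrent`.
    Concurrent,
    /// Acts as a barrier: all previously received messages must finish
    /// first, and nothing newer starts until this handler completes.
    Exclusive,
}

// Hypothetical registration record pairing a method name with its mode.
struct MethodRegistration {
    name: &'static str,
    mode: ConcurrencyMode,
}

fn example_registrations() -> Vec<MethodRegistration> {
    vec![
        // State-mutating notifications must stay strictly ordered.
        MethodRegistration { name: "textDocument/didChange", mode: ConcurrencyMode::Exclusive },
        // A read-only custom request could opt in to concurrency.
        MethodRegistration { name: "custom/workspaceSymbols", mode: ConcurrencyMode::Concurrent },
    ]
}

fn main() {
    for reg in example_registrations() {
        let mode = match reg.mode {
            ConcurrencyMode::Concurrent => "concurrent",
            ConcurrencyMode::Exclusive => "exclusive",
        };
        println!("{} -> {}", reg.name, mode);
    }
}
```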
Thanks for the thoughtful response, @ebkalderon! In our case, we ended up scoping back the functionality to avoid needing that additional back-and-forth work. I don't know enough about what other LSPs are doing, but I wouldn't be surprised if what I was trying to do has a bit of a "code smell" to it. If it's not super common, the risk is that supporting these esoteric use cases adds a bunch of complexity, so it might not be worth it? Putting that aside though, what you described sounds like a cool way to make it work :)
If I understand it right, you're saying commands are executed exactly in order, except if they're marked "concurrent", in which case they can be executed concurrently with any commands in the same contiguous group of concurrent commands? That sounds like a decent plan to me.
Precisely! Indeed, I'm not entirely sure whether the added complexity will be worth it, but I think it's worth prototyping in a branch and mulling over. For now, in the process of refactoring the JSON-RPC router in the support-custom-server-requests branch of #256, I'm moving forward with a smaller set of changes from this ticket.
Any proposed solution to this ticket (which I feel might even affect portions of #177) will have to come later in a separate release.
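For concreteness, here's a small sketch of the execution policy @Timmmm summarized above (assuming the `futures` and `tokio` crates; the `Task` type is invented for illustration): handlers run in order, except that contiguous runs of concurrent-safe tasks are awaited as a group, and any exclusive task acts as a barrier.

```rust
use futures::future::{join_all, BoxFuture};
use futures::FutureExt;

// Invented descriptor: a queued handler invocation plus its concurrency flag.
struct Task {
    concurrent: bool,
    fut: BoxFuture<'static, ()>,
}

// Execute in order, awaiting contiguous groups of concurrent tasks together;
// any non-concurrent task runs alone after the pending group is flushed.
async fn run_in_groups(tasks: Vec<Task>) {
    let mut group = Vec::new();
    for task in tasks {
        if task.concurrent {
            group.push(task.fut);
        } else {
            join_all(group.drain(..)).await; // flush the pending group first
            task.fut.await; // then run the barrier by itself
        }
    }
    join_all(group).await; // flush any trailing group
}

#[tokio::main]
async fn main() {
    let tasks = vec![
        Task { concurrent: true, fut: async { println!("hover A") }.boxed() },
        Task { concurrent: true, fut: async { println!("hover B") }.boxed() },
        Task { concurrent: false, fut: async { println!("didChange") }.boxed() },
    ];
    run_in_groups(tasks).await;
}
```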
Seems like the approach described a few comments up might be a good way to address the problem.
I've been investigating different approaches to allowing handlers to take their state by reference. However, due to the unfortunate inability to express the following:

```rust
impl<R, O, F, Fut> Method<&R, (), O> for F
where
    R: 'static,
    F: for<'a> Fn(&'a R) -> Fut, // How do I express that `Fut` captures the HRTB lifetime 'a here?
    Fut: Future<Output = O>,
```

the Playground example cannot accept `async fn` handlers, whose returned futures borrow their `&R` argument. If we abandon this approach and don't attempt to abstract over mutability, we can work around this limitation and reduce a few unnecessary heap allocations too. But this negatively impacts the API ergonomics for users of `tower-lsp`. I have managed to go beyond the approach described in the Playground link and successfully abstracted the blanket `Method` impl over both `&R` and `&mut R` receivers.
I've found a satisfactory way to work around the above issues with higher-kinded lifetimes (Playground). The gist of it is as follows: the hack relies on an intermediate helper trait that carries the lifetime as a trait parameter, so the returned future's type can name it. This took me an incredibly long time to figure out, bumping up against limitations such as rust-lang/rust#70263 frequently along the way. I'm pretty happy with the ergonomics of the end result!
Responding here, because it probably makes more sense. I've been doing a bit of refactoring in the code for my project (from which #340 was a simplified example). Having done so, I now rather doubt that I'm actually going to be able to meet these (or any!) trait bounds, the primary issue being self-references in the thread which I currently have to spawn to respond to events. I'll definitely try to check out this HKLT when there is a branch to look at, but the feeling I get is that even with these trait bounds relaxed, it's likely I won't have a struct that can satisfy them.
With regards to the simplified example in #340, dropping the lock around the server state should become unnecessary once handlers can take exclusive references. Additionally, the current push is to mark certain methods that inherently depend on exclusive access to server state as requiring `&mut self`. It's still perfectly fine to spawn background tasks to handle long-running work.
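A hypothetical sketch of what this direction buys (the types and names here are invented; the actual trait in the branch may differ): handlers needing exclusive access take `&mut self`, so the state needs no lock, while read-only handlers keep `&self`.

```rust
use std::collections::HashMap;

// Invented stand-in for a language server backend holding document state.
struct Backend {
    open_docs: HashMap<String, String>, // uri -> text
}

impl Backend {
    // Mutating handler: exclusive access, no Mutex or RwLock required.
    async fn did_change(&mut self, uri: String, text: String) {
        self.open_docs.insert(uri, text);
    }

    // Read-only handler: could safely run alongside other readers.
    async fn hover(&self, uri: &str) -> Option<String> {
        self.open_docs.get(uri).cloned()
    }
}

#[tokio::main]
async fn main() {
    let mut backend = Backend { open_docs: HashMap::new() };
    backend.did_change("file:///a.rs".into(), "fn main() {}".into()).await;
    assert_eq!(backend.hover("file:///a.rs").await.unwrap(), "fn main() {}");
}
```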
Okay, I've pushed up a quick and dirty implementation to the branch for anyone who wants to try it out.
An open question regarding the implementation in the above branch concerns the exact bounds on the futures returned by handlers. If anyone here depends on the current behavior, please chime in.
I'm intending to try this out ASAP; however, I've been a bit under the weather. Hopefully over the next few days I'll feel well enough to sit down and work on it.
Sure thing. Sorry to hear you're under the weather, @ratmice. Hope you feel better soon! Thanks for chiming in about your own usage of the branch.
Hey @ebkalderon, just wanted to see what the status is with merging this change into `tower-lsp`. It's so nice to have a clear boundary where the internal server state can be mutated safely and where it should be taken by reference. Not needing to wrap everything in a lock would be a big win for us.
Hi there, @JoshuaBatty! Thanks for your interest in this issue. I think the current status is roughly this: the branch works, but there are open questions around API stability and churn that I'd like to settle before merging.

Hope this sheds some light on the current state of things! Ideally, I would love to be able to release these changes in an upcoming version. I'd love to hear thoughts from anyone, especially the biggest downstream users of `tower-lsp`.
While not a large user (I've only communicated with one person besides myself that I know has attempted to use my LSP), my feedback from my experiment was that it did work, and it appeared it would allow me to avoid spawning separate threads and serializing LSP messages via channels to those threads. Unfortunately, leveraging it required the use of self-referential structures (which I am somewhat hesitant to use).
Good to know that the branch works as expected, @ratmice! Thanks for the input. I presume the need for self-referential structs is a requirement specific to the language server being built and not an issue with the branch itself, as I have never encountered such a need in other downstream projects relying on `tower-lsp`.
Yeah, it is a specific detail of the language server being built. The current implementation only avoids self-referential types because it is borrowing local variables within that thread's main loop, but once it has to be within something which implements a trait, it needs to become a self-referential type. Edit: whereas on the current master I just couldn't implement the trait at all due to bounds requirements (IIRC).
Thanks for the detailed response and for shedding light on the complexities involved. I genuinely appreciate the effort you're putting into maintaining the stability and usability of `tower-lsp`. From our standpoint, the benefits of the branch are clear and immediate. I understand the concerns regarding the potential API redesign and how it might introduce additional churn. While long-term stability is essential, I'd argue that we should prioritize immediate benefits that have a clear positive impact. We're willing to adapt and navigate through any churn that comes with future redesigns.
@JoshuaBatty Thanks for the helpful reply, especially regarding whether the branch works as expected and how getting it merged would likely impact your project. I'm glad to hear folks seem generally in favor of it getting merged so far. I think I would like to hear from folks working on the Deno project, particularly from @dsherret and @bartlomieju (sorry to ping you both directly) over on denoland/deno#18079, which had spawned further discussion in #386. Would merging these changes cause trouble for Deno?
Not sure if it helps, but I'm trying an alternative design leveraging a different service interface. But well, yes, the branch is more ergonomic than both the current APIs and my approach.
I don't think it would break anything for us, because we wouldn't use that functionality.
Sounds like we might be able to move forward with this, then? How are you feeling about it, @ebkalderon?
Hey @ebkalderon, sorry to be pushy, but any update on your thoughts for moving this ahead?
I'm building a language server and recently prototyped a tower-lsp rewrite. So far I really like the API. However, this issue has come up in a concrete way: it's hard to understand the order in which handlers run. I need to ensure that any references to the old filename get updated before running diagnostics, and when I open the "new" file, I need some way of knowing that it's actually just the old file with a new name. Since it's hard to predict which handler will be executed first, it seems the workaround will be to clean up deleted file references in both handlers.

Now, in this case it's probably not too big of a deal, but the solution feels hacky, and in the case of more complicated combinations of long-running tasks I'll have to get creative to avoid redundant execution of expensive stuff. (I still need to try the branch.)

In general, it'd be nice to have insight into and control over the handler execution order, especially in cases where clusters of interrelated requests come in quickly, for example to explicitly debounce sequences of interdependent tasks that could be triggered by multiple events concurrently, or to throttle expensive tasks. Maybe I can use streams and stream extensions for this?

When it comes to Rust async stuff I'm definitely "Alan", so it's hard for me to grasp whether the proposed changes would help here.
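One way to get that kind of control today is a small channel-based debouncer in front of the expensive work. A rough sketch (assuming tokio; all names are invented): absorb a burst of events and run the task only once the burst has gone quiet for a short window.

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

// Wait for an event, then keep absorbing newer ones until 100 ms pass with
// no new arrivals; only then run the expensive task on the latest event.
async fn debounce_worker(mut rx: mpsc::Receiver<String>) {
    while let Some(mut latest) = rx.recv().await {
        while let Ok(Some(next)) = timeout(Duration::from_millis(100), rx.recv()).await {
            latest = next;
        }
        println!("running diagnostics for: {latest}");
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(32);
    let worker = tokio::spawn(debounce_worker(rx));

    // A quick burst of interrelated events collapses into one task run.
    for event in ["old.rs deleted", "new.rs opened", "new.rs changed"] {
        tx.send(event.to_string()).await.unwrap();
    }
    drop(tx); // close the channel so the worker exits after draining

    worker.await.unwrap();
}
```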
## Summary

This PR introduces the `ruff_server` crate and a new `ruff server` command. `ruff_server` is a re-implementation of [`ruff-lsp`](https://github.com/astral-sh/ruff-lsp), written entirely in Rust. It brings significant performance improvements, much tighter integration with Ruff, a foundation for supporting entirely new language server features, and more!

This PR is an early version of `ruff_server` that we're calling the **pre-release** version. Anyone is more than welcome to use it and submit bug reports for any issues they encounter - we'll have some documentation on how to set it up with a few common editors, and we'll also provide a pre-release VSCode extension for those interested.

This pre-release version supports:

- **Diagnostics for `.py` files**
- **Quick fixes**
- **Full-file formatting**
- **Range formatting**
- **Multiple workspace folders**
- **Automatic linter/formatter configuration** - taken from any `pyproject.toml` files in the workspace.

Many thanks to @MichaReiser for his [proof-of-concept work](#7262), which was important groundwork for making this PR possible.

## Architectural Decisions

I've made an executive choice to go with `lsp-server` as a base framework for the LSP, in favor of `tower-lsp`. There were several reasons for this:

1. I would like to avoid `async` in our implementation. LSPs are mostly computationally bound rather than I/O bound, and `async` adds a lot of complexity to the API, while also making it harder to reason about execution order. This leads into the second reason, which is...
2. Any handlers that mutate state should be blocking and run in the event loop, and the state should be lock-free. This is the approach that `rust-analyzer` uses (also with the `lsp-server`/`lsp-types` crates as a framework), and it gives us assurances about data mutation and execution order. `tower-lsp` doesn't support this, which has caused some [issues](ebkalderon/tower-lsp#284) around data races and out-of-order handler execution.
3. In general, I think it makes sense to have tight control over scheduling and the specifics of our implementation, in exchange for a slightly higher up-front cost of writing it ourselves. We'll be able to fine-tune it to our needs and support future LSP features without depending on an upstream maintainer.

## Test Plan

The pre-release of `ruff_server` will have snapshot tests for common document editing scenarios. An expanded test suite is on the roadmap for future versions of `ruff_server`.
Hello. EDIT: It won't, because if some tasks are blocked, I can't guarantee which one will wake up first.
What is the status on this? Is the current recommendation just to use the workarounds described above? Yesterday I was looking at some old code I wrote with `tower-lsp`, which is what prompted me to ask. Those are just my two cents, though. More concretely: right now, what would be the recommendation for someone like myself who is just starting to write a new language server? @ebkalderon, should I use `tower-lsp` or something else?
@samestep
@fda-odoo thanks, that makes a lot of sense! After I wrote my message above, I went and looked at how some other language servers handle this.
Our approach was to forward all messages from tower-lsp handlers (https://github.com/posit-dev/ark/blob/main/crates/ark/src/lsp/backend.rs) to our own event loop (https://github.com/posit-dev/ark/blob/main/crates/ark/src/lsp/main_loop.rs). This puts us in full control of concurrency, and we can spawn threads or async tasks as we see fit (I agree with the general sentiment here that the latter is probably not that useful). We have two async tasks managing this: one semi-responsive main loop to handle requests and notifications from tower-lsp, and one low-latency loop for things like logging. Logging is no longer async nor blocking; it just sends a message to that tight async loop, which then dispatches it to tower-lsp. We might end up switching to lsp-server later on, but for the time being we were able to solve all our issues of race conditions with tower-lsp.
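The shape of this pattern in miniature (a hypothetical sketch, not ark's actual code): handlers only enqueue events, and a single loop owns the mutable state outright, so no lock is needed and mutation order is exactly the order events were queued.

```rust
use tokio::sync::mpsc;

// Invented event type covering the messages handlers forward to the loop.
enum Event {
    DidChange { uri: String, text: String },
    Shutdown,
}

// The main loop owns the state directly; because only this task touches it,
// edits are applied in exactly the order events were sent.
async fn main_loop(mut rx: mpsc::UnboundedReceiver<Event>) {
    let mut docs = std::collections::HashMap::new();
    while let Some(event) = rx.recv().await {
        match event {
            Event::DidChange { uri, text } => {
                docs.insert(uri, text);
            }
            Event::Shutdown => break,
        }
    }
    println!("loop exited with {} open docs", docs.len());
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::unbounded_channel();
    let looper = tokio::spawn(main_loop(rx));

    // A tower-lsp handler body would reduce to a send like this:
    tx.send(Event::DidChange { uri: "file:///a.rs".into(), text: "fn main() {}".into() }).unwrap();
    tx.send(Event::Shutdown).unwrap();

    looper.await.unwrap();
}
```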
Background
In order for the framework to support both standard streams and TCP transports generically, we need a way to interleave both client-to-server and server-to-client communication over a single stream. This topic has been the subject of a previous issue. The current strategy used by `tower-lsp` (and `lspower` in turn) to accomplish this is roughly:

1. Decode incoming messages from the input stream.
2. Route each message to its matching handler, producing a future that eventually yields the outgoing response, if any.
3. Execute pending handler futures concurrently, several at a time, while emitting outgoing messages in the order the requests were received.

The third step above is particularly significant, however, and has some unintended consequences.
Problem Summary
A closer reading of the "Request, Notification and Response Ordering" section of the Language Server Protocol specification reveals that responses should be sent in roughly the same order as the requests arrive, and that a server may only reorder responses where doing so does not affect their correctness.
This is concerning to me because `tower-lsp` unconditionally executes pending async tasks concurrently without any regard for the correctness of the execution. The correct ordering of the outgoing messages is preserved, as per the spec, but the execution order of the handlers is not guaranteed to be correct. For example, an innocent call to `self.client.log_message().await` inside one server handler might potentially prompt the executor to not immediately return back to that handler's yield point, but instead start processing the next incoming request concurrently as it becomes available.

As evidenced by downstream GitHub issues like denoland/deno#10437, such behavior can potentially cause the state of the server and client to slowly drift apart as many small and subtle errors accumulate and compound over time. This problem is exacerbated by LSP's frequent use of JSON-RPC notifications rather than requests, which don't require waiting for a response from the server (see relevant discussion in microsoft/language-server-protocol#584 and microsoft/language-server-protocol#666).
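A minimal, self-contained demonstration of the hazard (assuming tokio; this is not tower-lsp code): two simulated "didChange" handlers that yield mid-execution can apply their edits in either order.

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

// Simulated handler: an innocent await (standing in for something like
// `client.log_message().await`) gives the executor a chance to start the
// next handler before this one applies its edit.
async fn handle_did_change(state: Arc<Mutex<Vec<u32>>>, version: u32) {
    tokio::task::yield_now().await;
    state.lock().await.push(version);
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let state = Arc::new(Mutex::new(Vec::new()));
    let a = tokio::spawn(handle_did_change(state.clone(), 1));
    let b = tokio::spawn(handle_did_change(state.clone(), 2));
    let _ = tokio::join!(a, b);

    // Depending on scheduling, the edits may land as [1, 2] or [2, 1];
    // nothing here enforces the order the notifications arrived in.
    println!("applied order: {:?}", state.lock().await);
}
```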
It's not really possible to confidently determine whether any particular "state drift" bug was caused by the server implementation or by `tower-lsp` itself without stepping through with a debugger. However, there are some things we can do to improve the situation for our users and make such bugs less likely.

Possible solutions
For example, we could process client-to-server and server-to-client messages concurrently with each other, but individual messages of each type would execute in the order they are received, one by one, and no concurrency between individual `LanguageServer` handlers would be allowed.

This would greatly decrease request throughput, however. Perhaps it would be beneficial to allow some handlers to run concurrently where it's safe (with user opt-in or opt-out), but otherwise default to fully sequential execution. To borrow from the "Request, Notification and Response Ordering" section of the Language Server Protocol specification again: reordering is permitted only where it cannot affect the correctness of the responses; the spec's own example is that textDocument/definition and textDocument/rename should not be reordered, since executing the rename may affect the result of the definition request.
If we choose to go this route, we may have to determine ahead of time which handlers are usually safe to reorder and interleave and which are not. I imagine this would introduce a ton of additional complexity to `tower-lsp`, though, so I'd rather leave such responsibilities to the server author to implement themselves, if possible.

There are a few smaller improvements we can start adding now to improve the situation.
The matter of optional user-controlled concurrency is still unanswered, however, and will likely require further design work.
Any thoughts on the matter, @silvanshade?