Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider ditching concurrent handler execution #284

Open
ebkalderon opened this issue May 23, 2021 · 42 comments
Open

Consider ditching concurrent handler execution #284

ebkalderon opened this issue May 23, 2021 · 42 comments
Assignees
Labels
bug Something isn't working

Comments

@ebkalderon
Copy link
Owner

ebkalderon commented May 23, 2021

Background

In order for the framework to support both standard streams and TCP transports generically, we need a way to interleave both client-to-server and server-to-client communication over a single stream. This topic has been the subject of a previous issue. The current strategy used by tower-lsp (and lspower in turn) to accomplish this is somewhat simplistic:

  1. Incoming messages (requests to server, responses from client) are read sequentially from the input stream
  2. Each message is routed to its respective async handler
  3. Pending tasks are buffered and executed concurrently on a single thread, maximum four (4) tasks at a time, preserving order
  4. Outgoing messages (responses from server, requests from client) are serialized into the output stream

The third step above is particularly significant, however, and has some unintended consequences.

Problem Summary

A closer reading of the "Request, Notification and Response Ordering" section of the Language Server Protocol specification reveals:

Responses to requests should be sent in roughly the same order as the requests appear on the server or client side. [...] However, the server may decide to use a parallel execution strategy and may wish to return responses in a different order than the requests were received. The server may do so as long as this reordering doesn’t affect the correctness of the responses.

This is concerning to me because tower-lsp unconditionally executes pending async tasks concurrently without any regard for the correctness of the execution. The correct ordering of the outgoing messages is preserved, as per the spec, but the execution order of the handlers is not guaranteed to be correct. For example, an innocent call to self.client.log_message().await inside one server handler might potentially prompt the executor to not immediately return back to that handler's yield point, but instead start processing the next incoming request concurrently as it becomes available.

As evidenced by downstream GitHub issues like denoland/deno#10437, such behavior can potentially cause the state of the server and client to slowly drift apart as many small and subtle errors accumulate and compound over time. This problem is exacerbated by LSP's frequent use of JSON-RPC notifications rather than requests, which don't require waiting for a response from the server (see relevant discussion in microsoft/language-server-protocol#584 and microsoft/language-server-protocol#666).

It's not really possible to confidently determine whether any particular "state drift" bug was caused by the server implementation of tower-lsp without stepping through with a debugger. However, there are some things we can do to improve the situation for our users and make such bugs less likely.

Possible solutions

For example, we could process client-to-server and server-to-client messages concurrently to each other, but individual messages of each type must execute in the order they are received, one by one, and no concurrency between individual LanguageServer handlers would be allowed.

This would greatly decrease request throughput, however. Perhaps it would be beneficial to potentially allow some handlers to run concurrently where it's safe (with user opt-in or opt-out), but otherwise default to fully sequential execution. To quote the "Request, Notification and Response Ordering" section of the Language Server Protocol specification again:

For example, reordering the result of textDocument/completion and textDocument/signatureHelp is allowed, as these each of these requests usually won’t affect the output of the other. On the other hand, the server most likely should not reorder textDocument/definition and textDocument/rename requests, since the executing the latter may affect the result of the former.

If choose to go this route, we may have to determine ahead of time which handlers are usually safe to reorder and interleave and which are not. I imagine this would introduce a ton of additional complexity to tower-lsp, though, so I'd rather leave such responsibilities to the server author to implement themselves, if possible.

Below are a few key takeaways that we can start adding now to improve the situation:

  1. Execute server request/notification handlers sequentially in the order in which they were received, no concurrency.
  2. Process client-to-server and server-to-client messages concurrently with each other.

The matters of optional user-controlled concurrency are still unanswered, however, and will likely require further design work.

Any thoughts on the matter, @silvanshade?

@ebkalderon ebkalderon added the question Further information is requested label May 23, 2021
@ebkalderon ebkalderon added bug Something isn't working and removed question Further information is requested labels May 23, 2021
@ebkalderon
Copy link
Owner Author

I have found some interesting information regarding the Language Server Protocol's consistency requirements that could help inform our future design moving forward.

First, the specification describes the correct server behavior of textDocument/didChange notifications as follows:

<snip>
To mirror the content of a document using change events use the following
approach:

  • start with the same initial content
  • apply the 'textDocument/didChange' notifications in the order you receive them.
  • apply the TextDocumentContentChangeEvents in a single notification in the order you receive them.

This is obvious and makes intuitive sense. We must process text document change notifications strictly in the order in which they are received to ensure state consistency on the server side.

We do indeed process these events in order, thanks to Stream::buffered, but there is a risk that multiple notifications may be processed concurrently (because Buffered<St> is implemented in terms of FuturesOrdered<T> which races futures to completion in parallel but yields results in the original order), and as a generic LSP framework, we cannot guarantee that all downstream LanguageServer trait implementers will behave sanely when multiple changes are processed concurrently. As such, we should consider processing all incoming JSON-RPC notifications sequentially (but still asynchronously), or at least make the concurrency level configurable.

For reference, rust-analyzer appears to always process notifications sequentially (see handle_event() & on_notification() in main_loop.rs) while request processing is largely performed in parallel, with the exception of only two standard LSP requests: shutdown and textDocument/selectionRange. This, along with the Notification Message section of the LSP specification, lends credence to the idea that all notification messages should be handled sequentially and in the order in which they are received, while requests may be freely handled concurrently.

@samscott89
Copy link

Hey! Thanks for working on this project. It's super helpful.

I wanted to add some example use cases to this. One thing we want to implement is a custom request that can be used to interrogate the LSP client. For example, suppose we want to ask the client to fetch all workspace symbols across all language. Then we end up with the following flow:

Example protocol flow. Source.

Relevant to this discussion, you can see that within handling the initial request (id=0), a few things happen:

  1. The server makes a custom request back to the client (id=1). Processing this request will actually block handling the initial request (id=0). So request 0 cannot complete until id=1 does.
  2. The client asks ALL language servers for workspace symbols. This includes our server running tower-lsp (that's id=2). So now request 0 depends on request 1 which depends on request 2. The client blocks until id=2 completes on the server (although in vscode for example, this will timeout after ~1s).

So basically, there's a point in time where there are three pending futures on the server:

id=0: the original request from the client
id=1: the server's custom request to the client
id=2: the new request from the client

And these need to be handled in reverse order.

I have this implemented, and it was blocking until I made this change. Which is probably what you would expect to happen:

diff --git a/src/transport.rs b/src/transport.rs
index 0b4a647..69c4802 100644
--- a/src/transport.rs
+++ b/src/transport.rs
@@ -95,11 +95,12 @@ where

         let mut framed_stdin = FramedRead::new(self.stdin, LanguageServerCodec::default());
         let framed_stdout = FramedWrite::new(self.stdout, LanguageServerCodec::default());
-        let responses = receiver.buffered(4).filter_map(future::ready);
+        let responses = receiver.buffer_unordered(4).filter_map(future::ready);
         let interleave = self.interleave.fuse();

Does this make sense? Or am I using custom request wrong? Because without the above change, I'm struggling to see how sending a server->client request could ever make progress?

@davidhalter
Copy link

I agree with @samscott89 that having concurrency should probably be possible. However it would probably be nice for some language servers to have no concurrency at all, because that might make the whole thing a bit simpler.

Most language servers will not be able to do a lot concurrently anyway. Calculating diagnostics (which is typically the most expensive operation) can usually not be done at the same time as calculating completions, because they access the same data structures. Although, most of the time caches can be used to avoid recalculating everything.

So IMO unless you have a more fancy architecture like @samscott89 you probably want no concurrency at all and this should probably be the default. However I don't see why users should not also be able to work with concurrency.

@ebkalderon
Copy link
Owner Author

ebkalderon commented Sep 3, 2021

Thanks for the suggestions and helpful discussion! However, there is unfortunately trouble associated with the use of buffer_unordered(), as described in commit 456856b which makes it undesirable. The issue is that while client-to-server and server-to-client messages should indeed be processed concurrently and in any order (as the transport is a full duplex channel), the individual client-to-server requests and server-to-client requests must return their responses in the order in which they were received, regardless of whether concurrency is used or not, as per specification. The current design interleaves server-to-client and and client-to-server communication over a single async channel, which means that neither buffered() nor buffer_unordered() will truly yield the correct behavior in this instance.

One possible solution that could solve the problem while preserving the specification invariants might be, as described in the issue description, to have two separate channels for server-to-client and client-to-server communication. Each of these channels would use buffered() by a certain amount to preserve the output ordering of client requests and server requests, respectively. However, these two channels would then be selected upon and process their respective messages without ordering to ensure that server-to-client and client-to-server communication can be buffered independently without either one blocking the other.

This still doesn't solve the data integrity issues caused by allowing servers to have concurrent handlers (which is the original subject of this issue), but it may at least fix the blocking problems @samscott89 described caused by client-to-server and server-to-client requests accidentally blocking on each other, I think.

I do like @davidhalter's suggestion of making the buffered concurrency configurable by the user and possibly disabled by default, though. Perhaps that may be the best route forward: come out of the box with concurrent processing of client-to-server and server-to-client messages, but each direction processing handlers sequentially in the order in which they were received.

@davidhalter
Copy link

the individual client-to-server requests and server-to-client requests must return their responses in the order in which they were received, regardless of whether concurrency is used or not, as per specification.

This is not how I understand the spec. The spec says:

However, the server may decide to use a parallel execution strategy and may wish to return responses in a different order than the requests were received. The server may do so as long as this reordering doesn’t affect the correctness of the responses.

I understand that this means that reording/rename requests may not be reordered, but I would say that even there there are exceptions: For example a local definition lookup could in theory be reordered with a rename in a different part of the code. That is still correct behavior according to the specification.

What do you think?

@samscott89
Copy link

samscott89 commented Sep 6, 2021

I agree with @davidhalter's reading of things. But even so

The server may do so as long as this reordering doesn’t affect the correctness of the responses.

sounds like a brutal invariant to attempt to support generically! So even if technically it should be allowed, I could see a good argument for not supporting it in a higher-level framework like tower-lsp. For my specific use case, it would be more than reasonable to not support handling client requests out of order. But it is still a problem that a server cannot request information from the client during handling a client request. In the above diagram, the original id=0 request will block while waiting on id=1, but id=1 is behind id=0 in the buffer.

As far as I can tell, your proposed solution would achieve that:

However, these two channels would then be selected upon and process their respective messages without ordering to ensure that server-to-client and client-to-server communication can be buffered independently without either one blocking the other.

with that in place, you would see:

  • the id=2 request from the client cannot complete while the server is still processing id=0. <-- IMO this is a reasonable restriction. Otherwise you need to support infinitely nested client-server interactions.
  • but the id=1 request from the server to the client would complete before id=0 is finished, which would allow id=0 to continue and compute the response.

@Timmmm
Copy link

Timmmm commented Feb 7, 2022

Sounds like it should be non-concurrent by default with opt-in to concurrency? I can't imagine there are many extensions that would really benefit from concurrency anyway.

@ebkalderon
Copy link
Owner Author

That's certainly a possibility, @Timmmm. Though it would be a shame to disallow scenarios such as @samscott89 described in this server. I do feel like this use case should be better supported by tower-lsp without too much fiddling, somehow.

I think I've devised a vague idea for how to proceed, though. If you check the work-in-progress support-custom-server-requests branch opened as part of #256, you may notice that the internal JSON-RPC implementation was completely reworked to support defining custom requests on LSP servers.

I can see a possible future where the interface for registering JSON-RPC methods also requires the user to specify a concurrency mode which essentially tells the Server (or LspService; haven't decided yet) whether to execute the incoming client-to-server request sequentially or concurrently. With this design:

  1. Incoming client-to-server request handlers are marked either "sequential" or "concurrent."
  2. "Sequential" request handlers are executed sequentially and must complete before the next incoming request can be processed, while one or more "concurrent" can be buffered together and executed concurrently to each other without any defined ordering.
    • TODO: Should we opt to use buffered() or buffered_unordered() for this? Tradeoff between delayed/ordered vs. rapid/unordered returning of responses.
  3. Responses from the "sequential" request stream and the "concurrent" request stream are serialized into the output stream using future::stream::select() as they become available (along with any server-to-client request messages as well).
  4. Incoming responses from server-to-client requests are always processed immediately as they come in, concurrently with the incoming client-to-server requests.

Whether tower-lsp should ship by default with all standard LSP methods configured to "sequential" or some sensible mixture of "sequential" and "concurrent" based on a conservative reading of the original spec is up for debate. But no matter the approach taken, these settings should probably be customizable by the end user when constructing the Server or LspService (depending on, again, where this request buffering functionality should ideally live).

In the example that @samscott89 gave above, one would explicitly define the custom id=0 request or notification on "Server" to be of the "concurrent" type, which would allow that async task to stick around and wait while the server processes the workspace/symbols id=3 request alongside it and the response to id=1 is received. This is precisely the behavior that we want.

How does this approach sound to all of you? I'd appreciate it if you could try to poke holes in this design.

@samscott89
Copy link

Thanks for the thoughtful response @ebkalderon!

In our case, we ended up scoping back the functionality to avoid needing to do that additional back-and-forth work. I don't know enough about what other LSPs are doing, but I wouldn't be surprised if what I was trying to do has a bit of a "code smell" to it.

If it's not super common, the risk is that supporting these esoteric use cases is going to add a bunch of complexity, so it might not be worth it?

Putting that aside though, what you described sounds like a cool way to make it work :)

@Timmmm
Copy link

Timmmm commented Mar 2, 2022

If I understand it right you're saying commands are executed exactly in order, except if they're marked "concurrent" and then they can be executed concurrently with any commands in the same contiguous group of concurrent commands?

That sounds like a decent plan to me.

@ebkalderon
Copy link
Owner Author

Precisely! Indeed, I'm not entirely sure whether the added complexity will be worth it, but I think it's worth prototyping in a branch and mulling over.

For now, in the process of refactoring the JSON-RPC router in the support-custom-server-requests branch of #256, I'm moving forward with the following set of changes from this ticket:

  1. Incoming server requests are buffered concurrently (as they already are today, but with a customizable concurrency level).
  2. Received responses to client requests are processed immediately, rather than buffering them concurrently, to avoid these idle tasks clogging up the fixed size buffer unnecessarily.

Any proposed solution to this ticket (which I feel might even affect portions of #177) will have to come later in a separate release.

@ebkalderon
Copy link
Owner Author

Seems like the approach described a few comments up might be a good way to address the &mut methods suggestion mentioned in this Reddit comment by u/matklad.

@ebkalderon
Copy link
Owner Author

I've been investigating different approaches to allowing &mut self methods in an ergonomic way, as suggested in the Reddit link above. Here is a Router<S> prototype which accepts method handlers with either &self or &mut self receivers (Playground).

However, due to the unfortunate inability to express the following:

impl<R, O, F, Fut> Method<&R, (), O> for F
where
    R: 'static,
    F: for<'a> Fn(&'a R) -> Fut, // How do I express `Fut` has HRTB 'a here?
    Fut: Future<Output = O>,

the Playground example cannot accept async fn methods at present (see comment in Playground code).

If we abandon this approach and don't attempt to abstract over mutability, we can work around this limitation and reduce a few unnecessary heap allocations too. But this negatively impacts the API ergonomics, where users of LspServiceBuilder are forced to register &self method handlers with .custom_method() and &mut self handlers with .custom_method_mut(), respectively.

I have managed to go beyond the approach described in the Playground link and successfully abstracted the blanket Method trait impls over mutability, but it's pretty complex and I'm looking to simplify the traits further, if possible.

@ebkalderon
Copy link
Owner Author

ebkalderon commented Jan 17, 2023

I've found a satisfactory way to work around the above issues with higher-kinded lifetimes (Playground). The gist of it is as follows:

  1. Create a private trait named Closure<Recv, Args, Out> with an invoke() method.
  2. Implement this trait over all async closures F: Fn(Recv, Args) -> Fut where Fut: Future<Output = Out>. This implementation simply calls itself and forwards any arguments: self(recv, args).
  3. Create a public trait named Method<Recv, Args, Out> with a similarly named call() method.
  4. Implement Method<&'a Recv, Args, Out> for all types satisfying for<'b> Closure<&'b Recv, Args, Out> + 'a.
  5. Implement Method<&'a mut Recv, Args, Out> for all types satisfying for<'b> Closure<&'b mut Recv, Args, Out> + 'a.
  6. You should now be able to pass async methods taking either &self or &mut self receivers as arguments to functions expecting F: Method<Recv, Args, Out>.

This hack relies on this intermediate Closure trait to express the higher-kinded lifetime bound previously considered inexpressible.

This took me an incredibly long time to figure out, bumping up against limitations such as rust-lang/rust#70263 frequently along the way. I'm pretty happy with the ergonomics of the end result!

@ratmice
Copy link

ratmice commented Jan 30, 2023

FYI @ratmice, I hope to relax the Send bound requirements on LanguageServer futures eventually as part of #284 (comment), thereby supporting thread-local futures out of the box. The Client type will remain Send, of course, in case anyone wants to pass it into a Send task.

Responding here, because it probably makes more sense.

I've been doing a bit of refactoring in the code for my project (from which #340 was a simplified example). Having done so, I kind of doubt now that i'm actually going to be able to meet these (or any!) trait bounds, the primary issue underlying it being self references in the thread which I currently have to spawn to respond to events.

I'll definitely try and check out this HKLT, when there is a branch to look at, but the the feeling I get is that even with these trait bounds relaxed, it is likely I don't have a struct to implement satisfy them.

@ebkalderon
Copy link
Owner Author

ebkalderon commented Jan 31, 2023

With regards to the simplified example in #340, dropping the Send bound on all futures indeed allows !Send struct fields, and these values may also be borrowed across .await points in LanguageServer method handlers (Playground link)1. This enables servers to call out to Client at any time while holding references to !Send data.

Additionally, the current push to mark certain methods that inherently depend on exclusive access to server state &mut self (e.g. initialize, shutdown, and the did_.* notifications) should make the API easier to reason about as well.2

It's still perfectly fine to spawn background tasks to tokio::spawn() or std::thread::spawn() inside any of these handlers, of course. Though if you do, you will need to make sure the data passed in and out of the task it is Send + 'static like always.

Footnotes

  1. In reality, tower_service::Service requires that all futures be at least 'static, unfortunately (see this comment on tower-rs/tower). The language server instance could be wrapped in an Rc/Arc internally to satisfy this requirement, though, which is exactly what tower-lsp does today.

  2. Note that the use of tokio::sync::RwLock in the playground example to support &mut self methods is just for demonstration purposes. The real implementation I'm working on uses a fast async-aware semaphore instead of locks.

@ebkalderon
Copy link
Owner Author

Okay, I've pushed up a quick and dirty implementation to the support-mutable-methods branch. Haven't stress-tested it much yet, but all tests pass. Would appreciate any feedback from folks!

@ebkalderon
Copy link
Owner Author

ebkalderon commented Mar 2, 2023

An open question regarding the implementation in the above branch: should the future returned by Server::serve() be Send? The current work returns a thread-local !Send + !Sync future, since we assume most users will not attempt to spawn() their server, but will instead be .await-ing it on the main thread. This lets us use a lightweight async semaphore + RefCell to synchronize access to the inner LanguageServer instead of an async RwLock (which may or may not be tied to a specific async runtime, something we don't want).

If anyone here depends on Server::serve() being Send, please speak up now!

@ratmice
Copy link

ratmice commented Mar 2, 2023

I am intending to try this out ASAP however, i've been a bit under the weather hopefully this weekend over the next few days I'll feel well enough to sit down and work on it. I'm definitely just .await-ing in the main thread though.

@ebkalderon
Copy link
Owner Author

Sure thing. Sorry to hear you're under the weather, @ratmice. Hope you feel better soon!

Thanks for chiming in about your own usage of the Server::serve() future. Would love to hear other folks' feedback too. I may let this thread simmer for a bit longer and wait for other responses to trickle in.

@ebkalderon ebkalderon self-assigned this Mar 3, 2023
@JoshuaBatty
Copy link

Hey @ebkalderon, just wanted to see what the status is with making this change to tower-lsp. I just modified our server to target the support-mutable-methods branch to see how the new changes feel.

It's so nice to have a clear boundary where the internal server state can be mutated safely and where it should be taken by reference. Not needing to wrap everything in a RwLock or similar really is a massive ergonomic win. Would love to see this change land in an official release. How are you feeling about it? Is there something holding you back? Is there anything you need help with? Keen to lend a hand if it helps move this along.

@ebkalderon
Copy link
Owner Author

Hi there, @JoshuaBatty! Thanks for your interest in this issue. I think the support-mutable-methods branch is largely ready to go in its current state. However, there are a few complications currently blocking its immediate release.

  1. There doesn't appear to be a consensus among downstream projects whether this change is desirable. Considering tower-lsp is being used by several large projects at this time, I would prefer not to break API stability too egregiously (though I'm not afraid to do so, if I must). I would love to collect some feedback from at least the top-largest key players before merging this branch in, but I'm not sure how to best go about this yet.
  2. Some folks have expressed a desire for concurrent execution of request/notification handlers to remain configurable, if not the default (e.g. see comments by @dsherret on Concurrent requests without Send requirement for single threaded runtimes #386). Being able to select between &self and &mut self method receivers in the current trait-based LanguageServer API is unfortunately not practical, so this would require a deeper API redesign which I suspect would, again, disrupt downstream users quite a bit.
  3. On the same topic, it's possible that we might need to migrate away from the monolithic LanguageServer trait API regardless in order to resolve #386 (making &self and &mut self concurrency user-configurable) and #393 (allow overriding default method signatures). I have promising ideas on this front, but pursuing this would make the support-mutable-methods branch obsolete if I went in this direction.

Hope this sheds some light on the current state of things! Ideally, I would love to be able to release the support-mutable-methods branch ASAP, to improve ergonomics for users in the short-term, and then release a more flexible redesigned API later. I worry about how downstream projects would react to this increased level of churn, though.

I'd love to hear thoughts from anyone (the biggest tower-lsp users out there especially) on what would be the best path forward!

@ratmice
Copy link

ratmice commented Sep 3, 2023

  1. I would love to collect some feedback

While not a large user (I've only communicated with one person besides myself that I know has attempted to use my lsp) , my feedback from my experiment was that it did work and appeared it would allow me to avoid making separate threads and serializing lsp messages via channels to those threads.

Unfortunately in order to leverage it required the use of self-referencing structures (which I am somewhat hesitant to do).
If it wasn't for that, it should be a no-brainer for my project. But at the time it seemed like trading the mess I have (which works ok for now) for some ownership messiness of a new kind.
So, I wasn't quite sold yet on the full rewrite that would allow it to leverage the new things which would only be possible with the
mutable-methods branch. I haven't done an update to my current code base to the mutable-methods branch (not really leveraging the new possibilities), but would be happy to do that if it would be helpful.

https://github.com/ratmice/nimbleparse_lsp_mutable_methods/blob/30ee8436ba28bb174f8deaaa53becb833035c896/server/src/main.rs#L130C1-L154C1

@ebkalderon
Copy link
Owner Author

Good to know that the branch works as expected, @ratmice! Thanks for the input. I presume the need for self-referential structs is a specific requirement of the specific language server being built and not an issue with the branch, as I have never encountered such a need in other downstream projects relying on tower-lsp AFAIK, but please correct me if I'm wrong there.

@ratmice
Copy link

ratmice commented Sep 4, 2023

Yeah, it is a specific detail of the language server being built, the current implementation only avoids self-referential types because because it is borrowing local variables within that threads main loop, but once it has to be within something which implements a trait it needs to become a self-referential type.

Edit Where on the current master I just couldn't implement the trait due to bounds requirements (IIRC).

@JoshuaBatty
Copy link

Thanks for the detailed response and shedding light on the complexities involved. I genuinely appreciate the effort you're putting into maintaining the stability and usability of tower-lsp.

From our standpoint, the benefits of the support-mutable-methods branch are immediately tangible. The ergonomics it offers are massive for us, and having that in an official release would significantly enhance our development experience.

I understand the concerns regarding the potential API redesign and how it might introduce additional churn. While long-term stability is essential, I'd argue that we should prioritize immediate benefits that have a clear positive impact. We're willing to adapt and navigate through any churn that comes with future redesigns.

@ebkalderon
Copy link
Owner Author

@JoshuaBatty Thanks for the helpful reply, especially regarding whether the branch works as expected and how getting it merged would likely impact your project. I'm glad to hear folks seem generally in favor of it getting merged so far.

I think I would like to hear from folks working on the Deno project, particularly from @dsherret and @bartlomieju (sorry to ping you both directly) over on denoland/deno#18079, which had spawned further discussion in #386. Would merging in the support-mutable-methods branch break anything over at Deno today?

@oxalica
Copy link

oxalica commented Sep 6, 2023

Not sure if it helps, but I'm trying an alternative design leveraging the interface of tower::Service for all methods in my project async-lsp. That are request handlers fn(&mut State, Params) -> impl Future + 'static and notification handlers fn(&mut State). This approach has the downside of incompatibility with async fn sugar, forcing mutations being done before computations, but simplifies the concurrency implementation. We can just call the handler, spawn the returned future and repeat, then everything is correctly ordered and asynchronous computations are fully concurrent.

The current approach of support-mutable-methods uses async fn(&mut State, Params) aka. fn<'a>(&'a mut State, Params) -> impl Future + 'a , which holds the write lock to State during the task duration. It means the frequent did_change handler becomes the join point and concurrency can only happen between two keystrokes. This may already be the case for some language servers, but when State is cheap to snapshot/clone, concurrent read and write are possible and can be more efficient.

But well, yes, it's more ergonomic than both the current APIs and my approach.

@dsherret
Copy link

dsherret commented Sep 7, 2023

I think I would like to hear from folks working on the Deno project, particularly from @dsherret and @bartlomieju (sorry to ping you both directly) over on denoland/deno#18079, which had spawned further discussion in #386. Would merging in the support-mutable-methods branch break anything over at Deno today?

I don't think it would break anything because we wouldn't use (?Send) on the trait so the interface should remain the same, correct? I think we'll just continue to use the current interface and do some hacks to make it work better with a single threaded runtime. I'd rather we control the concurrency and mutability for the reasons outlined in the other issue.

@JoshuaBatty
Copy link

Sounds like we might be able to move forward with this then? How are you feeling about it @ebkalderon?

@JoshuaBatty
Copy link

Hey @ebkalderon sorry to be pushy, any update on your thoughts for moving this ahead?

@micahscopes
Copy link

micahscopes commented Jan 29, 2024

I'm building a language server and recently prototyped a tower-lsp rewrite. So far I really like the API.

However, this issue has come up in a concrete way: It's hard to understand the order in which did_close, did_open and did_change_watched_file are executed after a file rename in vscode. In the lsp-server implementation did_close was handled first, whereas in the tower-lsp implementation prototype did_open is handled first (seemingly).

I need to ensure that any references to the old filename get updated before running diagnostics... when I open the "new" file, I need some way of knowing that it's actually just the old file with a new name. Since it's hard to predict which handler will be executed first it seems that the workaround will be to clean up deleted file references for both did_open and did_change_watched_file.

Now in this case it's probably not too big of a deal but the solution feels hacky and in the case of more complicated combinations of long running tasks I'll have to get creative to avoid redundant execution of expensive stuff.

(I still need to try will_rename, hopefully that will fix everything in this specific case since the client literally waits for a response to that before executing the rename...)

In general, it'd be nice to have insight into and control over the handler execution order, especially in cases where clusters of interrelated requests come in quickly, for example to explicitly debounce sequences of interdependent tasks that could be triggered by multiple events concurrently, or to throttle expensive tasks. Maybe I can use streams and stream extensions for this? The LanguageServer trait implementation handlers can directly handle certain things if there isn't a potential for interference, but in other cases maybe they could defer to some more complex logic by sending events to channels that get processed as streams. In that case, what would be the advantage of using tower-lsp over lsp-server? The easy to use API is definitely one advantage. Maybe there are others?

When it comes to rust async stuff I'm definitely "Alan", so it's hard for me to grasp whether the support-mutable-methods changes can resolve my issues.

snowsignal added a commit to astral-sh/ruff that referenced this issue Mar 9, 2024
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR introduces the `ruff_server` crate and a new `ruff server`
command. `ruff_server` is a re-implementation of
[`ruff-lsp`](https://github.com/astral-sh/ruff-lsp), written entirely in
Rust. It brings significant performance improvements, much tighter
integration with Ruff, a foundation for supporting entirely new language
server features, and more!

This PR is an early version of `ruff_lsp` that we're calling the
**pre-release** version. Anyone is more than welcome to use it and
submit bug reports for any issues they encounter - we'll have some
documentation on how to set it up with a few common editors, and we'll
also provide a pre-release VSCode extension for those interested.

This pre-release version supports:
- **Diagnostics for `.py` files**
- **Quick fixes**
- **Full-file formatting**
- **Range formatting**
- **Multiple workspace folders**
- **Automatic linter/formatter configuration** - taken from any
`pyproject.toml` files in the workspace.

Many thanks to @MichaReiser for his [proof-of-concept
work](#7262), which was important
groundwork for making this PR possible.

## Architectural Decisions

I've made an executive choice to go with `lsp-server` as a base
framework for the LSP, in favor of `tower-lsp`. There were several
reasons for this:

1. I would like to avoid `async` in our implementation. LSPs are mostly
computationally bound rather than I/O bound, and `async` adds a lot of
complexity to the API, while also making harder to reason about
execution order. This leads into the second reason, which is...
2. Any handlers that mutate state should be blocking and run in the
event loop, and the state should be lock-free. This is the approach that
`rust-analyzer` uses (also with the `lsp-server`/`lsp-types` crates as a
framework), and it gives us assurances about data mutation and execution
order. `tower-lsp` doesn't support this, which has caused some
[issues](ebkalderon/tower-lsp#284) around data
races and out-of-order handler execution.
3. In general, I think it makes sense to have tight control over
scheduling and the specifics of our implementation, in exchange for a
slightly higher up-front cost of writing it ourselves. We'll be able to
fine-tune it to our needs and support future LSP features without
depending on an upstream maintainer.

## Test Plan

The pre-release of `ruff_server` will have snapshot tests for common
document editing scenarios. An expanded test suite is on the roadmap for
future version of `ruff_server`.
nkxxll pushed a commit to nkxxll/ruff that referenced this issue Mar 10, 2024
…sh#10158)

<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR introduces the `ruff_server` crate and a new `ruff server`
command. `ruff_server` is a re-implementation of
[`ruff-lsp`](https://github.com/astral-sh/ruff-lsp), written entirely in
Rust. It brings significant performance improvements, much tighter
integration with Ruff, a foundation for supporting entirely new language
server features, and more!

This PR is an early version of `ruff_lsp` that we're calling the
**pre-release** version. Anyone is more than welcome to use it and
submit bug reports for any issues they encounter - we'll have some
documentation on how to set it up with a few common editors, and we'll
also provide a pre-release VSCode extension for those interested.

This pre-release version supports:
- **Diagnostics for `.py` files**
- **Quick fixes**
- **Full-file formatting**
- **Range formatting**
- **Multiple workspace folders**
- **Automatic linter/formatter configuration** - taken from any
`pyproject.toml` files in the workspace.

Many thanks to @MichaReiser for his [proof-of-concept
work](astral-sh#7262), which was important
groundwork for making this PR possible.

## Architectural Decisions

I've made an executive choice to go with `lsp-server` as a base
framework for the LSP, in favor of `tower-lsp`. There were several
reasons for this:

1. I would like to avoid `async` in our implementation. LSPs are mostly
computationally bound rather than I/O bound, and `async` adds a lot of
complexity to the API, while also making harder to reason about
execution order. This leads into the second reason, which is...
2. Any handlers that mutate state should be blocking and run in the
event loop, and the state should be lock-free. This is the approach that
`rust-analyzer` uses (also with the `lsp-server`/`lsp-types` crates as a
framework), and it gives us assurances about data mutation and execution
order. `tower-lsp` doesn't support this, which has caused some
[issues](ebkalderon/tower-lsp#284) around data
races and out-of-order handler execution.
3. In general, I think it makes sense to have tight control over
scheduling and the specifics of our implementation, in exchange for a
slightly higher up-front cost of writing it ourselves. We'll be able to
fine-tune it to our needs and support future LSP features without
depending on an upstream maintainer.

## Test Plan

The pre-release of `ruff_server` will have snapshot tests for common
document editing scenarios. An expanded test suite is on the roadmap for
future version of `ruff_server`.
@fda-odoo
Copy link

fda-odoo commented Jun 19, 2024

Hello,
I would like to make the point on this issue. We are using TowerLSP for our project, and we are wondering about the message ordering.
As I understand, the project is actually synchrone as far as you don't release the task with an await or similar call?
This is a real issue for us, as some methods have to call customs routes and wait for the response of the client, making the server unstable. As the specs indicates, Completion or signatureHelp can be reordered, but reordering did_change notification is totally forbidden, especially if we use incremental file cache and not the TextDocumentSyncKind::FULL option.
Actually my plan is to list all method that can be reordered and ones that can't, and check before any execution that no other 'not-reorderable' function is executing.
Do you think it could temporary solve the issue with the actual version of TowerLSP?

EDIT: It won't, because if some tasks are blocked, I can't guarantee which one will wake up first

@samestep
Copy link

What is the status on this? Is the current recommendation just to use the support-mutable-methods branch to be able to have confidence that document change notifications are handled in the correct order?

Yesterday I was looking at some old code I wrote with lspower in 2021 trying to see why I remembered it as being so painful, and I saw that I had written it to just use one big sequential channel that all incoming LSP messages get sent through. Naively, it seems to me that having a completely sequential API with &mut self everywhere would be the nicest, because then new users like myself would be able to have reasonable confidence that they're implementing something correct, and if someone wants more throughput, they can implement that concurrency themselves on top of the basic sequential API.

Those are just my two cents though. More concretely: right now, what would be the recommendation for someone like myself who is just starting to write a new language server? @ebkalderon should I use tower-lsp with the support-mutable-methods branch? Should I instead use lsp-server as it looks like @mkaput is doing? From the earlier message by @micahscopes it sounds like lsp-server might handle document open/change/close messages in a weird order, but maybe I'm just misunderstanding that specific example. Or should I just write my own server so at least I can know for a fact that it's handling messages sequentially? (I really would rather avoid that if possible.)

@fda-odoo
Copy link

@samestep
I don't know the status of this project as I asked the same question 2 months ago, without answer.
We finally chose to implement our own message manager, because we want to be sure to have something stable and maintainable.
And what a wonderful plan ! It took me 2 days to change our code from tower-lsp, but we have full control on the protocol, and we dropped all the tower-lsp and tokio dependencies, and we are no more using async keyword, all that for 500 new lines of code.
I used lsp-types of course, as well as lsp-server from rust-analyzer that provided me the base to open io channels (tcp, stdio,...)
I now have a super simple structure where I can simply handle a new type of message with a simple line.
So personally I would suggest you to take a day (we had to migrate all our code, so it's why it took 2 days) to implement it by yourself, by that's my point of view :)
The code for information (in alpha, the lsp part is not separated from the project however): https://github.com/odoo/odoo-ls/blob/master/server/src/threads.rs#L236
If you are not confident with threads, however, I would suggest to wait for an answer here or try to find another library.

@samestep
Copy link

samestep commented Aug 21, 2024

@fda-odoo thanks, that makes a lot of sense! After I wrote my message above, I went and looked at how lsp-server is used in rust-analyzer, and it indeed looks like exactly what I want.

@lionel-
Copy link

lionel- commented Aug 21, 2024

Our approach was to forward all messages from tower-lsp handlers (https://github.com/posit-dev/ark/blob/main/crates/ark/src/lsp/backend.rs) to our own event loop (https://github.com/posit-dev/ark/blob/main/crates/ark/src/lsp/main_loop.rs). This puts us in full control of concurrency and we can spawn threads or async tasks as we see fit (I agree with the general sentiment here that the latter is probably not that useful).

We have two async tasks managing this: one semi-responsive main loop to handle requests and notifications from tower-lsp and one low latency loop for things like logging. Logging is no longer async nor blocking, it just sends a message to that tight async loop which then dispatches it to tower-lsp.

We might end up switching to lsp-server later on, but for the time being we were able to solve all our issues of race conditions with tower-lsp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests