POC: Rust LSP #7262

MichaReiser · 2023-09-10T18:08:47Z

Rust LSP

This PR adds a Rust-based LSP to ruff. You can start it by running ruff lsp. It receives messages over stdin and writes responses over stdout. stderr is used for logging.

Getting started

Create a debug build of ruff.
Open the VS Code extension in VS Code (editors/vscode)
Run nom install in editors/vscode
Install the recommended extensions
Launch the extension (F5) or Run and Debug / Run Extension
Open the User Settings and configure the path to the ruff binary: "ruff.lspBin": "<path>/target/debug/ruff"
Open a Python file

Advantages

Keeps all open files in memory to enable multi-file analysis (access to unsaved changes) eventually
Avoids the overhead of starting a new process, discovering the configuration, etc., on every keystroke by using a long-running server instead
One less dependency (ruff-lsp)
Removes the need to expose information in our JSON output or CLI only for the VS code extension.
Removes the Python dependency from the VS Code extension (including the extension depending on the Python extension), removing concerns about the minimal supported Python version.

Development tools

You can debug the VS code extension inside VS Code or by opening the developer tools: Cmd+Shift+P: Developer: Toggle Developer Tools in the VS Code instance running the extension.
You can set the ruff.trace.server setting to verbose to get trace level tracing in the Output: Ruff panel and see all messages between client and server in the new Output: Ruff Trace panel.

Implemented functionality

UTF-8, UTF-16, and UTF-32 position encoding
Incremental document updates. The LSP stores all open documents in memory
Multi-workspace support
Linting with code fixes
Formatting: Only transmits changed lines instead of the whole file
Configuration caching: Cache the workspace configurations. Reloads them on change.

Challenges and open questions

Tower-LSP or LSP-server

The most known crates for building an LSP server with rust are tower-lsp and lsp-server. This prototype uses tower-lsp, but I think it's worth considering lsp-server.

tower-lsp

A framework in the spirit of You don't call us, we call you. It controls the main loop and the messaging for you.
Based on async. Uses tokio by default
Provides a trait with methods for the most common operations. Giving you a sense of a typed API ;)
Handles some invariants for you: Disallowing multiple initialize messages. Not accepting messages after shutdown.
Long standing issue about concurrent handler execution (race conditions)

lsp-server

Used by rust-analyzer
A library in the sense that it doesn't control the main loop. It only handles the message dispatching between client and server but leaves the scheduling to the user.
Being in control of the main loop allows more advanced scheduling. E.g. rust-analzyer does:
- Uses the main thread to mutate the global plugin state.
- Uses a high-priority thread for on-type requests (format on type)
- Uses a dedicated formatter thread
- Use higher priority threads for latency-sensitive analysis
- Use lower-priority threads for everything else.
Doesn't use async Rust, which is less useful for code analysis because most tasks are CPU and not IO bound.
Hypothesis: Controlling the scheduling may help to avoid thread-safe structures if it is known that some data is only ever accessed from the main thread.

Example

VS Code Extension migration

This prototype doesn't explore how to support the old Python-based LSP and the ruff-integrated LSP in a single extension. It's further unclear how the new LSP would support the args setting (The setting is already problematic today because format and check don't support the same arguments).

A possible approach could be that the extension detects the version of the ruff binary and either spawns ruff-lsp (Python LSP) or ruff lsp (Rust based LSP)

Ruff binary discovery

This prototype doesn't explore how to discover the ruff binary for the current project (and using the bundled version for untrusted workspaces). It defaults to ruff.lspBin.

Higher level `lint` and `format` abstractions

The ruff_workspace crate provides most functionality needed by the LSP, but not with the right granularity and laziness. This either results in code duplication between the CLI and the LSP or unnecessary work:

The LSP resolves the settings for each open workspace by using PyprojectConfig::resolve and python_files_in_path.
- python_files_in_path resolves all settings and python files included in the project. However, the LSP only needs the settings. Building up a list of all files in the project is wasted work.
- Ideally, the Setting for a path would be resolved lazily rather than eagerly.
The LSP should only lint and format files that are part of the project (included and not excluded). There's no existing API to check whether a file is part of the Ruff project (the CLI uses the files returned by python_files_in_path)
The detection of a file's source type would need to be replicated in the LSP.

To sum it up. We lack high-level APIs for lazy linting or formatting of a file to avoid duplicating concerns performed by the CLI.

Concurrency

Note: I'm new to writing concurrent Rust code.

The fact that tower-lsp uses tokio forces the implementation to use thread-safe (Send and Sync) data structure for the session state. This makes accessing and mutating the configuration awkward and introduces much complexity. It would be nice if we could minimize the need for thread-safe data structures and locking.

However, this problem isn't only caused by tower-lsp. I see us facing similar issues when implementing multi-file analysis where we suddenly need a way to snapshot the project state and run the analysis on the snapshot (to avoid new user modifications causing inconsistent states).

Notes

It isn't necessary to integrate the VS code extension into this repository. I added it to this repository to have a single PR that includes all changes (from plugin to ruff)

MichaReiser · 2023-09-12T05:50:41Z

Current dependencies on/for this PR:

main
- PR POC: Rust LSP #7262 👈

This comment was auto-generated by Graphite.

codspeed-hq · 2023-09-25T13:06:09Z

CodSpeed Performance Report

Merging #7262 will improve performances by 2.9%

_{Comparing rust-based-lsp (a670128) with main (8028de8)}

Summary

⚡ 1 improvements
✅ 24 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`rust-based-lsp`	Change
⚡	`linter/default-rules[numpy/ctypeslib.py]`	18.6 ms	18.1 ms	+2.9%

Signed-off-by: Micha Reiser <micha@reiser.io>

micahscopes · 2024-01-29T14:59:08Z

This is a very nice writeup. What did you end up deciding on?

MichaReiser · 2024-01-29T15:05:05Z

This is a very nice writeup. What did you end up deciding on?

Thanks. We have yet to decide. We hope that @inquisitivepenguin can work and this soon.

Are you working on something similar?

micahscopes · 2024-01-29T18:56:10Z

@MichaReiser Yes, I'm working on a language server for Fe language. I recently prototyped a switch from lsp-server to tower-lsp and I've documented some of my thoughts so far here: ethereum/fe#979

Long story short, it seems that there's no way of making concurrency problems magically simple. Waving async/await around at them isn't going to make them go away. I'm tempted to try stream extensions for this because of my existing familiarity with javascript "FRP" libraries and for their flexibility and expressiveness. In that case though the choice between tower-lsp and lsp-server probably doesn't matter too much. Tower-lsp has that nice simple API for implementing LSP features and is simple to set up.

We are using salsa2022 in our compiler libraries, which alleviates some of the pressure of getting this stuff perfect. A lot of computations will be reused as long as inputs haven't changed.

MichaReiser · 2024-01-30T06:57:55Z

@micahscopes Thanks for sharing your findings and the pull request. It will help us evaluating tower-lsp and lsp-server.

Long story short, it seems that there's no way of making concurrency problems magically simple.

I didn't expect that. There's an inherent complexity that the LSP protocol allows multiple in-flight requests. I'm mainly wondering if there's an opportunity to avoid some complexity introduced by async and await (and async runtimes) because the LSP-server can give us more fine-grained control of when we want to run something on different threads to avoid concurrency-safe data structures potentially. The reason why I feel that async (or at least tokio) might not be the right fit is because most of our operations aren't IO bound and, therefore, shouldn't run in a regular tokio worker.

That said. I'm sure your PR will help me understand why the trade of more fine grained control doesn't outweigh the first-class language integration that you get with async await.

micahscopes · 2024-01-30T19:55:13Z

I didn't expect that. There's an inherent complexity that the LSP protocol allows multiple in-flight requests.

Exactly, regardless of lsp-server vs tower-lsp, it's still necessary to face this complexity! Today I'm feeling curious about using tower-lsp along with separate thread(s) for long running tasks.

I'm mainly wondering if there's an opportunity to avoid some complexity introduced by async and await (and async runtimes) because the LSP-server can give us more fine-grained control of when we want to run something on different threads to avoid concurrency-safe data structures potentially

Using async/await wouldn't necessarily preclude that kind of fine-grained control, right? For example, if you wanted some state to be on a thread and didn't want to share it, you could still use channels to communicate to/from tower-lsp handlers, couldn't you? I.e. the channel endpoints would be the only "shared state" you allow in the tower-lsp object.

But I hear your concern about the complexity of async/await/executors. Is it possible to create something maintainable and easy to fix if things go wrong? Does async/await still provide advantages even when using explicit channels and threads?

By the way, you've probably seen this already but I found this to be very helpful: https://tokio.rs/tokio/tutorial/shared-state.

morgante · 2024-01-31T04:10:29Z

In developing the Grit language server we faced some similar challenges with the async language server.

We ended up moving all long-running or compute-intensive work to a separate thread which we communicate with via channels. So far this is working nicely, but you do need to do a bit of work to make sure state messages aren't sent to the client.

MichaReiser · 2024-01-31T08:22:41Z

Hy @morgante 👋 Oh that's interesting. Do I understand it correctly that you used the async/await only as the LSP facade (to handle async IO operations) but use your own scheduler (a separate thread for now) to do the work?

That sounds very clever, making use of the async-await ergonomics where it matters but avoid it "infecting" all the downstream infrastructure and keeping some control over scheduling.

morgante · 2024-01-31T08:30:30Z

Exactly, with the caveat that we do still keep some very fast things (ex. looking up available patterns) in the main LSP thread.

micahscopes · 2024-01-31T15:13:38Z

@morgante thank you for sharing your experience, it's encouraging to hear that you've made this work. Async/await appeals to me in that it simplifies keeping track of pending requests and related responses; no need for extra data structures to keep track of pending request IDs or any of that.

So far this is working nicely, but you do need to do a bit of work to make sure state messages aren't sent to the client

Can you elaborate on the part about preventing state messages from being sent to the client? I don't understand.

morgante · 2024-02-15T07:34:03Z

Can you elaborate on the part about preventing state messages from being sent to the client? I don't understand.

Depending on how long async execution takes, documents will often have changed before the results are sent to the client. The version parameter can help with this, but it's even better to dedupe expensive in-progress work if you can and not even bother sending a diagnostic notification for an old version.

@MichaReiser

## Summary This PR introduces the `ruff_server` crate and a new `ruff server` command. `ruff_server` is a re-implementation of [`ruff-lsp`](https://github.com/astral-sh/ruff-lsp), written entirely in Rust. It brings significant performance improvements, much tighter integration with Ruff, a foundation for supporting entirely new language server features, and more! This PR is an early version of `ruff_lsp` that we're calling the **pre-release** version. Anyone is more than welcome to use it and submit bug reports for any issues they encounter - we'll have some documentation on how to set it up with a few common editors, and we'll also provide a pre-release VSCode extension for those interested. This pre-release version supports: - **Diagnostics for `.py` files** - **Quick fixes** - **Full-file formatting** - **Range formatting** - **Multiple workspace folders** - **Automatic linter/formatter configuration** - taken from any `pyproject.toml` files in the workspace. Many thanks to @MichaReiser for his [proof-of-concept work](#7262), which was important groundwork for making this PR possible. ## Architectural Decisions I've made an executive choice to go with `lsp-server` as a base framework for the LSP, in favor of `tower-lsp`. There were several reasons for this: 1. I would like to avoid `async` in our implementation. LSPs are mostly computationally bound rather than I/O bound, and `async` adds a lot of complexity to the API, while also making harder to reason about execution order. This leads into the second reason, which is... 2. Any handlers that mutate state should be blocking and run in the event loop, and the state should be lock-free. This is the approach that `rust-analyzer` uses (also with the `lsp-server`/`lsp-types` crates as a framework), and it gives us assurances about data mutation and execution order. `tower-lsp` doesn't support this, which has caused some [issues](ebkalderon/tower-lsp#284) around data races and out-of-order handler execution. 3. In general, I think it makes sense to have tight control over scheduling and the specifics of our implementation, in exchange for a slightly higher up-front cost of writing it ourselves. We'll be able to fine-tune it to our needs and support future LSP features without depending on an upstream maintainer. ## Test Plan The pre-release of `ruff_server` will have snapshot tests for common document editing scenarios. An expanded test suite is on the roadmap for future version of `ruff_server`.

@MichaReiser

…sh#10158)  ## Summary This PR introduces the `ruff_server` crate and a new `ruff server` command. `ruff_server` is a re-implementation of [`ruff-lsp`](https://github.com/astral-sh/ruff-lsp), written entirely in Rust. It brings significant performance improvements, much tighter integration with Ruff, a foundation for supporting entirely new language server features, and more! This PR is an early version of `ruff_lsp` that we're calling the **pre-release** version. Anyone is more than welcome to use it and submit bug reports for any issues they encounter - we'll have some documentation on how to set it up with a few common editors, and we'll also provide a pre-release VSCode extension for those interested. This pre-release version supports: - **Diagnostics for `.py` files** - **Quick fixes** - **Full-file formatting** - **Range formatting** - **Multiple workspace folders** - **Automatic linter/formatter configuration** - taken from any `pyproject.toml` files in the workspace. Many thanks to @MichaReiser for his [proof-of-concept work](astral-sh#7262), which was important groundwork for making this PR possible. ## Architectural Decisions I've made an executive choice to go with `lsp-server` as a base framework for the LSP, in favor of `tower-lsp`. There were several reasons for this: 1. I would like to avoid `async` in our implementation. LSPs are mostly computationally bound rather than I/O bound, and `async` adds a lot of complexity to the API, while also making harder to reason about execution order. This leads into the second reason, which is... 2. Any handlers that mutate state should be blocking and run in the event loop, and the state should be lock-free. This is the approach that `rust-analyzer` uses (also with the `lsp-server`/`lsp-types` crates as a framework), and it gives us assurances about data mutation and execution order. `tower-lsp` doesn't support this, which has caused some [issues](ebkalderon/tower-lsp#284) around data races and out-of-order handler execution. 3. In general, I think it makes sense to have tight control over scheduling and the specifics of our implementation, in exchange for a slightly higher up-front cost of writing it ourselves. We'll be able to fine-tune it to our needs and support future LSP features without depending on an upstream maintainer. ## Test Plan The pre-release of `ruff_server` will have snapshot tests for common document editing scenarios. An expanded test suite is on the roadmap for future version of `ruff_server`.

MichaReiser closed this Sep 12, 2023

MichaReiser reopened this Sep 12, 2023

MichaReiser changed the title ~~Avoid allocating in lex_decimal~~ POC: Rust LSP Sep 12, 2023

MichaReiser force-pushed the rust-based-lsp branch from 61cd0be to 56c62bc Compare September 12, 2023 05:50

MichaReiser force-pushed the rust-based-lsp branch from 56c62bc to 729a4fc Compare September 25, 2023 12:56

MichaReiser force-pushed the rust-based-lsp branch 9 times, most recently from fafee18 to 02c6ea2 Compare September 28, 2023 08:39

Add Rust based LSP

a670128

Signed-off-by: Micha Reiser <micha@reiser.io>

MichaReiser force-pushed the rust-based-lsp branch from 02c6ea2 to a670128 Compare September 28, 2023 13:02

charliermarsh mentioned this pull request Oct 27, 2023

Is there interest in moving ruff-lsp implementation into rust? astral-sh/ruff-lsp#300

Closed

MichaReiser closed this Dec 14, 2023

micahscopes mentioned this pull request Jan 29, 2024

Language server concurrency and functionality upgrades ethereum/fe#979

Open

37 tasks

T-256 mentioned this pull request Feb 5, 2024

LSP (Language Server Protocol) Support astral-sh/rye#593

Open

snowsignal mentioned this pull request Feb 29, 2024

ruff server - A new built-in LSP for Ruff, written in Rust #10158

Merged

MichaReiser deleted the rust-based-lsp branch May 8, 2024 12:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

POC: Rust LSP #7262

POC: Rust LSP #7262

MichaReiser commented Sep 10, 2023 •

edited

Loading

MichaReiser commented Sep 12, 2023

codspeed-hq bot commented Sep 25, 2023 •

edited

Loading

micahscopes commented Jan 29, 2024

MichaReiser commented Jan 29, 2024

micahscopes commented Jan 29, 2024

MichaReiser commented Jan 30, 2024

micahscopes commented Jan 30, 2024 •

edited

Loading

morgante commented Jan 31, 2024

MichaReiser commented Jan 31, 2024 •

edited

Loading

morgante commented Jan 31, 2024

micahscopes commented Jan 31, 2024 •

edited

Loading

morgante commented Feb 15, 2024

POC: Rust LSP #7262

POC: Rust LSP #7262

Conversation

MichaReiser commented Sep 10, 2023 • edited Loading

Rust LSP

Getting started

Advantages

Development tools

Implemented functionality

Challenges and open questions

Tower-LSP or LSP-server

tower-lsp

lsp-server

VS Code Extension migration

Ruff binary discovery

Higher level lint and format abstractions

Concurrency

Notes

MichaReiser commented Sep 12, 2023

codspeed-hq bot commented Sep 25, 2023 • edited Loading

CodSpeed Performance Report

Merging #7262 will improve performances by 2.9%

Summary

Benchmarks breakdown

micahscopes commented Jan 29, 2024

MichaReiser commented Jan 29, 2024

micahscopes commented Jan 29, 2024

MichaReiser commented Jan 30, 2024

micahscopes commented Jan 30, 2024 • edited Loading

morgante commented Jan 31, 2024

MichaReiser commented Jan 31, 2024 • edited Loading

morgante commented Jan 31, 2024

micahscopes commented Jan 31, 2024 • edited Loading

morgante commented Feb 15, 2024

MichaReiser commented Sep 10, 2023 •

edited

Loading

Higher level `lint` and `format` abstractions

codspeed-hq bot commented Sep 25, 2023 •

edited

Loading

micahscopes commented Jan 30, 2024 •

edited

Loading

MichaReiser commented Jan 31, 2024 •

edited

Loading

micahscopes commented Jan 31, 2024 •

edited

Loading