-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust Language Server (IDE support) #1317
Changes from all commits
6b0b771
894fc45
1615236
3dd97de
2f9a2a1
37d72be
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,302 @@ | ||
- Feature Name: n/a | ||
- Start Date: 2015-10-13 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
|
||
This RFC describes the Rust Language Server (RLS). This is a program designed to | ||
service IDEs and other tools. It offers a new access point to compilation and | ||
APIs for getting information about a program. The RLS can be thought of as an | ||
alternate compiler, but internally will use the existing compiler. | ||
|
||
Using the RLS offers very low latency compilation. This allows for an IDE to | ||
present information based on compilation to the user as quickly as possible. | ||
|
||
|
||
## Requirements | ||
|
||
To be concrete about the requirements for the RLS, it should enable the | ||
following actions: | ||
|
||
* show compilation errors and warnings, updated as the user types, | ||
* code completion as the user types, | ||
* highlight all references to an item, | ||
* find all references to an item, | ||
* jump to definition. | ||
|
||
These requirements will be covered in more detail in later sections. | ||
|
||
|
||
## History note | ||
|
||
This RFC started as a more wide-ranging RFC. Some of the details have been | ||
scaled back to allow for more focused and incremental development. | ||
|
||
Parts of the RFC dealing with robust compilation have been removed - work here | ||
is ongoing and mostly doesn't require an RFC. | ||
|
||
The RLS was earlier referred to as the oracle. | ||
|
||
|
||
# Motivation | ||
|
||
Modern IDEs are large and complex pieces of software; creating a new one from | ||
scratch for Rust would be impractical. Therefore we need to work with existing | ||
IDEs (such as Eclipse, IntelliJ, and Visual Studio) to provide functionality. | ||
These IDEs provide excellent editor and project management support out of the | ||
box, but know nothing about the Rust language. This information must come from | ||
the compiler. | ||
|
||
An important aspect of IDE support is that response times must be extremely | ||
quick. Users expect some feedback as they type. Running normal compilation of an | ||
entire project is far too slow. Furthermore, as the user is typing, the program | ||
will not be a valid, complete Rust program. | ||
|
||
We expect that an IDE may have its own lexer and parser. This is necessary for | ||
the IDE to quickly give parse errors as the user types. Editors are free to rely | ||
on the compiler's parsing if they prefer (the compiler will do its own parsing | ||
in any case). Further information (name resolution, type information, etc.) will | ||
be provided by the RLS. | ||
|
||
## Requirements | ||
|
||
We stated some requirements in the summary, here we'll cover more detail and the | ||
workflow between IDE and RLS. | ||
|
||
The RLS should be safe to use in the face of concurrent actions. For example, | ||
multiple requests for compilation could occur, with later requests occurring | ||
before earlier requests have finished. There could be multiple clients making | ||
requests to the RLS, some of which may mutate its data. The RLS should provide | ||
reliable and consistent responses. However, it is not expected that clients are | ||
totally isolated, e.g., if client 1 updates the program, then client 2 requests | ||
information about the program, client 2's response will reflect the changes made | ||
by client 1, even if these are not otherwise known to client 2. | ||
|
||
|
||
### Show compilation errors and warnings, updated as the user types | ||
|
||
The IDE will request compilation of the in-memory program. The RLS will compile | ||
the program and asynchronously supply the IDE with errors and warnings. | ||
|
||
### Code completion as the user types | ||
|
||
The IDE will request compilation of the in-memory program and request code- | ||
completion options for the cursor position. The RLS will compile the program. As | ||
soon as it has enough information for code-completion it will return options to | ||
the IDE. | ||
|
||
* The RLS should return code-completion options asynchronously to the IDE. | ||
Alternatively, the RLS could block the IDE's request for options. | ||
* The RLS should not filter the code-completion options. For example, if the | ||
user types `foo.ba` where `foo` has available fields `bar` and `qux`, it | ||
should return both these fields, not just `bar`. The IDE can perform it's own | ||
filtering since it might want to perform spell checking, etc. Put another way, | ||
the RLS is not a code completion tool, but supplies the low-level data that a | ||
code completion tool uses to provide suggestions. | ||
|
||
### Highlight all references to an item | ||
|
||
The IDE requests all references in the same file based on a position in the | ||
file. The RLS returns a list of spans. | ||
|
||
### Find all references to an item | ||
|
||
The IDE requests all references based on a position in the file. The RLS returns | ||
a list of spans. | ||
|
||
### Jump to definition | ||
|
||
The IDE requests the definition of an item based on a position in a file. The RLS | ||
returns a list of spans (a list is necessary since, for example, a dynamically | ||
dispatched trait method could be defined in multiple places). | ||
|
||
|
||
# Detailed design | ||
|
||
## Architecture | ||
|
||
The basic requirements for the architecture of the RLS are that it should be: | ||
|
||
* reusable by different clients (IDEs, tools, ...), | ||
* fast (we must provide semantic information about a program as the user types), | ||
* handle multi-crate programs, | ||
* consistent (it should handle multiple, potentially mutating, concurrent requests). | ||
|
||
The RLS will be a long running daemon process. Communication between the RLS and | ||
an IDE will be via IPC calls (tools (for example, Racer) will also be able to | ||
use the RLS as an in-process library.). The RLS will include the compiler as a | ||
library. | ||
|
||
The RLS has three main components - the compiler, a database, and a work queue. | ||
|
||
The RLS accepts two kinds of requests - compilation requests and queries. It | ||
will also push data to registered programs (generally triggered by compilation | ||
completing). Essentially, all communication with the RLS is asynchronous (when | ||
used as an in-process library, the client will be able to use synchronous | ||
function calls too). | ||
|
||
The work queue is used to sequentialise requests and ensure consistency of | ||
responses. Both compilation requests and queries are stored in the queue. Some | ||
compilation requests can cause earlier compilation requests to be canceled. | ||
Queries blocked on the earlier compilation then become blocked on the new | ||
request. | ||
|
||
In the future, we should move queries ahead of compilation requests where | ||
possible. | ||
|
||
When compilation completes, the database is updated (see below for more | ||
details). All queries are answered from the database. The database has data for | ||
the whole project, not just one crate. This also means we don't need to keep the | ||
compiler's data in memory. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is the input a span here? I would expect an offset. Would I have to find the span of the entire word I want the definition of, or is a partial span (or even an empty span) a valid input? (Note that the same remark applies to all the subsequent API functions) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm assuming that the IDE has a tokeniser and therefore already knows the span of the identifier (even simple editors usually have a tokeniser in order to implement syntax highlighting). On the other hand, I suppose that from the oracle's point of view it is as easy to identify the identifier by a single position as it is a span, so it might be as well to take a single position. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The hand written tokenizer on the plugin side will certainly not be as accurate as the oracle's one. I'd rather delegate to the oracle as much as possible. |
||
|
||
## Compilation | ||
|
||
The RLS is somewhat parametric in its compilation model. Theoretically, it could | ||
run a full compile on the requested crate, however this would be too slow in | ||
practice. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The output should also tell the "kind" of each reference found. For example, if we want to find the references of a function declared in a trait, the output could be either a "method call" kind, or a "definition in a trait impl" kind, or a "function as a value" kind. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The "reference data" includes the kind of definition, see lines 299, 300 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The term used in the the previous section was "definition data". It is not obvious that "reference data" designates the "kind" of references. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll try and polish the text here. |
||
The general procedure is that the IDE (or other client) requests that the RLS | ||
compile a crate. It is up to the IDE to interact with Cargo (or some other | ||
build system) in order to produce the correct build command and to ensure that | ||
any dependencies are built. | ||
|
||
Initially, the RLS will do a standard incremental compile on the specified | ||
crate. See [RFC PR 1298](https://github.com/rust-lang/rfcs/pull/1298) for more | ||
details on incremental compilation. | ||
|
||
The crate being compiled should include any modifications made in the client and | ||
not yet committed to a file (e.g., changes the IDE has in memory). The client | ||
should pass such changes to the RLS along with the compilation request. | ||
|
||
I see two ways to improve compilation times: lazy compilation and keeping the | ||
compiler in memory. We might also experiment with having the IDE specify which | ||
parts of the program have changed, rather than having the compiler compute this. | ||
|
||
### Lazy compilation | ||
|
||
With lazy compilation the IDE requests that a specific item is compiled, rather | ||
than the whole program. The compiler compiles this function compiling other | ||
items only as necessary to compile the requested item. | ||
|
||
Lazy compilation should also be incremental - an item is only compiled if | ||
required *and* if it has changed. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The search for identifier should also take as an input the scope of the search. QtCreator's Locator lets the user search for a symbol in the current opened file, and It's one of the Locator function I use the most. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a good idea, I'll add it. Note that the IDE is expected to present an interface over this functionality, and plain text search should be done entirely in the IDE, the only search the oracle helps with is finding uses of a particular identifier (i.e., semantic search). Defining a scope to search over would still be useful though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I was indeed talking about symbols, not full text searches. The Locator will give top level items only. |
||
Obviously, we could miss some errors with pure lazy compilation. To address this | ||
the RLS schedules both a lazy and a full (but still incremental) compilation. | ||
The advantage of this approach is that many queries scheduled after compilation | ||
can be performed after the lazy compilation, but before the full compilation. | ||
|
||
### Keeping the compiler in memory | ||
|
||
There are still overheads with the incremental compilation approach. We must | ||
startup the compiler initialising its data structures, we must parse the whole | ||
crate, and we must read the incremental compilation data and metadata from disk. | ||
|
||
If we can keep the compiler in memory, we avoid these costs. | ||
|
||
However, this would require some significant refactoring of the compiler. There | ||
is currently no way to invalidate data the compiler has already computed. It | ||
also becomes difficult to cancel compilation: if we receive two compile requests | ||
in rapid succession, we may wish to cancel the first compilation before it | ||
finishes, since it will be wasted work. This is currently easy - the compilation | ||
process is killed and all data released. However, if we want to keep the | ||
compiler in memory we must invalidate some data and ensure the compiler is in a | ||
consistent state. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would add another function to the API: a way to search among the documentation. QtCreator's Locator can search in the available documentation, and display the html doc of the matches. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this a text search of the docs, or does it look up documentation for a particular function (or whatever)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure about the specifics. With a Qt project, this Locator feature matches either a symbol name, or a word that appears inside a title of the generated html doc (they are structured docs). This is clearly not a critical feature, it may not belong to an initial RFC. |
||
|
||
### Compilation output | ||
|
||
Once compilation is finished, the RLS's database must be updated. Errors and | ||
warnings produced by the compiler are stored in the database. Information from | ||
name resolution and type checking is stored in the database (exactly which | ||
information will grow with time). The analysis information will be provided by | ||
the save-analysis API. | ||
|
||
The compiler will also provide data on which (old) code has been invalidated. | ||
Any information (including errors) in the database concerning this code is | ||
removed before the new data is inserted. | ||
|
||
|
||
### Multiple crates | ||
|
||
The RLS does not track dependencies, nor much crate information. However, it | ||
will be asked to compile many crates and it will keep track of which crate data | ||
belongs to. It will also keep track of which crates belong to a single program | ||
and will not share data between programs, even if the same crate is shared. This | ||
helps avoid versioning issues. | ||
|
||
|
||
## Versioning | ||
|
||
The RLS will be released using the same train model as Rust. A version of the | ||
RLS is pinned to a specific version of Rust. If users want to operate with | ||
multiple versions, they will need multiple versions of the RLS (I hope we can | ||
extend multirust/rustup.rs to handle the RLS as well as Rust). | ||
|
||
|
||
# Drawbacks | ||
|
||
It's a lot of work. But better we do it once than each IDE doing it themselves, | ||
or having sub-standard IDE support. | ||
|
||
|
||
# Alternatives | ||
|
||
The big design choice here is using a database rather than the compiler's data | ||
structures. The primary motivation for this is the 'find all references' | ||
requirement. References could be in multiple crates, so we would need to reload | ||
incremental compilation data (which must include the serialised MIR, or | ||
something equivalent) for all crates, then search this data for matching | ||
identifiers. Assuming the serialisation format is not too complex, this should | ||
be possible in a reasonable amount of time. Since identifiers might be in | ||
function bodies, we can't rely on metadata. | ||
|
||
This is a reasonable alternative, and may be simpler than the database approach. | ||
However, it is not planned to output this data in the near future (the initial | ||
plan for incremental compilation is to not store information required to re- | ||
check function bodies). This approach might be too slow for very large projects, | ||
we might wish to do searches in the future that cannot be answered without doing | ||
the equivalent of a database join, and the database simplifies questions about | ||
concurrent accesses. | ||
|
||
We could only provide the RLS as a library, rather than providing an API via | ||
IPC. An IPC interface allows a single instance of the RLS to service multiple | ||
programs, is language-agnostic, and allows for easy asynchronous-ness between | ||
the RLS and its clients. It also provides isolation - a panic in the RLS will | ||
not cause the IDE to crash, not can a long-running operation delay the IDE. Most | ||
of these advantages could be captured using threads. However, the cost of | ||
implementing an IPC interface is fairly low and means less effort for clients, | ||
so it seems worthwhile to provide. | ||
|
||
Extending this idea, we could do less than the RLS - provide a high-level | ||
library API for the Rust compiler and let other projects do the rest. In | ||
particular, Racer does an excellent job at providing the information the RLS | ||
would provide without much information from the compiler. This is certainly less | ||
work for the compiler team and more flexible for clients. On the other hand, it | ||
means more work for clients and possible fragmentation. Duplicated effort means | ||
that different clients will not benefit from each other's innovations. | ||
|
||
The RLS could do more - actually perform some of the processing tasks usually | ||
done by IDEs (such as editing source code) or other tools (refactoring, | ||
reformating, etc.). | ||
|
||
|
||
# Unresolved questions | ||
|
||
A problem is that Visual Studio uses UTF16 while Rust uses UTF8, there is (I | ||
understand) no efficient way to convert between byte counts in these systems. | ||
I'm not sure how to address this. It might require the RLS to be able to operate | ||
in UTF16 mode. This is only a problem with byte offsets in spans, not with | ||
row/column data (the RLS will supply both). It may be possible for Visual Studio | ||
to just use the row/column data, or convert inefficiently to UTF16. I guess the | ||
question comes down to should this conversion be done in the RLS or the client. | ||
I think we should start assuming the client, and perhaps adjust course later. | ||
|
||
What kind of IPC protocol to use? HTTP is popular and simple to deal with. It's | ||
platform-independent and used in many similar pieces of software. On the other | ||
hand it is heavyweight and requires pulling in large libraries, and requires | ||
some attention to security issues. Alternatives are some kind of custom | ||
prototcol, or using a solution like Thrift. My prefernce is for HTTP, since it | ||
has been proven in similar situations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this is not a detailed API, but I dont see how this works when there are multiple spans to invalidate.Do all spans must be computed based on the initial content of the file? It sounds like it could be a bit painful to compute it from the plugin side. Maybe a simpler API could live alongside this one, that takes the full content of the file and that lets the oracle compute the differences based on its current state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spans are relative to the last update passed to the oracle. I don't think that should be hard to compute - the plugin just has to keep track of any text deleted or edited since the last update call.
The trouble with diff'ing before and after snapshots is that it is hard to do well, and so we'd end up making mistakes or overestimating the invalidated region. I imagine it is not super-cheap either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. I think you are right and the proposed design is best.