rust-lang · alexcrichton · Feb 11, 2016 · Sep 24, 2015 · Nov 17, 2015 · Dec 15, 2015
diff --git a/text/0000-ide.md b/text/0000-ide.md
@@ -0,0 +1,302 @@
+- Feature Name: n/a
+- Start Date: 2015-10-13
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+
+This RFC describes the Rust Language Server (RLS). This is a program designed to
+service IDEs and other tools. It offers a new access point to compilation and
+APIs for getting information about a program. The RLS can be thought of as an
+alternate compiler, but internally will use the existing compiler.
+
+Using the RLS offers very low latency compilation. This allows for an IDE to
+present information based on compilation to the user as quickly as possible.
+
+
+## Requirements
+
+To be concrete about the requirements for the RLS, it should enable the
+following actions:
+
+* show compilation errors and warnings, updated as the user types,
+* code completion as the user types,
+* highlight all references to an item,
+* find all references to an item,
+* jump to definition.
+
+These requirements will be covered in more detail in later sections.
+
+
+## History note
+
+This RFC started as a more wide-ranging RFC. Some of the details have been
+scaled back to allow for more focused and incremental development.
+
+Parts of the RFC dealing with robust compilation have been removed - work here
+is ongoing and mostly doesn't require an RFC.
+
+The RLS was earlier referred to as the oracle.
+
+
+# Motivation
+
+Modern IDEs are large and complex pieces of software; creating a new one from
+scratch for Rust would be impractical. Therefore we need to work with existing
+IDEs (such as Eclipse, IntelliJ, and Visual Studio) to provide functionality.
+These IDEs provide excellent editor and project management support out of the
+box, but know nothing about the Rust language. This information must come from
+the compiler.
+
+An important aspect of IDE support is that response times must be extremely
+quick. Users expect some feedback as they type. Running normal compilation of an
+entire project is far too slow. Furthermore, as the user is typing, the program
+will not be a valid, complete Rust program.
+
+We expect that an IDE may have its own lexer and parser. This is necessary for
+the IDE to quickly give parse errors as the user types. Editors are free to rely
+on the compiler's parsing if they prefer (the compiler will do its own parsing
+in any case). Further information (name resolution, type information, etc.) will
+be provided by the RLS.
+
+## Requirements
+
+We stated some requirements in the summary, here we'll cover more detail and the
+workflow between IDE and RLS.
+
+The RLS should be safe to use in the face of concurrent actions. For example,
+multiple requests for compilation could occur, with later requests occurring
+before earlier requests have finished. There could be multiple clients making
+requests to the RLS, some of which may mutate its data. The RLS should provide
+reliable and consistent responses. However, it is not expected that clients are
+totally isolated, e.g., if client 1 updates the program, then client 2 requests
+information about the program, client 2's response will reflect the changes made
+by client 1, even if these are not otherwise known to client 2.
+
+
+### Show compilation errors and warnings, updated as the user types
+
+The IDE will request compilation of the in-memory program. The RLS will compile
+the program and asynchronously supply the IDE with errors and warnings.
+
+### Code completion as the user types
+
+The IDE will request compilation of the in-memory program and request code-
+completion options for the cursor position. The RLS will compile the program. As
+soon as it has enough information for code-completion it will return options to
+the IDE.
+
+* The RLS should return code-completion options asynchronously to the IDE.
+  Alternatively, the RLS could block the IDE's request for options.
+* The RLS should not filter the code-completion options. For example, if the
+  user types `foo.ba` where `foo` has available fields `bar` and `qux`, it
+  should return both these fields, not just `bar`. The IDE can perform it's own
+  filtering since it might want to perform spell checking, etc. Put another way,
+  the RLS is not a code completion tool, but supplies the low-level data that a
+  code completion tool uses to provide suggestions.
+
+### Highlight all references to an item
+
+The IDE requests all references in the same file based on a position in the
+file. The RLS returns a list of spans.
+
+### Find all references to an item
+
+The IDE requests all references based on a position in the file. The RLS returns
+a list of spans.
+
+### Jump to definition
+
+The IDE requests the definition of an item based on a position in a file. The RLS
+returns a list of spans (a list is necessary since, for example, a dynamically
+dispatched trait method could be defined in multiple places).
+
+
+# Detailed design
+
+## Architecture
+
+The basic requirements for the architecture of the RLS are that it should be:
+
+* reusable by different clients (IDEs, tools, ...),
+* fast (we must provide semantic information about a program as the user types),
+* handle multi-crate programs,
+* consistent (it should handle multiple, potentially mutating, concurrent requests).
+
+The RLS will be a long running daemon process. Communication between the RLS and
+an IDE will be via IPC calls (tools (for example, Racer) will also be able to
+use the RLS as an in-process library.). The RLS will include the compiler as a
+library.
+
+The RLS has three main components - the compiler, a database, and a work queue.
+
+The RLS accepts two kinds of requests - compilation requests and queries. It
+will also push data to registered programs (generally triggered by compilation
+completing). Essentially, all communication with the RLS is asynchronous (when
+used as an in-process library, the client will be able to use synchronous
+function calls too).
+
+The work queue is used to sequentialise requests and ensure consistency of
+responses. Both compilation requests and queries are stored in the queue. Some
+compilation requests can cause earlier compilation requests to be canceled.
+Queries blocked on the earlier compilation then become blocked on the new
+request.
+
+In the future, we should move queries ahead of compilation requests where
+possible.
+
+When compilation completes, the database is updated (see below for more
+details). All queries are answered from the database. The database has data for
+the whole project, not just one crate. This also means we don't need to keep the
+compiler's data in memory.
+
+
+## Compilation
+
+The RLS is somewhat parametric in its compilation model. Theoretically, it could
+run a full compile on the requested crate, however this would be too slow in
+practice.
+
+The general procedure is that the IDE (or other client) requests that the RLS
+compile a crate. It is up to the IDE to interact with Cargo (or some other
+build system) in order to produce the correct build command and to ensure that
+any dependencies are built.
+
+Initially, the RLS will do a standard incremental compile on the specified
+crate. See [RFC PR 1298](https://github.com/rust-lang/rfcs/pull/1298) for more
+details on incremental compilation.
+
+The crate being compiled should include any modifications made in the client and
+not yet committed to a file (e.g., changes the IDE has in memory). The client
+should pass such changes to the RLS along with the compilation request.
+
+I see two ways to improve compilation times: lazy compilation and keeping the
+compiler in memory. We might also experiment with having the IDE specify which
+parts of the program have changed, rather than having the compiler compute this.
+
+### Lazy compilation
+
+With lazy compilation the IDE requests that a specific item is compiled, rather
+than the whole program. The compiler compiles this function compiling other
+items only as necessary to compile the requested item.
+
+Lazy compilation should also be incremental - an item is only compiled if
+required *and* if it has changed.
+
+Obviously, we could miss some errors with pure lazy compilation. To address this
+the RLS schedules both a lazy and a full (but still incremental) compilation.
+The advantage of this approach is that many queries scheduled after compilation
+can be performed after the lazy compilation, but before the full compilation.
+
+### Keeping the compiler in memory
+
+There are still overheads with the incremental compilation approach. We must
+startup the compiler initialising its data structures, we must parse the whole
+crate, and we must read the incremental compilation data and metadata from disk.
+
+If we can keep the compiler in memory, we avoid these costs.
+
+However, this would require some significant refactoring of the compiler. There
+is currently no way to invalidate data the compiler has already computed. It
+also becomes difficult to cancel compilation: if we receive two compile requests
+in rapid succession, we may wish to cancel the first compilation before it
+finishes, since it will be wasted work. This is currently easy - the compilation
+process is killed and all data released. However, if we want to keep the
+compiler in memory we must invalidate some data and ensure the compiler is in a
+consistent state.
+
+
+### Compilation output
+
+Once compilation is finished, the RLS's database must be updated. Errors and
+warnings produced by the compiler are stored in the database. Information from
+name resolution and type checking is stored in the database (exactly which
+information will grow with time). The analysis information will be provided by
+the save-analysis API.
+
+The compiler will also provide data on which (old) code has been invalidated.
+Any information (including errors) in the database concerning this code is
+removed before the new data is inserted.
+
+
+### Multiple crates
+
+The RLS does not track dependencies, nor much crate information. However, it
+will be asked to compile many crates and it will keep track of which crate data
+belongs to. It will also keep track of which crates belong to a single program
+and will not share data between programs, even if the same crate is shared. This
+helps avoid versioning issues.
+
+
+## Versioning
+
+The RLS will be released using the same train model as Rust. A version of the
+RLS is pinned to a specific version of Rust. If users want to operate with
+multiple versions, they will need multiple versions of the RLS (I hope we can
+extend multirust/rustup.rs to handle the RLS as well as Rust).
+
+
+# Drawbacks
+
+It's a lot of work. But better we do it once than each IDE doing it themselves,
+or having sub-standard IDE support.
+
+
+# Alternatives
+
+The big design choice here is using a database rather than the compiler's data
+structures. The primary motivation for this is the 'find all references'
+requirement. References could be in multiple crates, so we would need to reload
+incremental compilation data (which must include the serialised MIR, or
+something equivalent) for all crates, then search this data for matching
+identifiers. Assuming the serialisation format is not too complex, this should
+be possible in a reasonable amount of time. Since identifiers might be in
+function bodies, we can't rely on metadata.
+
+This is a reasonable alternative, and may be simpler than the database approach.
+However, it is not planned to output this data in the near future (the initial
+plan for incremental compilation is to not store information required to re-
+check function bodies). This approach might be too slow for very large projects,
+we might wish to do searches in the future that cannot be answered without doing
+the equivalent of a database join, and the database simplifies questions about
+concurrent accesses.
+
+We could only provide the RLS as a library, rather than providing an API via
+IPC. An IPC interface allows a single instance of the RLS to service multiple
+programs, is language-agnostic, and allows for easy asynchronous-ness between
+the RLS and its clients. It also provides isolation - a panic in the RLS will
+not cause the IDE to crash, not can a long-running operation delay the IDE. Most
+of these advantages could be captured using threads. However, the cost of
+implementing an IPC interface is fairly low and means less effort for clients,
+so it seems worthwhile to provide.
+
+Extending this idea, we could do less than the RLS - provide a high-level
+library API for the Rust compiler and let other projects do the rest. In
+particular, Racer does an excellent job at providing the information the RLS
+would provide without much information from the compiler. This is certainly less
+work for the compiler team and more flexible for clients. On the other hand, it
+means more work for clients and possible fragmentation. Duplicated effort means
+that different clients will not benefit from each other's innovations.
+
+The RLS could do more - actually perform some of the processing tasks usually
+done by IDEs (such as editing source code) or other tools (refactoring,
+reformating, etc.).
+
+
+# Unresolved questions
+
+A problem is that Visual Studio uses UTF16 while Rust uses UTF8, there is (I
+understand) no efficient way to convert between byte counts in these systems.
+I'm not sure how to address this. It might require the RLS to be able to operate
+in UTF16 mode. This is only a problem with byte offsets in spans, not with
+row/column data (the RLS will supply both). It may be possible for Visual Studio
+to just use the row/column data, or convert inefficiently to UTF16. I guess the
+question comes down to should this conversion be done in the RLS or the client.
+I think we should start assuming the client, and perhaps adjust course later.
+
+What kind of IPC protocol to use? HTTP is popular and simple to deal with. It's
+platform-independent and used in many similar pieces of software. On the other
+hand it is heavyweight and requires pulling in large libraries, and requires
+some attention to security issues. Alternatives are some kind of custom
+prototcol, or using a solution like Thrift. My prefernce is for HTTP, since it
+has been proven in similar situations.