Language Server Index Format #623
Comments
@dbaeumer Not sure if I'm misinterpreting something here, but is it correct to say that this is something for LSP clients to implement and not for servers to implement? |
I understood it as something dumped by servers, and then used by clients afterwards, so both would have their part to implement |
Hm...good point. I had interpreted it as more of a caching system for the LSP client. However, I guess the first sentence should've made it clear to me... :(
Well then...! |
@rcjsuen no, the idea is that this is dumped either by the server or by a separate tool. As mentioned in the spec, I have already written a tool for TypeScript and a generic extension that serves the dump via LSP to any kind of LSP client. I will make these open source in the next couple of days. |
So if I understand this correctly, this is supposed to be a library for helping language server implementers solve the problem of maintaining an index? If so, why does it need a specification as opposed to simply a documentation of the code? What are the reuse cases? |
Looks like I did a poor job of explaining it: the goal is to produce an index that can answer LSP requests for read-only workspaces without firing up a language server specific to the programming language. There will be one generic language server that can serve the index. Furthermore, the index will make it possible to relate symbols across repositories. See the demo here where Jonathan navigates from the use of |
Sorry, still not sure I got it. Is it some sort of cache middleware for language servers? |
I am glad the LSP protocol also includes some technical proposals, middleware or intermediary formats to allow combination of different language servers. So this index format will come with an implementation of a language server able to process multiple indexes to return results? How is this "composite index-based language server" expected to know which LS to retrieve indexes from or how to get indexes? |
From where did you draw this would be supported? It would indeed be useful if language servers could access a common cross-languages index.
I was not implying it should not, but wanted to understand why and how it depends on the LSP (technically). |
Some clarifications: the LSIF will not be part of the protocol itself since it is not a protocol. What might be part of the protocol are requests to ask a server to dump its state. Why did we decide to put it here? The LSIF is based on LSP data types. The questions that are answerable by a dump are typical LSP requests useful on a read-only workspace (for example goto definition, find all references). Yes, we have developed a generic language server that can read in many indices and serve LSP requests on them. So it can serve a C# index in parallel to a TS index. I will make the TS index generator and the generic language server with a VS Code extension public soon. Will add a message here when available. |
Here we go:
|
I'm still a bit hazy on the motivation here: are you trying to solve a latency problem? Also, what's the use case for "read-only workspaces"? |
It is more about repositories and published versions. In my projects I usually have dependencies on many other npm packages, each at a specific version. To navigate and browse them there is no need to spin up a whole language server (no need for code completion, signature help, ...). If there were an index, it would be relatively cheap to serve these and to support navigating to them even without cloning the repository locally. |
Shown here to navigate from one source base to |
Ahh...precomputed indexes. We've been thinking about this for jdt for a long time :-) |
I would love to use this to precompute indexes in CI. For devs working in large repos this would be very helpful. |
@tsmaeder if you look at the specification then this is basically split into two passes. The LS or the language tool will generate monikers specific to the tool. A linker tool will make them package manager specific by consulting other information. We even split this for TS / npm to demonstrate that embedding this into the Language tool is problematic. So the idea is more one of a compiler and then a symbol linker. |
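A minimal sketch of the compiler-then-linker split described above, for illustration only. Every name here (`ToolMoniker`, `PackageInfo`, `LinkedMoniker`, `linkMoniker`) and the identifier layout are hypothetical and are not taken from the LSIF specification or from any existing tool. The assumption is that the language tool only knows workspace-relative files and symbols, while the linker is the only piece that reads package metadata.

```typescript
// Pass 1 output: a moniker as a language tool might emit it (hypothetical shape).
interface ToolMoniker {
  scheme: string; // tool-specific scheme, e.g. "tsc"
  file: string;   // file path relative to the workspace root
  symbol: string; // exported symbol name
}

// Information only the linker consults, e.g. read from a package manifest.
interface PackageInfo {
  manager: string;    // e.g. "npm"
  name: string;
  version: string;
  sourceRoot: string; // directory inside the workspace that the package publishes
}

// Pass 2 output: a moniker that is meaningful across repositories.
interface LinkedMoniker {
  scheme: string;
  identifier: string;
  version: string;
}

// Pass 2: rewrite a tool-specific moniker into a package-manager-specific one.
// Returns undefined when the symbol lives outside the published package.
function linkMoniker(m: ToolMoniker, pkg: PackageInfo): LinkedMoniker | undefined {
  const root = pkg.sourceRoot.endsWith("/") ? pkg.sourceRoot : pkg.sourceRoot + "/";
  if (!m.file.startsWith(root)) {
    return undefined;
  }
  const relative = m.file.slice(root.length);
  return {
    scheme: pkg.manager,
    identifier: `${pkg.name}:${relative}:${m.symbol}`,
    version: pkg.version,
  };
}
```

Keeping the second pass separate means the language tool never has to understand a package manager; a different linker could be swapped in for a different ecosystem.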
Is the JSON graph format built on any standard JSON graph representation? I would assume there are already existing formats for that which would be nice to build on, since there may be existing tools that can read/generate them (e.g. save in database, query, visualise, build, etc) |
@felixfbecker yes and no: it uses the same property names like |
I still don't understand how the index-based LS and the 'real' LS would work together. |
That depends on how we would at the end decide how the indexer is run. We haven't made any decision on this. Options are:
See also: microsoft/lsif-node#6 I am fully open for ideas here. |
With this approach, could we extend LSIF to be a write-through cache? This would support incremental lazy LSIF cache filling which could be merged with asynchronous full "dumps" over time. If you squint hard enough, this would look similar to the Lambda Architecture, where incoming LSP requests that lazily fill the LSIF cache would map to the Streaming and Serving layers and the asynchronous full "dumps" would map to the Batch layer. |
Addendum: In theory, from what I can see, even a generalized LSIF caching proxy for different language servers would work. So there wouldn't be a need to change each language server individually. |
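To make the caching-proxy idea above a bit more concrete, here is a rough sketch. It assumes a generic proxy that sits in front of any language server, fills its cache lazily from live requests, and lets a full dump overwrite those entries later. The names (`CachingProxy`, `Query`, `Backend`, `mergeDump`) are hypothetical, the cache is a plain in-memory map rather than an actual LSIF graph, and the backend is kept synchronous for brevity.

```typescript
interface Position { line: number; character: number; }
interface Location { uri: string; position: Position; }
interface Query { method: string; uri: string; position: Position; }

// Stand-in for the real language server (hypothetical, synchronous for brevity).
type Backend = (query: Query) => Location[];

class CachingProxy {
  private readonly cache = new Map<string, Location[]>();
  private readonly backend: Backend;

  constructor(backend: Backend) {
    this.backend = backend;
  }

  private static key(q: Query): string {
    return `${q.method}|${q.uri}|${q.position.line}:${q.position.character}`;
  }

  // Lazy fill: answer from the cache, otherwise ask the real server
  // and remember the answer for next time.
  request(q: Query): Location[] {
    const k = CachingProxy.key(q);
    const cached = this.cache.get(k);
    if (cached !== undefined) {
      return cached;
    }
    const result = this.backend(q);
    this.cache.set(k, result);
    return result;
  }

  // Batch merge: entries from an asynchronous full dump replace lazily cached ones.
  mergeDump(entries: Array<[Query, Location[]]>): void {
    for (const [q, locations] of entries) {
      this.cache.set(CachingProxy.key(q), locations);
    }
  }
}
```

This sketch deliberately ignores invalidation when files change, which is the hard part of any such cache.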
Might be a good idea to take a look at kythe schema which seems to serve a similar-ish purpose. The primary difference is that |
@matklad we looked at kythe and other symbol databases and then purposely decided not to use one. Mainly for the reasons you pointed out. |
@dbaeumer @matklad I put together a quick list of first impression differences between LSIF and Kythe: https://gist.github.com/robinp/76f9d3d91387da5162f773895d4e1d15. Disclaimer: I don't know much about LSP/LSIF other than browsing the spec and the query docs a bit, and somewhat biased towards Kythe due to previous work with it, so offset that. |
One use case of LSIF for Java is that before the current language server is initialized, which might take time, the client can use the LSIF to unblock some smartness scenarios immediately after the user opens the workspace. |
This can be very useful for the warm load case. The language server knowledge can be persisted with LSIF in the previous session. And the knowledge can be used to enable basic language server features like symbol navigation in the new session before the actual language server finishes loading the project. @fbricon This is something we would like to try for the Java language server. |
@yaohaizh nice use case. |
@robinp thanks a lot for the comparison. Some first feedback to the feedback:
You might be also interested in https://github.com/Microsoft/lsif-typescript/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc |
One problem I see with LSIF is that some queries depend on knowing the whole program. Imagine you index a maven project that declares an interface with a method foo(). When we try to find implementers of "foo", the answer depends on what the user has open in his workspace. It's even worse: the language server might determine that a particular declaration of "foo" is not an implementation (maybe because it's from a different version of the project, not the one the code in the workspace compiles against). |
@dbaeumer Thanks! Sounds fair. Re |
@tsmaeder we discussed this lately and one idea was that LSP adds support to resolve a moniker and that we could have a |
I have now implemented a program which generates LSIF indices for Haskell files. I have two main concerns so far about the format.
I also don't understand the bit in the specification about imports/exports but there's another issue #680 about that already. |
@dbaeumer this is awesome for LSP! The biggest concern I have is the numerical value of the vertex id. It has several limitations as far as I can see
Also, I would suggest that there should be some dump options, e.g. we may just want to index references information |
@zfy0701 the protocol defines the id as |
@mpickering I agree that the current version is too verbose and I have an item for this. It is microsoft/lsif-node#4. I started to prototype a compressed JSON format that is fully array based and self describing. Will ping if I have something to comment on. Regarding composition: the idea is that projects can be parsed independently and that import / export results can be used to link symbols between them. I will continue on #680 and look into implementing that for TypeScript. |
@jdneo see the discussion here: microsoft/lsif-node#10 |
I will close the issue now that we have |
The purpose of the Language Server Index Format (LSIF) is to define a standard format for language servers or other programming tools to dump their knowledge about a workspace. This dump can later be used to answer LSP requests for the same workspace without running the language server itself. Since much of the information would be invalidated by a change to the workspace, the dumped information typically excludes requests used when mutating a document. So, for example, the result of a code completion request is typically not part of such a dump.
A first draft of a specification is available here
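As a rough illustration of the description above, the following sketch answers a goto definition request purely from pre-computed data, with no language server running. The shapes used here (`RangeEntry`, a flat array as the dump) are simplifying assumptions for the example and do not reproduce the vertex/edge graph defined in the draft specification; only `Position`, `Range` and `Location` mirror the usual LSP data types.

```typescript
interface Position { line: number; character: number; }
interface Range { start: Position; end: Position; }
interface Location { uri: string; range: Range; }

// Hypothetical flattened dump entry: a source range plus its pre-computed definition.
interface RangeEntry {
  uri: string;
  range: Range;
  definition: Location;
}

// True when pos lies inside range (both ends inclusive, for simplicity).
function contains(range: Range, pos: Position): boolean {
  const afterStart =
    pos.line > range.start.line ||
    (pos.line === range.start.line && pos.character >= range.start.character);
  const beforeEnd =
    pos.line < range.end.line ||
    (pos.line === range.end.line && pos.character <= range.end.character);
  return afterStart && beforeEnd;
}

// Answer "goto definition" by lookup only; nothing is parsed or type-checked here.
function gotoDefinition(dump: RangeEntry[], uri: string, pos: Position): Location | undefined {
  const hit = dump.find((entry) => entry.uri === uri && contains(entry.range, pos));
  return hit === undefined ? undefined : hit.definition;
}
```

Because the lookup never touches the source text, it only stays correct while the workspace is unchanged, which matches the read-only scope described above.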