Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an Optional CargoTargetSpec-like Data to rust-project.json #15892

Open
davidbarsky opened this issue Nov 14, 2023 · 5 comments
Open

Add an Optional CargoTargetSpec-like Data to rust-project.json #15892

davidbarsky opened this issue Nov 14, 2023 · 5 comments
Labels
A-rust-project rust-project.json related issues C-feature Category: feature request

Comments

@davidbarsky
Copy link
Contributor

davidbarsky commented Nov 14, 2023

Motivations

Hi! I'm writing this issue not entirely convinced that it's the right approach, but I figured I'd at least open an issue for discussion/feedback. Anyways: We've added two fields to the rust-project.json format in the rust-analyzer/buck integration, which have been really useful. They are:

  • a build_file corresponding to the BUCK file defining the crate(s), serving the same function that a ManifestPath does today in rust-analyzer.
  • A target label corresponding to the build system's name of the crate (details: https://buck2.build/docs/concepts/glossary/#target-label), which is roughly a Buck/Bazel-flavored URI for a build target or a more user-visible version the package_flag method in crates/project-mode/src/cargo_workspace/CargoWorkspace.

We've added these fields because determining the ManifestPath and the owning targets on the fly by querying buck adds some non-trivial, perceptible latencies (200-300 milliseconds at the average, with some substantially higher latencies at the tail) in what users would otherwise expect to be a near-instant response. Similar operations in rust-analyzer are in the low single-digit milliseconds, to its immense credit!

I decided to open this issue to propose upstreaming these changes when I was working on adding generic runnable support1 and realized that a decent chunk of the information needed to successfully create a runnable (specifically, the manifest file) could be very easily added to a CargoTargetSpec via a rust-project.json, enabling a set of long-tail features and functionality that I've wanted to tackle in a Buck context but was blocked by either too-high latencies or rust-analyzer's lack of visibility into this information. These features include:

  • The generic runnable support I mentioned above.
  • Exposing a dependency explorer to rust-project.json-based projects, which require ManifestPath (c.f., Add ManifestPath to CrateData; update dependency view to use ManifestPath #14931)
  • With the knowledge that the current flycheck infrastructure is a hack (and the $saved_file interpolation is even more so), it might be feasible to replace that with building a specific CargoTargetSpec::package, which should have benefits for larger Cargo workspaces as well.
  • This has (weirdly!) bothered me for a while, but it's be nice if the rust-analyzer status response showed workspace roots not as the folder the user has open, but rather, the manifest path of each indexed project/workspace.

Approach

I'd like to add an optional TargetSpec (renamed from CargoTargetSpec) to project_model/src/project_json/Crate, where any build system that opts into providing a TargetSpec would be required to provide all information that TargetSpec expects. My goal for providing this constraint is to minimize maintenance costs for rust-analyzer.

Alternatives/Caveats.

I worry about proposing this feature because I worry:

  • it might be an unreasonable maintenance burden on rust-analyzer.
  • it is a layering violation of the design goals of rust-project.json (in that the rust-project.json format is the lowered version of a cargo graph, and this would be introducing something that is conceptually adjacent to it.)

Footnotes

  1. Note that the current state of runnables can largely work if this line is commented out and the rust-buck-companion extension queries the owner of the file in question, but that's not an approach that I'd like to take because latencies of querying a build system are too high.

@davidbarsky davidbarsky added the C-feature Category: feature request label Nov 14, 2023
@Veykril Veykril added the A-rust-project rust-project.json related issues label Nov 14, 2023
@matklad
Copy link
Member

matklad commented Nov 29, 2023

@davidbarsky could you expand a little bit on what these extra fields actually do? This is out of my mind's L1 cache, so it's not immediately obvious to me how they could be used.

We've added these fields because determining the ManifestPath and the owning targets on the fly by querying buck adds some non-trivial, perceptible latencies

I am not entirely sure what this talks about (again, this is because at this point I remeber nothing :) ), but if this is about "what crate does foo.rs belong too" functionality (which is the first step of any analysis), that should be handled by source roots. See ProjectWorkspace::to_roots.

that a decent chunk of the information needed to successfully create a runnable

For this one in particular, I think we could do something super general, like in projec.json you could specify runnable_info: any for any target, and rust-analyzer then will just transfer this info as is over the LSP protocol for the frontent to deal with. If we want to re-use existing handling in the VS Code extension, we could be specific, and add somethign like

runnables: {
   "check": ["buck", "check", "my-crate-name"],
   "run": ["buck", "run", "my-crate-name"],
   "test": ["buck", "test", "my-crate-name", "--"],
}

where keys are a fixed set of commands with standrad semantics, and values are command-lines. note that I still believe that integrating that into flycheck would be fundamentally wrong, but if it's easy to do transparently, then we could do it (like, flycheck itself is fundamentally wrong, but we are still having it, cause its useful).

Exposing a dependency explorer

👍 I think adding extra "display" information would be useful. Though, the danger here is mixing up things which are display only (and can be approximate fuzzy things, because human would sort them out) with things which have specific semantics (and where any fuzziness would confuse computers). We actually already have display_name in project.json for this purpose, and I think it would be fine to add display package or some such. Or, indeed, a path to manifest file, as long as it is DisplayPath(String), and not an actual path, as, I think, rust-analyzer shouldn't be interpreting that in any way.

@davidbarsky
Copy link
Contributor Author

@matklad Thanks so much for taking the time!

could you expand a little bit on what these extra fields actually do? This is out of my mind's L1 cache, so it's not immediately obvious to me how they could be used.

We've added these fields because determining the ManifestPath and the owning targets on the fly by querying buck adds some non-trivial, perceptible latencies

I am not entirely sure what this talks about (again, this is because at this point I remeber nothing :) ), but if this is about "what crate does foo.rs belong too" functionality (which is the first step of any analysis), that should be handled by source roots. See ProjectWorkspace::to_roots.

Sure! To back up, neither of those two fields—BuckExtensions::build_file or BuckExtensions::label—are passed to rust-analyzer as part of rust-analyzer.linkedProjects (or, more specifically, the field as it exists in InitializeParams), but they have pretty useful in our little companion server/daemon (entrypoint here) that does target discovery/reloading of Rust crates in Buck. Put differently, it allows people to open an arbitrary Rust file in a Buck repo and get a functioning rust-analyzer pretty quickly without needing to manually generate a rust-project.json ahead of time.

I'll also quickly explain how we implemented the near-Cargo-like experience with Buck, since I think it provides some useful background context/demonstrate the value that this has for us, but feel to skip over this:

  • Before rust-analyzer-the-server has started, the rust-analyzer VS Code extension calls a companion extension to determine the owning targets of any open Rust file. That (minimal, but temporarily closed source) extension then calls the rust-project daemon to find and resolve the owning build targets1, eventually returning with a rust-project.json. We use BuckExtensions::label for progress reporting while waiting on Buck.
  • To support a user navigating to a different file after the initial step, we carry feature: Create UnindexedProject notification to be sent to the client #15863 as a patch to our internal release of rust-analyzer. The approach I took in that PR boils down to checking in handle_did_open_text_document whether the file that just got opened is a part of the indexed crate graph. If it's not, we notify the client and kick off the aforementioned "find owning build targets" functionality in the companion.
    • We previously had people run a manual command to add a different file, but basically nobody ever did that, which, fair.
  • In the case where the user saves a BUCK/TARGETS file (for any unaware readers: this is equivalent to adding a dependency to Cargo.toml), we run this handler and return an updated rust-project.json to rust-analyzer 2. Both BuckExtensions::build_file and BuckExtensions::label are used in the daemon. The former is used to register for save notifications from the client while the latter is used for progress reporting/quickly knowing which target to rebuild.

Notably, rust-analyzer did not not need to be aware of any new fields in order for us to implement this functionality.

For this one in particular, I think we could do something super general, like in projec.json you could specify runnable_info: any for any target, and rust-analyzer then will just transfer this info as is over the LSP protocol for the frontent to deal with. If we want to re-use existing handling in the VS Code extension, we could be specific, and add somethign like

runnables: {
   "check": ["buck", "check", "my-crate-name"],
   "run": ["buck", "run", "my-crate-name"],
   "test": ["buck", "test", "my-crate-name", "--"],
}

where keys are a fixed set of commands with standrad semantics, and values are command-lines.

I hacked something together by replacing the existing Cargo-based runnable functionality in rust-analyzer with Buck-based runnables. I didn't push it anywhere, but it works weirdly well. I didn't consider adding the runnables to the Crate in rust-project.json—if I'm parsing you correctly—but that's a pretty clever solution, honestly.

note that I still believe that integrating that into flycheck would be fundamentally wrong, but if it's easy to do transparently, then we could do it (like, flycheck itself is fundamentally wrong, but we are still having it, cause its useful).

You know, I've absolutely come around on flycheck being a hack for a bunch of reasons, but being able to construct a flycheck command using the crate name and/or the target label is pretty useful. At least for Cargo-based users, this would allow for per-crate flychecks, which I've personally wanted in tracing.

👍 I think adding extra "display" information would be useful. Though, the danger here is mixing up things which are display only (and can be approximate fuzzy things, because human would sort them out) with things which have specific semantics (and where any fuzziness would confuse computers). We actually already have display_name in project.json for this purpose, and I think it would be fine to add display package or some such. Or, indeed, a path to manifest file, as long as it is DisplayPath(String), and not an actual path, as, I think, rust-analyzer shouldn't be interpreting that in any way.

Yeah, I've avoided putting anything too load bearing in the "display" information. In this case though, I think that I'd it be valuable to do both (or at the very least, derive display information from the semantically load-bearing information) because there are a few spots where manifest path is used around rust-analyzer in a meaningful way.

Footnotes

  1. Yes, this is a LSP server, and I know it's weird, but it works. We might change this to be just plain JSON.

  2. rust-analyzer was way faster at reindexing a project after modifying a rust-project.json than I thought it would be. This exceeded already high expectations for rust-analyzer.

@P1n3appl3
Copy link

P1n3appl3 commented Jan 31, 2024

On Fuchsia we took a similar approach to extending rust-project.json though we didn't go nearly as far with the integration as what you've described with the companion extension. Here's some extra context in case it's helpful:

We primarily use GN/Ninja for our build system, and GN generates a rust-project.json with an additional label field in crate entries. It also maintains a mapping of source files to build targets (currently separate from rust-project.json, but could easily be combined) so that given a set of source files it can run a check-build on all the "affected" targets.

We currently don't have any use for linking rust-project.json crates to their BUILD.gn "manifest" files, but it seems reasonable to provide and we could easily update GN to do so. Overall #16135 looks interesting and we'd get some value out of TargetSpec being stabilized.

A concern I have with TargetSpec over a simple $saved_file placeholder is that today we already have the logic to do that source -> target(s) mapping outside of rust-analyzer, but if flycheck_command were a field on each entry with no info about which specific file was saved then we'd run into issues where rust-analyzer doesn't know which rust-project.json entries to actually run a flycheck on. I could see this being an issue with any non-cargo build system in a large project where it's easy to write some_flycheck_wrapper.sh that accepts $changed_file and does the right thing, but quite hard to get GN/ninja/meson/cmake/etc to spit out a rust-project.json with TargetSpecs that make it all work, especially in cases where the project doesn't own their own build system.

We're also mildly concerned about adding too much to rust-project.json, ours are around 20MB today and we'd want to make sure we measure any changes to make sure they don't drastically impact startup time.

@davidbarsky
Copy link
Contributor Author

Thanks for responding!

We currently don't have any use for linking rust-project.json crates to their BUILD.gn "manifest" files, but it seems reasonable to provide and we could easily update GN to do so. Overall #16135 looks interesting and we'd get some value out of TargetSpec being stabilized.

The primary utility of such a field (at least, for me) is to reload rust-analyzer whenever the corresponding manifest file changes, but I can understand that not being particularly useful for you.

A concern I have with TargetSpec over a simple $saved_file placeholder is that today we already have the logic to do that source -> target(s) mapping outside of rust-analyzer, but if flycheck_command were a field on each entry with no info about which specific file was saved then we'd run into issues where rust-analyzer doesn't know which rust-project.json entries to actually run a flycheck on.

Gotcha: at least under Buck, it's decently easy/reliable to figure out what crates owns what file, but I don't see any reason why $saved_file (or something similar) can't also be supported.

We're also mildly concerned about adding too much to rust-project.json, ours are around 20MB today and we'd want to make sure we measure any changes to make sure they don't drastically impact startup time.

For the purposes of the linked PR, I'd like to move the commands responsible for runnables and flycheck out of each individual crate and onto the same level as sysroot and friends, which should limit the impact this has on the size of your rust-project.json and make it easier to lint and report against typos in those command. One followup question: do you generate a single rust-project.json for all Rust code in Fuschia's monorepo? For context, with Buck, we generate a new rust-project.json on the fly with #15863 each time a new, unindexed Rust file is opened.

@P1n3appl3
Copy link

Yeah we currently use a single rust-project.json containing thousands of crates (pretty much "every reachable rust target in the build"). With lazy indexing enabled it's not too bad in terms of startup time, flychecks are the main thing that drags.

Y'all's solution with separate workspaces/on-the-fly rust-project.json generation is quite interesting and I can see how that'd be a superior experience in the buck2/bazel world, but it's probably out of reach for us without major additions to GN which are unlikely to happen.

Anyways, as long as it still includes the ability for the flycheck command to get passed the modified file I think we're happy with the approach given that #12882 is our main concern. FWIW I feel pretty much the same way about flycheck as your comment: sure it could be handled outside of the r-a, but dang is it convenient that r-a knows how to read rustc/clippy json diagnostics and give you code actions for suggested replacements and such. I think that convenience is enough to warrant a bit of extra complexity to handle it in a build-system-agnostic way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust-project rust-project.json related issues C-feature Category: feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants