Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to intercept binaries invoked by Cargo #3866

Closed
wants to merge 5 commits into from

Conversation

matklad
Copy link
Member

@matklad matklad commented Mar 24, 2017

Fix #3670.

This adds an environmental variable CARGO_WRAP, which prevents cargo from running binaries and allows IDEs and such to take over the wheel.

Question: how this should interract with --message-format=json? What should Cargo do, if it sees CARGO_WRAP, but --message_format is human?

@rust-highfive
Copy link

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@matklad
Copy link
Member Author

matklad commented Mar 24, 2017

r? @alexcrichton

@rust-highfive rust-highfive assigned alexcrichton and unassigned brson Mar 24, 2017
impl<'a> RunProfile<'a> {
pub fn new(process: &ProcessBuilder) -> RunProfile {
RunProfile {
program: process.get_program().to_string_lossy(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: assert! that this is an absolute path.

@vojtechkral
Copy link
Contributor

Thanks for working on this!

I'm not sure I understand the several_tests output, are there several JSON objects, but not delimited by a comma?

@matklad
Copy link
Member Author

matklad commented Mar 26, 2017

I'm not sure I understand the several_tests output, are there several JSON objects, but not delimited by a comma?

Yep! During build, Cargo writes JSON in a JSON object per line format, so that it does not need to wait until the end of the build to produce a single array of messages.

To test this, we don't directly compare stderr as string, but parse JSON and compare objects, which gives nicer error messages and avoids problems with field reordering. Specifying expected messages in JSON per line format is inconvenient, so a blank line is used to separate different messages. The relevant code is here:

self.expect_json = Some(expected.split("\n\n").map(|obj| {

In this particular case we indeed output a separate JSON message for every binary to be run.

@vojtechkral
Copy link
Contributor

vojtechkral commented Mar 27, 2017

@matklad Okay, so as a user of CARGO_WRAP all I need to do is parse each line as a separate JSON object and I can rely on that they won't span multiple lines?

@matklad
Copy link
Member Author

matklad commented Mar 27, 2017

all I need to do is parse each line as a separate JSON object and I can rely on that they won't span multiple lines?

Yes! In theory, you can also relay on the fact that each line of stdout is a JSON object (we keep non-json output on stderr), but in practice I'd add a if line.starts_with('{') gurard just in case.

@vojtechkral
Copy link
Contributor

vojtechkral commented Mar 27, 2017

Regarding

What should Cargo do, if it sees CARGO_WRAP, but --message_format is human?

I think cargo might even treat that as an error. I can't think of a situation where setting both CARGO_WRAP and --message_format=human makes sense / is valid.

@matklad
Copy link
Member Author

matklad commented Mar 27, 2017

I think cargo might even treat that as an error. I can't think of a situation where setting both CARGO_WRAP and --message_format=human makes sense / is valid

Yep, that'd be a safe start. Though there are situations where you might want to both show usual output to the user and do something with the binaries: #3855. But that's a general problem that human and json should not be mutually exclusive.

If `CARGO_WRAP` environment variable is specified, cargo
will just print what it should have executed otherwise
during `cargo test` / `cargo run`.
@alexcrichton
Copy link
Member

Thanks for the PR @matklad, implementation looks great to me.

My only question would be about the interface here. A CARGO_WRAP env var seems slightly odd to me in terms of both naming and method of transmission to Cargo. I'd probably naively expect this to be an argument on the command line perhaps? (next to --message-format=json).

We may also have an opportunity to hit two birds with one stone here by allowing human message to also provider a wrapper, such as with strace or gdb (or something like that). I don't know of a great way to specify that, though.

In general though I'd prefer to call this flag perhaps something along the lines of --do-not-exec-child-processes-instead-print-them or something like that. There's not actually any "wrapping" going on here b/c the child isn't being executed.

@matklad
Copy link
Member Author

matklad commented Mar 27, 2017

A CARGO_WRAP env var seems slightly odd to me in terms of both naming and method of transmission to Cargo.

I've must have misinterpreted #3670 (comment) then :)

The benefit of environmental variable is that you don't need to mess up with command line, which is provided by the user. That said, you still need to modify the command line for --message-format=json. For the IntelliJ Rust (and I suspect for other wrapping tools), it would be the most convenient if both --message-format and --wrap could be specified as environmental variables.

Another advantage of env var is that you don't clutter cargo run --help.

I am ok with changing this to command line argument, but I like the env var slightly better.

--do-not-exec-child-processes-instead-print-them or something like that.

Does CARGO_INTERCEPT_CHILD_PROCESS sound better?

@alexcrichton
Copy link
Member

Oh interesting! Both that I can't make up my mind and I wasn't aware of how IntelliJ passed --message-format=json. If it's easier to do all this through env vars then that seems plausible.

Does CARGO_INTERCEPT_CHILD_PROCESS sound better?

This isn't really doing much intercepting though? It's sort of like "don't run final child processes, only the intermediate ones that the compiler does". This also seems like it's definitely toeing the line on #3815 so we may want to be sure to consider that as well.

@matklad
Copy link
Member Author

matklad commented Mar 28, 2017

This also seems like it's definitely toeing the line on #3815 so we may want to be sure to consider that as well.

That's interesting! I expect we can reuse RunProfile struct directly, but just printing it instead of invoking rustc as we go won't be enough, because you need information about dependencies between compiler runs as well. So for the build plan I think a big json like in cargo metadata is a better solution. cargo metadata --build-plan perhaps?

Oh interesting! Both that I can't make up my mind and I wasn't aware of how IntelliJ passed --message-format=json. If it's easier to do all this through env vars then that seems plausible.

I am also not sure what is the best solution here. "Avoid messing up with user's command line" is a weak argument, because it should not be that hard, and because I expect we'll need to understand Cargo's command line eventually anyway. I think not cluttering cargo run --help is a more important issue.

And I still can't think of the name for the arg/env I really like :) One more option is CARGO_NO_RUN, similar to cargo test --no-run, but again, I kinda don't like test --no-run in the first place.

Am I correct that there are only two unresolved questions:

  • Should this be an env var or a command line flag?
  • What should be the name of the thing?

Oh, one more idea about name: CARGO_PRINT_RUN.

@vojtechkral
Copy link
Contributor

Oh, one more idea about name: CARGO_PRINT_RUN.

I, for one, like that one...

Maybe document it in src/doc/environment-variables.md as well?

@matklad
Copy link
Member Author

matklad commented Mar 28, 2017

Maybe document it in src/doc/environment-variables.md as well?

Sure!

I've think I've finished everything implementation wise, only option name/passing 🚲🏠 is left =)

@vojtechkral
Copy link
Contributor

Hmm. I think it would be easier if CARGO_PRINT_RUN implied --message-format=json rather than required, at least for the code that will use CARGO_PRINT_RUN. Would that be a problem implementation-wise?

@matklad
Copy link
Member Author

matklad commented Mar 29, 2017

Hmm. I think it would be easier if CARGO_PRINT_RUN implied --message-format=json rather than required, at least for the code that will use CARGO_PRINT_RUN.

It should be possible to implement, but I suspect that for PRINT_RUN you actually want Cargo to produce usual human-readable output first, and then give the JSON as well. Hypothetical cargo gdb should show usual errors if something's not compiling. But this again depends on #3855. Requiring --message-format=json seems like the most conservative option.

@vojtechkral
Copy link
Contributor

I'm a bit confused (I probably don't understand the other bug) - what's the reason we can't have cargo output the human readable output to stderr and json (if any) to stdout?

@matklad
Copy link
Member Author

matklad commented Mar 29, 2017

what's the reason we can't have cargo output the human readable output to stderr and json (if any) to stdout?

Compiler currently produces either only human-readable messages, or JSON. Cargo already prints both JSON and human oriented messages with --message-format=json, but it passes --error-format=json down to the compiler, which turns of human readable compiler errors.

So we either need a flag --message-format=json-for-cargo-human-for-rustc, or we need to teach compiler to produce both JSON and text. The latter option is complicated a bit by the fact that unlike Cargo, rustc uses the same stream for both text and JSON.

@vojtechkral
Copy link
Contributor

Uhh, okay, in that case how about doing it the opposite way - have CARGO_PRINT_RUN imply/require --message-format=human? That should be good enough for wrapping executables and whoever needs more detailed info about the build process will have to wait for #3815

@vojtechkral
Copy link
Contributor

Have I gridlocked the discussion with my complaints? Those were just my ¢2, feel free to let me know if I'm bikeshedding too much :-)

@bors
Copy link
Contributor

bors commented Apr 3, 2017

☔ The latest upstream changes (presumably #3893) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

Ok so coming back to this, I'm not 100% sure how to proceed. I feel like we have a few related proposals in flight, but I don't want to block progress. In that sense I'm trying to come up with the most conservative solution with the fewest extensions possible.

To that end, I wonder if it'd be helpful to dig in to why this isn't possible today? Is it because the cwd/env aren't known? Is it because binaries cannot be found? I'd be tempted to say that this is roughtly equivalent to translating cargo test to cargo build --tests (or similar), scraping logs, and then running binaries manually.

@matklad
Copy link
Member Author

matklad commented Apr 5, 2017

To that end, I wonder if it'd be helpful to dig in to why this isn't possible today?

It's 95% percent possible today if you are willing to parse cargo's command line. The worst case will look like this: cargo test foo --bin bar -- --nocapture which you need to translate to cargo test --norun + test_binary --test test --nocapture. The remaining 5% percent are about LD_LIBRARY_PATH (and just PATH on windows) needed for shared libraries sometimes. intellij-rust/intellij-rust#1100 is I believe an LD_LIBRARY_PATH issue.

So, exactly two problems to solve:

  • Avoid parsing Cargo's command line (this can be worked around)

  • Getting information about environment in which to run the process (no workaround currently)

On the one hand, I'd be happy to punt on this for now, because 95% workaround exists. On the other hand, I don't want to go the "reimplement tiny bits of Cargo in IntelliJ" road because I fear that even small pieces like cargo test -> cargo test --norun may cause problems like "works on command line, but not in IntelliJ" down the line.

@alexcrichton are you not sure about the end result (JSON messages and not running actual processes) or about the interface (environmental variable vs command line flag)? I am rather confident that the end result is right, useful and is not a backwards compatibility hazard. I don't particularly like the interface part myself!

@vojtechkral what do you think about the problem from the command-line wrapper perspective?

@vojtechkral
Copy link
Contributor

vojtechkral commented Apr 7, 2017

Seems to me like we're streching cargo's interface to limits or beyond here.

I've given the cli wrappers a bit of rethinking and I think I'll simplify them to not use/mirror cargo's cli iface. Ie. the interface would be simply cargo tool [tool args...] <binary|test name> [binary args...] where tool is something like gdb, strace, mdb or such.

I still need to get the path to the binary as well as the environment from cargo somehow. I think that trying to bolt this on top of the regular cli interface leads to problems we're seeing. I can think of two ways how to do this without abusing the cli interface:

  • A library. Either add a publicly-usable interface to libcargo or create a seperate "cargo metadata" library that external tools could use. I'm not sure whether it's feasible to use Rust library from IntelliJ, though, so the other option might be more practical.

  • A cargo meta subcommand, which could output all sorts of metada about the build process and/or current targets. The interface could be something like: cargo meta --type-of-query query ... For example, cargo meta --bin foobar could output a json telling where the foobar binary is located and what LD_LIBRARY_PATH should be, etc., cargo meta --test foobar could do the same for a test binary. cargo meta would also be a natural place for adding Support producing a "build plan" without executing anything #3815 in the future, something like cargo meta --plan <build invocation command line>.

Does this help?

edit: typos, sorry!

@alexcrichton
Copy link
Member

@matklad I wonder if I I'm just too resistant to change right now by accident. In some sense there's no end of features we can add to Cargo and none of them are relatively difficult to support in a backwards-compatible fashion. The drawback is that eventually Cargo could be a giant pile of random bits and pieces, most of which aren't recommended any more but still in use by various tools throughout the ecosystem.

In that sense I don't currently think I either agree or disagree with the end goal. I don't see any reason not to generate json message and such, but fitting it into the broader vision of where Cargo will eventually be may be difficult.

I don't really want to block this indefinitely though, so I'm tempted to just "oh well" and land the PR as-is. It's clearly solving a use case for IntelliJ today and it's not really that hard to support in a backwards compatible fashion.

@vojtechkral it sounds like though from IntelliJ's perspective if a command line is coming in and doesn't want to be looked at then there's not many options here. Using Cargo as a library isn't easy b/c you'd have to parse the CLI, and otherwise using a command like cargo meta would also involve parsing and learning about intent. @matklad may have more to say though.

@vojtechkral
Copy link
Contributor

Using Cargo as a library isn't easy b/c you'd have to parse the CLI

What I meant was that the CLI parsing code Cargo uses would be part of that library (and so could be used by others too).

@alexcrichton
Copy link
Member

@matklad I wonder, RUSTC_WRAPPER in #3887 is, or isn't sufficient here?

@alexcrichton
Copy link
Member

Oh probably not, doesn't handle user pieces as well.

Alas!

@matklad
Copy link
Member Author

matklad commented Apr 21, 2017

In some sense there's no end of features we can add to Cargo and none of them are relatively difficult to support in a backwards-compatible fashion. The drawback is that eventually Cargo could be a giant pile of random bits and pieces, most of which aren't recommended any more but still in use by various tools throughout the ecosystem.

@alexcrichton , this is my concern as well! (I am already not happy with the project model which has grown too many pieces: workspaces, packages, targets like --lib, --lib or an --example which can be either a --lib or a --bin :( ).

So now I think the best path would be to postpone this for now, parse command line in IntelliJ instead, and revisit this once the debugger support in IntelliJ matures.

Ie. the interface would be simply cargo tool [tool args...] <binary|test name> [binary args...] where tool is something like gdb, strace, mdb or such.

@vojtechkral keep in mind that Cargo already has a rather elaborate interface for selecting the binary! There's cargo run --bin foo, cargo run --example foo (you can have an example and a binary with the same name) and cargo run --package some_workspace_pkg --bin foo. The simple interface would work in 90% cases of course.

Deterministic filenames exposed via cargo metadata seem to be a nice solution, but the name of the binary depends on the profile, and we might want to grow custom profiles some day (See also "Stable file names" section of #3670 (comment) :) ).

cargo meta would also be a natural place for adding #3815 in the future, something like cargo meta --plan.

I wounder if Cargo should really be a pipeline of independent binaries producing and consuming JSON 😜

@vojtechkral
Copy link
Contributor

vojtechkral commented Apr 21, 2017

@vojtechkral keep in mind that Cargo already has a rather elaborate interface for selecting the binary!

Yeah, that's true. The problem is that I won't be able to figure out what command is used to build a binary. Which makes extensions like cargo gdb add little value over just using gdb and locating the binary by hand.

I wounder if Cargo should really be a pipeline of independent binaries producing and consuming JSON 😜

No, probably not.

Seems to me it's a good call to postpone this at this point. I'd say that cargo has already hit some of the problems @alexcrichton mentioned - the interfacing seems to have gotten quite complex :)

By and large, cargo reminds me of git - it also has a complex CLI interface and cargo's Cargo.{toml,lock} files and especially the targetdir are reminiscent of git's.gitdir. Git is ahead in comparison in that it provides thelibgit2` library to do pretty much anything in programmatic manner.

By consequence, this boils down to designing a comprehensice interface for cargo, both CLI and programmatic, which is of course a huge amount of work.

@alexcrichton
Copy link
Member

Ok postponing sounds good to me, but @matklad please keep me posted on developments. If the need for this arises again I think we'd be ok to merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants