Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a regex filter for the test runner & assorted fixes #13948

Merged
merged 4 commits into from
May 15, 2014

Conversation

huonw
Copy link
Member

@huonw huonw commented May 5, 2014

This allows writing a regex to filter tests more precisely, rather than having to list long paths e.g.

$ ./stdtest-x86_64-unknown-linux-gnu 'vec.*clone'

running 2 tests
test vec::tests::test_clone ... ok
test vec::tests::test_clone_from ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured

The regex change is fully backwards compatible, since test names are Rust
identifiers + :, and hence not special regex characters.

(See commits for details.)

@alexcrichton
Copy link
Member

r=me, but looks like there's a failure on travis

bors added a commit that referenced this pull request May 6, 2014
The regex change is fully backwards compatible, since test names are Rust
identifiers + `:`, and hence not special regex characters.

(See commits for details.)
@@ -170,7 +180,7 @@ pub fn log_config(config: &config) {
logv(c, format!("stage_id: {}", config.stage_id));
logv(c, format!("mode: {}", mode_str(config.mode)));
logv(c, format!("run_ignored: {}", config.run_ignored));
logv(c, format!("filter: {}", opt_str(&config.filter)));
logv(c, format!("filter: {}", if config.filter.is_some() { "(regex)" } else { "(none)" }));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a Show implementation for Regex, is there a reason you opted out of it here? (just passing by)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't even realise that existed.

@alexcrichton
Copy link
Member

This build failure is a legitimate problem, and it is a bit of an interesting one. The failure occurs when the compiler opens up the regex_macros library. This library has a dependence on the regex crate, and it is supposed to be found in the target lib directory. The compiler adds the target lib directory to LD_LIBRARY_PATH so it can be found. Unfortunately, the compiler adds this target directory to the end of the search path.

You've updated libtest to have a dependency on libregex, which means that libregex-foo.so is now available in the host directory (it was missing there before), so now the call to dlopen is opening up libregex-foo.so in the host directory, not the target directory.

I believe if you change the order of addition here to prepend to LD_LIBRARY_PATH instead of append, it should work out.

@alexcrichton
Copy link
Member

As in, this:

diff --git a/src/libstd/unstable/dynamic_lib.rs b/src/libstd/unstable/dynamic_lib.rs
index 68f0aaa..865b9cb 100644
--- a/src/libstd/unstable/dynamic_lib.rs
+++ b/src/libstd/unstable/dynamic_lib.rs
@@ -73,7 +73,7 @@ impl DynamicLibrary {
             ("LD_LIBRARY_PATH", ':' as u8)
         };
         let newenv = os::getenv_as_bytes(envvar).unwrap_or(box []);
-        let newenv = newenv + &[sep] + path.as_vec();
+        let newenv = path.as_vec() + &[sep] + newenv;
         os::setenv(envvar, str::from_utf8(newenv).unwrap());
     }

@huonw
Copy link
Member Author

huonw commented May 8, 2014

I've been a bit busy, will get to this tomorrow.

@huonw
Copy link
Member Author

huonw commented May 8, 2014

Actually, was easy enough to do now. r? @alexcrichton

@alexcrichton
Copy link
Member

I'm confused about this failure, I'll try to investigate soon.

@pnkfelix
Copy link
Member

pnkfelix commented May 9, 2014

I have been looking at the failure. I can replicate on OS X. Here is what I have established:

  • Running the same compiletest command reproduces the failure for me.
  • Running the rustc command that compiletest claims it is running does not fail (or rather, it hits the expected syntax error, rather than hitting the dynamic link failure that compiletest hits).
  • Inspecting the source code for compiletest, I now know the reason for the above discrepancy: compiletest's log output is lying to us. In particular the string constructed here in make_cmdline is a source of it lying to us, because it causes the output to include nothing about the _libpath being used, even though here in program_output we can see that it does actually pass the lib_path to procsrv::run, and thus it will use the lib_path that make_cmdline failed to report back to us.
  • Modifying program_output to include a bit more log output with the lib_path, we can see it is using this lib_path for the run: x86_64-apple-darwin/stage2/lib. When I install that as the DYLD_LIBRARY_PATH, then the rustc invocation fails. If I install instead x86_64-apple-darwin/stage2/lib/rustlib/x86_64-apple-darwin/lib, then the rustc invocation behaves the way we want.

@pnkfelix
Copy link
Member

pnkfelix commented May 9, 2014

I assume that regex_macros needs to dynamically load the regex crate in order to determine what symbols to link against in its output code; is that actually true, or should it be able to determine the appropriate mangled symbol without loading (or at least finding the path to) the regex crate?

The reason I ask: we have wanted to support keeping the compile-time crates in a path distinct from the run-time crates, if desired. But that implies that the above kinds of symbol lookups should not attempt to use the DYLD_LIBRARY_PATH at the time that rustc runs.

Perhaps this is the kind of problem that @alexcrichton has been concerned about for all this time, and I was missing the big picture issue...

@pnkfelix
Copy link
Member

pnkfelix commented May 9, 2014

Or really, if you think about a model like that used in Racket (see "You Want It When" paper and the sequel), it seems to me that what we probably need is for libregex_macros/lib.rs to say something like:

// "phase(generated_code)" is meant to mean something
// like Racket's `(requre (for-template ...))`
#[phase(generated_code)]
extern crate regex;

i.e. that model needs some way for the macro-defining compile-time crate to indicate that it wants symbols from the run-time crate, but it won't be executing any code from that run-time crate.

@pnkfelix
Copy link
Member

pnkfelix commented May 9, 2014

(i haven't looked very carefully at why this PR exposes the bug described above. You would think we would be seeing it even without this PR...)

@alexcrichton
Copy link
Member

Interesting!

I believe that a major culprit is this code: https://github.com/mozilla/rust/blob/master/src/librustc/driver/driver.rs#L234-L241. The comment I wrote asserts that rpaths on unix fix everything, but this is not a true statement. The contents of LD_LIBRARY_PATH are prioritized over rpaths.

I believe that compiletest is passing the correct LD_LIBRARY_PATH to invoking rustc, because rustc genuinely wants to use the libraries in that directory. The difference comes when you want to dynamically open up a library. The compiler I believe just needs to prepend the compiler lookup paths to LD_LIBRARY_PATH so you'll find the right regex macros crate.

I'm not sure if requiring a set of compile-time crates and a set of non-compile-time crates is what we want, but I'm still confused by this problem. It appears that the stage2 host regex crate is different than the stage2 target regex crate, but we're semi-guaranteeing convergence by stage2. For example, the stage2 regex target crate depends on the stage2 std target crate, but we're dlopening the stage2 regex target crate into a process which only has the stage2 host std crate (and yet somehow it works)?

I feel like we've been toeing the line of whether this arbitrary code at compile time is possible or not. The possibility of it is looking bleaker as time goes on...

@alexcrichton
Copy link
Member

And I confuse myself even further. That code in rustc is adding the host directory to the search paths, not the target directory. So it will not have the fix that I was talking about...

@pnkfelix
Copy link
Member

pnkfelix commented May 9, 2014

Another thing I just noticed that I should point out:

Assuming that our makefiles and compiletest are feeding in the correct paths (which I think they are; I agree with alex that compiletest seems like it is passing the correct LD_LIBRARY_PATH to rustc) ...

Take a look at this command line transcript (which I believe corresponds to what compiletest is asking rustc to do):

% DYLD_LIBRARY_PATH=x86_64-apple-darwin/stage2/lib x86_64-apple-darwin/stage2/bin/rustc /Users/fklock/Dev/Mozilla/rust-helphuon/src/test/compile-fail-fulldeps/syntax-extension-regex-invalid.rs -L x86_64-apple-darwin/test/compile-fail-fulldeps --target=x86_64-apple-darwin -L x86_64-apple-darwin/test/compile-fail-fulldeps/syntax-extension-regex-invalid.stage2-x86_64-apple-darwin.libaux -C prefer-dynamic -o x86_64-apple-darwin/test/compile-fail-fulldeps/syntax-extension-regex-invalid.stage2-x86_64-apple-darwin --cfg rtopt --cfg debug -O -L x86_64-apple-darwin/rt
dyld: lazy symbol binding failed: Symbol not found: __ZN2re5Regex3new20h605f1a58f435e390BZg9v0.11.preE
  Referenced from: /Users/fklock/Dev/Mozilla/rust-helphuon/objdir-dbgopt-check/x86_64-apple-darwin/stage2/lib/rustlib/x86_64-apple-darwin/lib/libregex_macros-10a265ff-0.11-pre.dylib
  Expected in: x86_64-apple-darwin/stage2/lib/libregex-1cae1bee-0.11-pre.dylib

dyld: Symbol not found: __ZN2re5Regex3new20h605f1a58f435e390BZg9v0.11.preE
  Referenced from: /Users/fklock/Dev/Mozilla/rust-helphuon/objdir-dbgopt-check/x86_64-apple-darwin/stage2/lib/rustlib/x86_64-apple-darwin/lib/libregex_macros-10a265ff-0.11-pre.dylib
  Expected in: x86_64-apple-darwin/stage2/lib/libregex-1cae1bee-0.11-pre.dylib

Trace/BPT trap: 5

In particular, why is that talking about lib/rustlib/x86_64-apple-darwin/lib/libregex_macros, i.e. the target's regex_macros crate ? If everything were internally consistent, shouldn't we only be pulling in the host's regex_macros crate?


update: perhaps this is exactly why the fix @alexcrichton mentioned two comments up might actually work (even though alex denied that could work one comment up...)

@pnkfelix
Copy link
Member

pnkfelix commented May 9, 2014

@alexcrichton and yes, I was also surprised that we did not have convergence in terms of the state of regex on stage2-host and stage2-target.

But I also think a scheme like the one employed by Racket should actually not require convergence; rather, it requires the macro author to understand the distinction between their run-time symbols (which end up as compile-time for the end-client program using the macro) and their "template"-time symbols (what I also called "phase(codegen)" above), which end up as run-time for the end-client program using the macro. The macro author may in fact need to reference two distinct versions of the crate, (and give them different names when pulling them in), since they may want to use the crate themselves in the implementation of the procedures driving the macro, while also emitting code that refers to symbols from the crate). But we should be able to handle that in principle.

@alexcrichton
Copy link
Member

In particular, why is that talking about lib/rustlib/x86_64-apple-darwin/lib/libregex_macros, i.e. the target's regex_macros crate

When the compiler dlopen()s a crate, it uses the target libraries, not the host libraries (reason below)

If everything were internally consistent, shouldn't we only be pulling in the host's regex_macros crate?

The compiler can't currently produce an artifact linked against its own artifacts because that would involve linking to code that the compiler didn't generate. It is dlopen()'ing the code though, which is a bit sketchy...

And I confuse myself even further. That code in rustc is adding the host directory to the search paths, not the target directory. So it will not have the fix that I was talking about...

I rescind this comment, this makes sense.


New development: http://www.cocoabuilder.com/archive/cocoa/133694-dyld-library-path-etc.html

You cannot do this as the dynamic linker reads these variables at the
time the application launches, and not each time a new framework is
loaded.

Our hokey method of adding to the search path at runtime is apparently just that, hokey. It worked for windows, but I don't think that it works for anyone else.


Ok, so here's what I think the situation is. As long as we can guarantee convergence by stage, everything should be working today. I believe the regex crate is the only crate that is not converging, due to one use case of cfg(not(stage1)). This is proven via:

$ RUST_LOG=rustc::back::link=info ./x86_64-apple-darwin/stage1/bin/rustc ./src/libregex/lib.rs --cfg stage1         
INFO:rustc::back::link: LinkMeta { crateid: regex#0.11-pre, crate_hash: b823b9eafa200ad0 }
$ RUST_LOG=rustc::back::link=info ./x86_64-apple-darwin/stage2/bin/rustc ./src/libregex/lib.rs --cfg stage2                   
INFO:rustc::back::link: LinkMeta { crateid: regex#0.11-pre, crate_hash: d76de1556444a553 }

Ok, so let's think about a solution. First, I/felix have been talking a lot about search paths and library paths. I don't think those are relevant in this instance, as we can only have a sane world if we rely on convergence. The real issue here is that the regex crate is different at stage1-generation time and stage2-generation time (all libraries should be constant at these two times).

I believe the real solution is that we need libregex to converge. An easy solution may be libregex_tests, I would like to have better support in the compiler, however, to guarantee this convergence. The AST truly is the same, but I think it takes a bit of a different path to get there.

@alexcrichton
Copy link
Member

I believe I have found why regex is not converging. This has to do with how when we hash an identifier, we hash the name of the identifier, not the contents of the identifier. This means that the order in which identifiers are declared turns out to be important. For example:

fn main() {                  
    let foo = 1;             
    let bar = 3;             
    let baz = 4;             
    let _a = (foo, bar, baz);
}
export RUST_LOG=rustc::back::link=info
rustc foo.rs && rustc foo.rs --cfg foo && rustc foo.rs --cfg bar && rustc foo.rs --cfg baz
INFO:rustc::back::link: LinkMeta { crateid: foo#0.0, crate_hash: deb14a517b1d850c }
INFO:rustc::back::link: LinkMeta { crateid: foo#0.0, crate_hash: e317e2ddb2708a7e }
INFO:rustc::back::link: LinkMeta { crateid: foo#0.0, crate_hash: 1ca13c598ed2b157 }
INFO:rustc::back::link: LinkMeta { crateid: foo#0.0, crate_hash: 3495502fed3558fd }

What's happening here is that these symbols are getting interned early on in the compilation process due to being on the command line, affecting their values later on.

So the problem is not that the regex crate has cfg(stage1), the problem is that stage1 appears as an identifier anywhere in the crate.


tl;dr - I have no idea why, but this patch appears to generate the same SVH for stage1/stage2

diff --git a/src/libregex/test/mod.rs b/src/libregex/test/mod.rs
index a4c3a83..b315396 100644
--- a/src/libregex/test/mod.rs
+++ b/src/libregex/test/mod.rs
@@ -9,6 +9,7 @@
 // except according to those terms.

 #[cfg(not(stage1))]
+#[cfg(stage2)]
 #[phase(syntax)]
 extern crate regex_macros;

@pnkfelix
Copy link
Member

@alexcrichton I agree that convergence is a sufficient condition for correctness of our overall macro system (that is, if we could get this crate to converge, then a lot of the problems we are seeing on this PR would go away).

I am not yet convinced that convergence is a necessary condition for correctness (which I think I stated earlier when I was referring to the various Racket macro papers). But I am also willing to table this debate for now in the interest of getting today's code working. :)

@pnkfelix
Copy link
Member

I hypothesize that rebasing this PR atop PR #14145 will remove the issue that @alexcrichton dissected and diagnosed in his earlier comment

@huonw
Copy link
Member Author

huonw commented May 13, 2014

I'll rebase and reopen this when #14145 lands.

@huonw huonw closed this May 13, 2014
@pnkfelix
Copy link
Member

Responding to myself: the place where convergence is a necessary condition for correctness is when you have macros that generate procedural macros that are then invoked in that same crate.

The Racket papers I linked above provide examples where Racket wants to be able to generate procedural macros and invoke those macros within the same compilation unit (effectively), and I am pretty sure that some of those examples cannot be readily re-encoded in a system where there is a forced separation between the compilation unit with the macro-definition and the unit with the macro-use.

(But just because Racket needs that feature does not mean that Rust needs it...)

This is far cheaper than the `.to_str` technique that was used previously.
@huonw
Copy link
Member Author

huonw commented May 15, 2014

Rebased. r?

I kicked off some try builds (I'll probably be asleep by the time they (hopefully) pass):

I'm not sure if the dynamic_lib patch is needed any more?

@huonw huonw reopened this May 15, 2014
huonw added 3 commits May 15, 2014 23:04
This is fully backwards compatible, since test names are Rust
identifiers + `:`, and hence not special regex characters.

Fixes rust-lang#2866.
Previously the longer hand-written usage string was never being printed:
theoretically it was trying to detect when precisely `--help` was
passed (but not `-h`), but the getopts framework was considering a check
for the presence of `-h` to be a check for that of `--help` too,
i.e. the code was always going through the `-h` path.

This changes it to print the extended usage for both `-h` and `--help`,
meaning that it does actually appear correctly.
The compiler needs to be opening e.g. libregex in the correct directory,
which requires getting these in the right order.
bors added a commit that referenced this pull request May 15, 2014
This allows writing a regex to filter tests more precisely, rather than having to list long paths e.g.

```
$ ./stdtest-x86_64-unknown-linux-gnu 'vec.*clone'

running 2 tests
test vec::tests::test_clone ... ok
test vec::tests::test_clone_from ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured
```

The regex change is fully backwards compatible, since test names are Rust
identifiers + `:`, and hence not special regex characters.

(See commits for details.)
@bors bors closed this May 15, 2014
@bors bors merged commit 18c13de into rust-lang:master May 15, 2014
@huonw huonw deleted the test-regex-filter branch May 15, 2014 20:50
@huonw
Copy link
Member Author

huonw commented May 15, 2014

Thanks for fixing the SVH, @pnkfelix!

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 13, 2023
Make inlay hint location links work for more types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants