Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

close_output test is randomly failing #8564

Closed
ehuss opened this issue Jul 30, 2020 · 10 comments · Fixed by #8587
Closed

close_output test is randomly failing #8564

ehuss opened this issue Jul 30, 2020 · 10 comments · Fixed by #8587
Labels
A-testing-cargo-itself Area: cargo's tests C-bug Category: bug

Comments

@ehuss
Copy link
Contributor

ehuss commented Jul 30, 2020

TLDR: Should we run some flaky tests single-threaded? (Nope)

The build::close_output test is randomly failing on CI. There were some fixes applied in #8286 in May 26, but there appears to be more recent failures:

rust-lang/rust#74312 (comment)
rust-lang/rust#74408 (comment)
rust-lang/rust#74908 (comment)
rust-lang/rust#74923 (https://github.com/rust-lang-ci/rust/runs/924743383)

The failure is:

---- build::close_output stdout ----
thread 'build::close_output' panicked at 'assertion failed: !status.success()', src/tools/cargo/tests/testsuite/build.rs:5016:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I am uncertain how this is possible, so maybe someone could double check that what I wrote makes sense. The test covers what happens when stdout or stderr is closed in the middle of the build. It uses a proc-macro as a sync point so that the test can know when compilation has started, and to emit data to stdout or stderr during the build. It should follow this sequence:

  1. Starts a TCP server.
  2. Starts the build.
  3. The proc-macro starts building.
  4. The proc-macro connects to the TCP server, and waits for the test to tell it it is OK to continue.
  5. Test receives connection from proc-macro
  6. Test closes stdout.
  7. Test tells proc-macro to continue.
  8. Proc-macro starts spewing stuff to stdout to cargo, which through the internal job queue ends up attempting to write to stdout. Since stdout was closed in step 6, this should fail.
  9. Cargo should exit with an error after rustc is done.

For some reason, at step 8, it successfully writes to stdout, and step 9 returns success.

I've been doing a few tests, and it gets worse based on the number of concurrent tests running. When run single threaded, I cannot get it to fail (even with the system under heavy load).

I'm feeling this is somewhat related to #7858. Is there still a race condition, even with atomic O_CLOEXEC? That is, AIUI, the file descriptors are still inherited across fork, and only closed when exec is called. If so, then there is a small window where the file descriptors have extra duplicates which prevent them from fully closing immediately.

I'm thinking a simple solution would be to isolate these tests into a separate test executable which runs with --test-threads=1 (or maybe a simple no-harness test?). This should prevent concurrent tests from interfering with one another. The downside is that this makes it more cumbersome to run all of the test suite. (Testing shows this probably won't fix this test.)

@ehuss ehuss added C-bug Category: bug A-testing-cargo-itself Area: cargo's tests labels Jul 30, 2020
@ehuss
Copy link
Contributor Author

ehuss commented Jul 30, 2020

Ugh, I just encountered a failure with it running in isolation without any threads...

@alexcrichton
Copy link
Member

I can't seem to reproduce this locally, at least not when running all tests in a loop that contain build::. Can you try adding some logging to confirm your suspicion of the series of events?

Otherwise my only real guess is that the pipe fills up before returning an error, but local tests show that doesn't happen so I don't think this is right. Especially if it happens without any threads that is indeed very odd...

One thing that may help is to panic!() at the end of the proc macro to force compilation to fail, and then testing the output may help diagnose?

@ehuss
Copy link
Contributor Author

ehuss commented Jul 30, 2020

I have instrumented it by writing to a file to verify what is happening. It seems to be working in the sequence I expect, and Cargo is sometimes successfully writing to stdout, even though I think it should be closed. Unless I have some misunderstanding about how pipes work (I've been assuming that after a successful close, the other side should be closed immediately assuming there aren't any duplicates of the fd). I keep thinking the most likely cause is a duplicate fd somewhere, but I can't find it.

To repro, here's a change that can make it occur much faster:

diff --git a/tests/testsuite/build.rs b/tests/testsuite/build.rs
index 7bcd6215a..f78354db4 100644
--- a/tests/testsuite/build.rs
+++ b/tests/testsuite/build.rs
@@ -4971,7 +4971,7 @@ fn close_output() {
                     let mut buf = [0];
                     drop(socket.read_exact(&mut buf));
                     let use_stderr = std::env::var("__CARGO_REPRO_STDERR").is_ok();
-                    for i in 0..10000 {
+                    for i in 0..1 {
                         if use_stderr {
                             eprintln!("{}", i);
                         } else {

The actual count there shouldn't matter. After stdout is closed, any write should fail. And because the proc-macro and test are synchronized, it is guaranteed that the write comes after the close.

For doing single-threaded testing, I build the test and run this:

while RUSTUP_HOME=/home/eric/.rustup target/debug/deps/testsuite-1be76affecebe07f \
    build::close_output --test-threads=1; do :; done

For me, this can fail anywhere from a few seconds to a few minutes. If you want a near-guaranteed failure, run several copies in parallel. I have a macro in my editor which adds something like the following:

diff --git a/tests/testsuite/build.rs b/tests/testsuite/build.rs
index 7bcd6215a..d44093d2a 100644
--- a/tests/testsuite/build.rs
+++ b/tests/testsuite/build.rs
@@ -4931,6 +4931,25 @@ fn user_specific_cfgs_are_filtered_out() {
     p.process(&p.bin("foo")).run();
 }

+#[test] fn close_output0() { close_output(); }
+#[test] fn close_output1() { close_output(); }
+#[test] fn close_output2() { close_output(); }
+#[test] fn close_output3() { close_output(); }
+#[test] fn close_output4() { close_output(); }
+#[test] fn close_output5() { close_output(); }
+#[test] fn close_output6() { close_output(); }
+#[test] fn close_output7() { close_output(); }
+#[test] fn close_output8() { close_output(); }
+#[test] fn close_output9() { close_output(); }
+#[test] fn close_output10() { close_output(); }
+#[test] fn close_output11() { close_output(); }
+#[test] fn close_output12() { close_output(); }
+#[test] fn close_output13() { close_output(); }
+#[test] fn close_output14() { close_output(); }
+#[test] fn close_output15() { close_output(); }
+#[test] fn close_output16() { close_output(); }
+
+

I usually use a few hundred copies, but 16 seems to be enough here. Run cargo test --test testsuite -- build::close_output.

Here's a copy of the instrumentation from a failed run:

TEST: spawning cargo
TEST: waiting for proc macro to start
CARGO: starting read2
CARGO: read some data is_out=false data=[] eof=true
CARGO: read some data is_out=true data=[] eof=true
CARGO: starting read2
PM: proc-macro started
CARGO: read some data is_out=false data=[] eof=false
CARGO: read some data is_out=true data=[104, 101, 108, 108, 111, 32, 115, 116, 100, 111, 117, 116, 33, 10] eof=false
CARGO: received stdout line "hello stdout!"
CARGO: read some data is_out=false data=[104, 101, 108, 108, 111, 32, 115, 116, 100, 101, 114, 114, 33, 10] eof=false
CARGO: writing to stdout
CARGO: read some data is_out=true data=[] eof=false
TEST: proc macro has started
TEST: closing stdout
TEST: closed stdout
TEST: telling proc-macro to continue
TEST: reading other output
PM: connected to test
PM: test told us to continue
CARGO: read some data is_out=false data=[] eof=false
CARGO: read some data is_out=true data=[48, 10] eof=false
CARGO: received stdout line "0"
PM: finished writing, exiting with success
CARGO: writing to stdout
CARGO: read some data is_out=false data=[] eof=true
CARGO: read some data is_out=true data=[] eof=true
CARGO: job queue loop finished, error=None
TEST: wait for cargo process to exit
TEST: success?!  stderr output:
    Compiling foo v0.1.0 (/home/eric/Proj/cargo/t8/t0/foo)
    hello stderr!
        Finished dev [unoptimized + debuginfo] target(s) in 0.57s\n"

Hopefully that makes sense. TEST is a log message from the test itself, PM is from the proc-macro, and CARGO is from cargo. "read some data" is from cargo's read2 from the pipe to rustc. Everything there is to be expected except for "CARGO: writing to stdout" should be followed with an error message that it failed to write.

Compared to successful runs (where Cargo returns the expected error), the logging looks mostly the same. Some messages are rearranged due to process/thread scheduling, but the important lines are in the expected order.

Thanks for helping to take a look! I keep feeling like I am missing something obvious, or have some fundamental misunderstanding about pipes, or how the Command struct spawns processes.

@ehuss
Copy link
Contributor Author

ehuss commented Jul 30, 2020

Oh, and I've actually been doing most of my testing on a fast linux machine, which seems to repro a little more easily than on my mac laptop (which also can fail, but not as easily). I just ran the test on my 12-thread mac and it failed 10 out of 400 runs, whereas on my 32-thread linux system, it failed 298 out of 400 runs.

(This is with the for i in 0..1 change mentioned above.)

@alexcrichton
Copy link
Member

alexcrichton commented Jul 30, 2020

Ok so I have some explanations, but not others.

If you change it to only print one item, e.g. change the loop to for i in 0..1, I can see the race condition. What's happening is that we can close stdout both before and after Cargo prints hello stdout. We're only synchronizing with the proc-macro, not Cargo's printing, so we know that rustc has printed hello stdout when we close stdout but we don't actually know if Cargo has printed that out. Due to flushing behavior in LineWriter if you write a line when stdout is closed then it will successfully eat one line, but all future writes/flushes will fail. The final flush when the main thread exits actually fails, but we never see that error. So what happens here is that if Cargo writes hello stdout, then stdout is closed, then Cargo writes 0\n, then that will succeed and Cargo won't ever see an error. It will, however, see an error if anything else is written. If Cargo's stdout is closed before all writes, however, then hello stdout will write successfully and then the 0\n write will trigger the error.

That explains why when changing to 0..1 from 0..1000 the test is flaky. I don't know why with 1000 writes we see the same issue though. We're still not synchronizing with Cargo's print of the original message, just the proc macro. With 1000 writes Cargo is guaranteed to make multiple writes which should see the eventual error in LineWriter. With 0..2 I'm unable to reproduce this issue...

So to confirm, can you reproduce with 0..2? (or anything bigger than 1)

@ehuss
Copy link
Contributor Author

ehuss commented Aug 3, 2020

Hm, sorry for the misleading direction. I wasn't aware of that interaction with LineWriter. However, I'm quite easily able to reproduce with 0..10. I ran tests for several hours today with 0..10000, but I wasn't able to repro. I thought I had done that in the past, but my memory is foggy.

My original intent with the 10,000 was just to ensure there wasn't any silliness with PIPE_BUF, but it wasn't based on any evidence.

I'll continue poking on this.

@ehuss
Copy link
Contributor Author

ehuss commented Aug 4, 2020

I'm returning to my original hypothesis that the issue is the file descriptor staying open because other threads are forking processes in the background, and there's a small window where the fd is duplicated. I'm able to repro with the following patch, running 1 test at a time (though it takes a while):

diff --git a/tests/testsuite/build.rs b/tests/testsuite/build.rs
index 01c16b515..1e00e3543 100644
--- a/tests/testsuite/build.rs
+++ b/tests/testsuite/build.rs
@@ -4939,6 +4939,14 @@ fn user_specific_cfgs_are_filtered_out() {
 fn close_output() {
     // What happens when stdout or stderr is closed during a build.

+    std::thread::spawn(|| {
+        let _test_guard = cargo_test_support::paths::init_root();
+        loop {
+            let _output = cargo_test_support::cargo_process("help")
+                .exec_with_output().unwrap();
+        }
+    });
+
     // Server to know when rustc has spawned.
     let listener = std::net::TcpListener::bind("127.0.0.1:0").unwrap();
     let addr = listener.local_addr().unwrap();
@@ -4975,7 +4983,7 @@ fn close_output() {
                     let mut buf = [0];
                     drop(socket.read_exact(&mut buf));
                     let use_stderr = std::env::var("__CARGO_REPRO_STDERR").is_ok();
-                    for i in 0..10000 {
+                    for i in 0..100 {
                         if use_stderr {
                             eprintln!("{}", i);
                         } else {

I'm not able to repro without other threads running in the background.

@alexcrichton I have two questions:

  • I kinda want to try to repro on rust-lang/rust's CI, do you think that would be worthwhile? I tried to repro on my account, but those only have 2 parallel jobs, and I wasn't able to.
    • As a side note, I noticed that the 4 failures were all with Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz. The other hardware that is often used is Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz which is a much newer/beefier system. However, with only 4 data points, it might be just a coincidence.
  • Do you think it would be worthwhile to just move this test to a single-threaded test runner?

@alexcrichton
Copy link
Member

I do actually always forget that CLOEXEC is, well, on exec not fork. Before we dive into this hypothesis though I'm curious:

  • With --test-threads 1 though you say you were able to reproduce, so where could the concurrent fork have happened?
  • Cargo should try to print 10000 items from rustc, and surely that's enough to overflow PIPE_BUF. So even if the file descriptor is kept open surely at some point cargo blocks on the write, and then fails when the other associated processes exit?

@ehuss
Copy link
Contributor Author

ehuss commented Aug 4, 2020

With --test-threads 1 though you say you were able to reproduce, so where could the concurrent fork have happened?

That was when I had 0..1, which was probably related to the LineWriter behavior.

Cargo should try to print 10000 items from rustc, and surely that's enough to overflow PIPE_BUF. So even if the file descriptor is kept open surely at some point cargo blocks on the write, and then fails when the other associated processes exit?

Hm, good point. I did some tests, and I'm able to write 64KB to stdout or stderr before it starts blocking. I would have expected 4K. TIL, according to pipe(7), the buffer is 16 pages.

So, maybe the answer here is to just print more than 64KB of data?

@alexcrichton
Copy link
Member

Ok, that makes sense with --test-threads=1. Otherwise yeah I think the best solution might be to write >64kb of data, we could probably write a megabyte or so just to be safe, because nothing should buffer that much. It does mean that the test will block unnecessarily while we wait for some other random test to finish, but that seems ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing-cargo-itself Area: cargo's tests C-bug Category: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants