Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it easier to locate deadlocked tests #2873

Closed
bblum opened this issue Jul 11, 2012 · 19 comments
Closed

Make it easier to locate deadlocked tests #2873

bblum opened this issue Jul 11, 2012 · 19 comments
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. P-low Low priority

Comments

@bblum
Copy link
Contributor

bblum commented Jul 11, 2012

It's really a huge pain to figure out which test is the one that failed or hung, in which case this first half of the line doesn't even get printed.

@brson
Copy link
Contributor

brson commented Jul 11, 2012

This is because by default when there are multiple cores, the test runner runs tests in parallel, so printing the name of the test, running the test, then printing the result would interleave it with arbitrary junk. If you run with RUST_THREADS=1 the test runner will do what you want and print the test name before running it.

@bblum
Copy link
Contributor Author

bblum commented Jul 11, 2012

How does it not already interleave? I'd thought a simple st.out.flush() would solve this, but if it wouldn't because of parallel printing, I must be missing something.

@brson
Copy link
Contributor

brson commented Jul 11, 2012

It's not because of parallel printing (though that is a problem for tests that print to the console). The test runner reports all the results on the same thread, but if it is running them in parallel then they don't finish in the order they are run.

@pnkfelix
Copy link
Member

Not critical for 0.6; de-milestoning

@metajack
Copy link
Contributor

metajack commented May 9, 2013

Is this the rust test harness, or the test harness that user code uses? If the latter, I think this should be nominated for feature complete.

@alexcrichton
Copy link
Member

I think that this is a real issue for tests which possibly hang forever. I think that they way the tests are printed right now is fine (in both the single and multithreaded cases), but the major problem is that tests which hang (in the multithreaded test case) never print anything out.

This is a problem on bors because it's never known which test actually caused the process to hang. I think that the best way to handle this would be to handle signals somehow. If the test harness could catch SIGINT/SIGTERM and terminate all running tests (or at least print out all currently running tests as failures and then exit the process), then at least the source of the failures would be known.

Nominating for the production-ready milestone.

@pnkfelix
Copy link
Member

pnkfelix commented Jul 6, 2013

Handling signals is part of #6842.

@graydon
Copy link
Contributor

graydon commented Jul 11, 2013

accepted for production-ready milestone

@pnkfelix
Copy link
Member

Accepting for P-low.

@brson
Copy link
Contributor

brson commented Jan 16, 2014

Ask @alexcrichton about how to fix this.

@steveklabnik
Copy link
Member

Triage: still the same today.

@bagedevimo
Copy link

I'd like to take a look at this, @alexcrichton, did you have a suggestion as to how to begin?

@alexcrichton
Copy link
Member

@bagedevimo unfortunately I don't know of a great way to solve this, but I can at least describe the problem!

The crux of the problem here is that tests which deadlock or for some other reason don't finish never print anything by default, which means it's very difficult to debug what's going on if the test suite times out (e.g. is killed by an external process). Tests are by default run in multiple threads which is why we don't print partial output.

Some possible solutions for this could be:

  • Use a library to have fancy control over terminal output, and make a better UI on top of that. For example you could have a status bar of all tests that have completed and then a line-per-thread which shows the status of the thread (e.g. if it's idle, what test it's running, etc). The terminal would then all get updated whenever anything completed. This is pretty fancy, however, and is unlikely to be fit for this repository itself unfortunately.
  • Set a timeout such that if no tests have completed after running for N seconds, the currently running tests are all printed to stdout. Either that or if any test itself is taking longer than 5 seconds, the test is printed to stdout. This would at least provide a status message which can be scraped to see what tests are running. This is unfortunately not very useful, however, if the test then later completes as it's just confusing output at that point.
  • A signal handler could be installed to catch termination of the test suite itself. This signal handler would then indicate to the test suite that it should print out all actively running tests and then exit. The hard part about this is (A) you have to deal with signal handlers and (B) it's unclear how well this would work on Windows (I don't think it would work for the buildbot case, but may work for the ctrl-c-at-terminal case).

My personal preference about how to tackle this would be to develop an awesome terminal-ui-test-runner externally so we can see how great it is, and then push really hard on custom test frameworks in Rust to ensure that the experience in using it is super smooth as well.

Sorry if that's not quite a concrete way of how to tackle this, but hopefully it's at least a start!

@brson
Copy link
Contributor

brson commented Jul 19, 2016

I think @alexcrichton's second solution is a reasonable and simple way to reduce the effect of this problem: the test runner should flag tests that run very slow, on the assumption that they are deadlocking. The main loop should keep a list of outstanding tests, their start times and whether they've already been flagged, and set a high receive timeout (maybe 60 seconds, maybe even higher). Every time through the loop it checks for outstanding tests that have exceeded the timeout and haven't been flagged yet, then prints something like "test foo has been running for longer than 60 seconds".

With a high timeout it's unlikely that there will be many false positives, though certainly there are some tests that take more than 60 seconds to run. What you think @alexcrichton?

@brson brson added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. and removed E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Jul 19, 2016
@alexcrichton
Copy link
Member

I think it'd be a fine solution. I've only very rarely seen tests take that long (if at all) and perhaps a ping to say "this is still running" would be good anyway

@brson brson changed the title Rework the way test harness prints "test %s ..." Make it easier to locate deadlocked tests Jul 22, 2016
@jhod0
Copy link

jhod0 commented Jul 30, 2016

In addition, should there be a way to customize the timeout?

e.g.:

#[test]
#[test_timeout(sec = 100)]
fn some_test() {
    ...
}

For the (very) rare case a test would naturally take such a long time, and a user wants to silence such warnings.

@brson
Copy link
Contributor

brson commented Aug 5, 2016

@jhod0 Looks potentially desirable. Maybe worth having another issue for?

bors added a commit that referenced this issue Aug 10, 2016
Add warning timeout for tests that run >1min

This makes it easier to identify hanging tests. As described in #2873,
when a test doesn't finish, we so far had no information on which test
that was. In this PR, we add a duration of 60 seconds for each test,
after which a warning will be printed mentioning that this specific test
has been running for a long time already.

Example output:
https://gist.github.com/futile/6ea3eed85fe632df8633c1b03c08b012

r? @brson
@futile
Copy link
Contributor

futile commented Aug 10, 2016

This is probably closed by #35405 ? The suggestion made @jhod0 sounds like it could be its own issue.

@brson brson closed this as completed Aug 11, 2016
@brson
Copy link
Contributor

brson commented Aug 11, 2016

Thanks @futile !

celinval pushed a commit to celinval/rust-dev that referenced this issue Jun 4, 2024
Update Rust toolchain from nightly-2023-11-11 to nightly-2023-11-12
without any other source changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. P-low Low priority
Projects
None yet
Development

No branches or pull requests

10 participants