Make it easier to locate deadlocked tests #2873

bblum · 2012-07-11T23:18:52Z

It's really a huge pain to figure out which test failed or hung, because in that case the first half of the "test %s ..." line doesn't even get printed.

brson · 2012-07-11T23:25:43Z

This is because by default when there are multiple cores, the test runner runs tests in parallel, so printing the name of the test, running the test, then printing the result would interleave it with arbitrary junk. If you run with RUST_THREADS=1 the test runner will do what you want and print the test name before running it.

bblum · 2012-07-11T23:50:52Z

How does it not already interleave? I'd thought a simple st.out.flush() would solve this, but if it wouldn't because of parallel printing, I must be missing something.

brson · 2012-07-11T23:58:21Z

It's not because of parallel printing (though that is a problem for tests that print to the console). The test runner reports all the results on the same thread, but if it is running them in parallel then they don't finish in the order they are run.
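To illustrate the point above, here is a minimal sketch (not the actual test runner code) of tests running on worker threads while a single thread reports results. The function name `run_parallel` and the use of `thread::sleep` as a stand-in for a test body are assumptions made for the example. Because results arrive in completion order, the name of a test that never finishes would never reach the reporting thread at all.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs each (name, duration) "test" on its own thread
/// and returns the names in the order their results reached the single
/// reporting thread.
fn run_parallel(tests: Vec<(&'static str, Duration)>) -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();
    for (name, work) in tests {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(work); // stand-in for the real test body
            tx.send(name).unwrap();
        });
    }
    // Drop the original sender so the receiver stops once all workers finish.
    drop(tx);
    rx.iter().collect()
}

fn main() {
    // slow_test is started first, but fast_test is expected to be reported first.
    let order = run_parallel(vec![
        ("slow_test", Duration::from_millis(400)),
        ("fast_test", Duration::from_millis(10)),
    ]);
    for name in order {
        println!("test {} ... ok", name);
    }
}
```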

pnkfelix · 2013-03-22T14:06:53Z

Not critical for 0.6; de-milestoning

metajack · 2013-05-09T16:19:58Z

Is this the rust test harness, or the test harness that user code uses? If the latter, I think this should be nominated for feature complete.

alexcrichton · 2013-07-04T23:54:27Z

I think that this is a real issue for tests which possibly hang forever. I think that the way the tests are printed right now is fine (in both the single and multithreaded cases), but the major problem is that tests which hang (in the multithreaded test case) never print anything out.

This is a problem on bors because it's never known which test actually caused the process to hang. I think that the best way to handle this would be to handle signals somehow. If the test harness could catch SIGINT/SIGTERM and terminate all running tests (or at least print out all currently running tests as failures and then exit the process), then at least the source of the failures would be known.

Nominating for the production-ready milestone.

pnkfelix · 2013-07-06T01:42:59Z

Handling signals is part of #6842.

graydon · 2013-07-11T17:37:10Z

accepted for production-ready milestone

pnkfelix · 2014-01-16T18:35:15Z

Accepting for P-low.

brson · 2014-01-16T18:35:57Z

Ask @alexcrichton about how to fix this.

steveklabnik · 2015-01-21T18:11:09Z

Triage: still the same today.

bagedevimo · 2016-01-23T03:30:59Z

I'd like to take a look at this, @alexcrichton, did you have a suggestion as to how to begin?

alexcrichton · 2016-01-24T18:26:20Z

@bagedevimo unfortunately I don't know of a great way to solve this, but I can at least describe the problem!

The crux of the problem here is that tests which deadlock or for some other reason don't finish never print anything by default, which means it's very difficult to debug what's going on if the test suite times out (e.g. is killed by an external process). Tests are by default run in multiple threads which is why we don't print partial output.

Some possible solutions for this could be:

1. Use a library to have fancy control over terminal output, and make a better UI on top of that. For example you could have a status bar of all tests that have completed and then a line-per-thread which shows the status of the thread (e.g. if it's idle, what test it's running, etc). The terminal would then all get updated whenever anything completed. This is pretty fancy, however, and is unlikely to be fit for this repository itself unfortunately.
2. Set a timeout such that if no tests have completed after running for N seconds, the currently running tests are all printed to stdout. Either that or if any test itself is taking longer than 5 seconds, the test is printed to stdout. This would at least provide a status message which can be scraped to see what tests are running. This is unfortunately not very useful, however, if the test then later completes as it's just confusing output at that point.
3. A signal handler could be installed to catch termination of the test suite itself. This signal handler would then indicate to the test suite that it should print out all actively running tests and then exit. The hard part about this is (A) you have to deal with signal handlers and (B) it's unclear how well this would work on Windows (I don't think it would work for the buildbot case, but may work for the ctrl-c-at-terminal case).

My personal preference about how to tackle this would be to develop an awesome terminal-ui-test-runner externally so we can see how great it is, and then push really hard on custom test frameworks in Rust to ensure that the experience in using it is super smooth as well.

Sorry if that's not quite a concrete way of how to tackle this, but hopefully it's at least a start!

brson · 2016-07-19T20:08:26Z

I think @alexcrichton's second solution is a reasonable and simple way to reduce the effect of this problem: the test runner should flag tests that run very slow, on the assumption that they are deadlocking. The main loop should keep a list of outstanding tests, their start times and whether they've already been flagged, and set a high receive timeout (maybe 60 seconds, maybe even higher). Every time through the loop it checks for outstanding tests that have exceeded the timeout and haven't been flagged yet, then prints something like "test foo has been running for longer than 60 seconds".

With a high timeout it's unlikely that there will be many false positives, though certainly there are some tests that take more than 60 seconds to run. What do you think @alexcrichton?
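The loop described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that later landed in the test harness: `run_with_warnings`, `Running`, and `warn_after` are hypothetical names, tests are simulated with `thread::sleep`, and all tests are started at once rather than scheduled across a thread pool. The main loop keeps a map of outstanding tests with their start times and a flag, waits on the result channel with a timeout, and warns once per test that has exceeded the limit.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Bookkeeping for a test that has started but not yet reported a result.
struct Running {
    started: Instant,
    flagged: bool, // true once a warning has been printed for this test
}

/// Hypothetical helper: runs every simulated test on its own thread and
/// returns the warnings emitted for tests that ran longer than `warn_after`.
fn run_with_warnings(
    tests: Vec<(&'static str, Duration)>,
    warn_after: Duration,
) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let mut outstanding: HashMap<&'static str, Running> = HashMap::new();
    let mut warnings = Vec::new();

    for (name, work) in tests {
        let tx = tx.clone();
        outstanding.insert(name, Running { started: Instant::now(), flagged: false });
        thread::spawn(move || {
            thread::sleep(work); // stand-in for the real test body
            let _ = tx.send(name);
        });
    }
    drop(tx);

    // A genuinely deadlocked test would keep this loop alive forever, but its
    // name would have been printed by the warning below.
    while !outstanding.is_empty() {
        match rx.recv_timeout(warn_after) {
            Ok(name) => {
                outstanding.remove(name);
                println!("test {} ... ok", name);
            }
            Err(mpsc::RecvTimeoutError::Timeout) => {}
            Err(mpsc::RecvTimeoutError::Disconnected) => break,
        }
        // Every time through the loop, flag tests that exceeded the limit.
        for (name, state) in outstanding.iter_mut() {
            if !state.flagged && state.started.elapsed() >= warn_after {
                state.flagged = true;
                let msg = format!(
                    "test {} has been running for longer than {:?}",
                    name, warn_after
                );
                println!("{}", msg);
                warnings.push(msg);
            }
        }
    }
    warnings
}

fn main() {
    // Short durations keep the example quick; the thread suggests 60 seconds.
    let tests = vec![
        ("quick", Duration::from_millis(10)),
        ("slow", Duration::from_millis(400)),
    ];
    run_with_warnings(tests, Duration::from_millis(100));
}
```

Because the receive timeout is a fixed `warn_after` in this sketch, a warning can arrive somewhat later than the limit. A tighter version would compute the wait from the oldest unflagged test's start time.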

alexcrichton · 2016-07-19T20:41:26Z

I think it'd be a fine solution. I've only very rarely seen tests take that long (if at all), and perhaps a ping to say "this is still running" would be good anyway.

jhod0 · 2016-07-30T18:24:00Z

In addition, should there be a way to customize the timeout?

e.g.:

#[test]
#[test_timeout(sec = 100)]
fn some_test() {
    ...
}

This would be for the (very) rare case where a test naturally takes that long and the user wants to silence such warnings.

brson · 2016-08-05T19:35:53Z

@jhod0 Looks potentially desirable. Maybe worth having another issue for?

Add warning timeout for tests that run >1min #35405

This makes it easier to identify hanging tests. As described in #2873, when a test doesn't finish, we so far had no information on which test that was. In this PR, we add a duration of 60 seconds for each test, after which a warning will be printed mentioning that this specific test has been running for a long time already.

Example output: https://gist.github.com/futile/6ea3eed85fe632df8633c1b03c08b012

r? @brson

futile · 2016-08-10T13:13:11Z

This is probably closed by #35405? The suggestion made by @jhod0 sounds like it could be its own issue.

brson · 2016-08-11T20:45:17Z

Thanks @futile !

pnkfelix mentioned this issue Jan 16, 2014

Need a solution for select / async events #6842

Closed

apasel422 mentioned this issue Nov 25, 2015

Tests should output the name before running #30047

Closed

brson added the E-easy label and removed the E-mentor label Jul 19, 2016

brson changed the title ~~Rework the way test harness prints "test %s ..."~~ Make it easier to locate deadlocked tests Jul 22, 2016

futile mentioned this issue Aug 6, 2016

Add warning timeout for tests that run >1min #35405

Merged

brson closed this as completed Aug 11, 2016

WaDelma mentioned this issue Oct 28, 2016

Running tests timeout too fast #37461

Closed

sourcefrog mentioned this issue Dec 24, 2017

Show long-running tests in progress, even when multithreaded #46990

Closed

celinval pushed a commit to celinval/rust-dev that referenced this issue Jun 4, 2024

Automatic toolchain upgrade to nightly-2023-11-12 (rust-lang#2873)

234ca24

Update Rust toolchain from nightly-2023-11-11 to nightly-2023-11-12 without any other source changes.
