Make it easier to locate deadlocked tests #2873
Comments
This is because by default when there are multiple cores, the test runner runs tests in parallel, so printing the name of the test, running the test, then printing the result would interleave it with arbitrary junk. If you run with |
How does it not already interleave? I'd thought a simple |
It's not because of parallel printing (though that is a problem for tests that print to the console). The test runner reports all the results on the same thread, but if it is running them in parallel then they don't finish in the order they are run. |
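A minimal sketch of the point above, not the actual libtest code: each "test" runs on its own thread and sends its name over a channel when done, and a single reporting thread collects the names in completion order. The function name `completion_order` and the sleep durations standing in for test bodies are assumptions made purely for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: starts every "test" in the given order and returns
/// the names in the order the results arrive on the reporting thread.
fn completion_order(tests: Vec<(&'static str, u64)>) -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();
    for (name, millis) in tests {
        let tx = tx.clone();
        thread::spawn(move || {
            // Stand-in for the test body: just sleep for the given time.
            thread::sleep(Duration::from_millis(millis));
            // Ignore send errors; the receiver outlives all workers here.
            let _ = tx.send(name);
        });
    }
    // Drop the original sender so the receiver stops once all workers finish.
    drop(tx);
    // All reporting happens here, on one thread, in completion order.
    rx.iter().collect()
}
```

Because a result is only reported when it arrives, a line for a test that never finishes is never printed at all.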
Not critical for 0.6; de-milestoning |
Is this the rust test harness, or the test harness that user code uses? If the latter, I think this should be nominated for feature complete. |
I think that this is a real issue for tests which possibly hang forever. I think that the way the tests are printed right now is fine (in both the single and multithreaded cases), but the major problem is that tests which hang (in the multithreaded test case) never print anything out. This is a problem on bors because it's never known which test actually caused the process to hang. I think that the best way to handle this would be to handle signals somehow. If the test harness could catch SIGINT/SIGTERM and terminate all running tests (or at least print out all currently running tests as failures and then exit the process), then at least the source of the failures would be known. Nominating for the production-ready milestone. |
Handling signals is part of #6842. |
accepted for production-ready milestone |
Accepting for P-low. |
Ask @alexcrichton about how to fix this. |
Triage: still the same today. |
I'd like to take a look at this, @alexcrichton, did you have a suggestion as to how to begin? |
@bagedevimo unfortunately I don't know of a great way to solve this, but I can at least describe the problem! The crux of the problem here is that tests which deadlock or for some other reason don't finish never print anything by default, which means it's very difficult to debug what's going on if the test suite times out (e.g. is killed by an external process). Tests are by default run in multiple threads which is why we don't print partial output. Some possible solutions for this could be:
My personal preference about how to tackle this would be to develop an awesome terminal-ui-test-runner externally so we can see how great it is, and then push really hard on custom test frameworks in Rust to ensure that the experience in using it is super smooth as well. Sorry if that's not quite a concrete way of how to tackle this, but hopefully it's at least a start! |
I think @alexcrichton's second solution is a reasonable and simple way to reduce the effect of this problem: the test runner should flag tests that run very slowly, on the assumption that they are deadlocking. The main loop should keep a list of outstanding tests, their start times and whether they've already been flagged, and set a high receive timeout (maybe 60 seconds, maybe even higher). Every time through the loop it checks for outstanding tests that have exceeded the timeout and haven't been flagged yet, then prints something like "test foo has been running for longer than 60 seconds". With a high timeout it's unlikely that there will be many false positives, though certainly there are some tests that take more than 60 seconds to run. What do you think @alexcrichton? |
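The loop described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not the test runner's real implementation: the names `Running` and `run_with_warnings` are hypothetical, tests are plain `fn()` pointers, and report lines are collected into a `Vec<String>` rather than printed so the behavior is easy to check.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Bookkeeping for one outstanding test (hypothetical type for this sketch).
struct Running {
    started: Instant,
    flagged: bool,
}

/// Runs every test on its own thread and returns the report lines in the
/// order they would have been printed.
fn run_with_warnings(tests: Vec<(&'static str, fn())>, warn_after: Duration) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<&'static str>();
    let mut running: BTreeMap<&'static str, Running> = BTreeMap::new();
    let mut report = Vec::new();

    for (name, test) in tests {
        running.insert(name, Running { started: Instant::now(), flagged: false });
        let tx = tx.clone();
        thread::spawn(move || {
            test();
            let _ = tx.send(name);
        });
    }
    drop(tx);

    while !running.is_empty() {
        // Wait for the next result, but wake up periodically to look for slow tests.
        match rx.recv_timeout(warn_after) {
            Ok(name) => {
                running.remove(name);
                report.push(format!("test {name} ... ok"));
            }
            Err(RecvTimeoutError::Timeout) => {}
            // Every worker is gone (for example after a panic): stop waiting.
            Err(RecvTimeoutError::Disconnected) => break,
        }
        // Flag each outstanding test at most once after it exceeds the limit.
        for (name, state) in running.iter_mut() {
            if !state.flagged && state.started.elapsed() >= warn_after {
                state.flagged = true;
                report.push(format!("test {name} has been running for longer than {:?}", warn_after));
            }
        }
    }
    report
}
```

A test that truly never returns keeps this loop alive, so something external still has to kill the process; the difference is that the warning naming the stuck test has already been emitted by then. A real runner would pass something like `Duration::from_secs(60)` as `warn_after`.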
I think it'd be a fine solution. I've only very rarely seen tests take that long (if at all) and perhaps a ping to say "this is still running" would be good anyway |
In addition, should there be a way to customize the timeout? e.g.:
For the (very) rare case a test would naturally take such a long time, and a user wants to silence such warnings. |
@jhod0 Looks potentially desirable. Maybe worth having another issue for? |
Add warning timeout for tests that run >1min
This makes it easier to identify hanging tests. As described in #2873, when a test doesn't finish, we so far had no information on which test that was. In this PR, we add a duration of 60 seconds for each test, after which a warning will be printed mentioning that this specific test has been running for a long time already. Example output: https://gist.github.com/futile/6ea3eed85fe632df8633c1b03c08b012 r? @brson
Thanks @futile ! |
It's really a huge pain to figure out which test is the one that failed or hung, in which case this first half of the line doesn't even get printed.