[Merge-on-Red] - Implement Test Process Watcher #78742
…ut now gotta convert a whole char ** to a char *.
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.
Hi @jkoritzinsky! Here's the initial draft of the test watcher. It works on Linux and Windows, and it's ready for sharing, so you can take a look and we can adjust/refactor/edit/etc. as necessary.
Tagging subscribers to this area: @hoyosjs
… yet functional, but gotta save my progress :)
I think we can just include the watchdog CMakeLists.txt from the CoreCLR and Mono CMake builds. I don’t think we need to introduce a separate subset and native project build for it.
src/native/watchdog/watchdog.cpp
#else
const int check_interval = 1000;
int check_count = 0;
char **args = new char *[exe_argc];
Consider a vector here. You forgot to delete
I forgot the deletion, thanks for pointing it out, Andy. The reason we're using a char*[] is that it's the type execv() requires.
unique_ptr<char*[]> then?
That might work as well. I'll try it.
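For reference, the vector suggestion could look roughly like the sketch below. This is not the code from the PR: `make_argv` is a hypothetical helper name, and the sketch assumes the arguments are already held as `std::string` values. The vector owns the pointer array, so there is nothing to `delete`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the null-terminated argument array that
// execv() expects. The returned pointers refer to the strings in `args`,
// so `args` must outlive the returned vector.
std::vector<char*> make_argv(const std::vector<std::string>& args)
{
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);
    for (const std::string& arg : args)
    {
        // execv() is declared with char* const[], but it does not modify
        // the strings, so dropping const here is safe.
        argv.push_back(const_cast<char*>(arg.c_str()));
    }
    argv.push_back(nullptr); // execv() requires a trailing null pointer.
    return argv;
}
```

A caller would then pass `argv.data()` to `execv()`, for example `execv(argv[0], argv.data())`. A `unique_ptr<char*[]>` would remove the leak as well, but the vector also keeps track of the element count.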
fwiw I'm not completely sold on C++ if we have to parse and fixup the XML file. I'd rather use... anything else for string manipulation tbh.
Yeah I agree, C++ is too much for this. We should do it in plain C like it should be :) Jokes aside, thanks for pointing this out @agocke. We seem to be on different pages on this, so let's take this chance to sync on it. The YAML log is generated in C#, right here: https://github.com/ivdiazsa/runtime/blob/2f66c46c1948af53c447e1efe0d1cc32867244f3/src/tests/Common/XUnitWrapperLibrary/TestSummary.cs#L95 I was under the impression that the XML corrector, if we decide to go with that approach, would go somewhere around there as well.
…irectory, rather than just the executable, and reallowed tests to be run without the watcher.
… of the object artifacts.
Two small comments, but other than that, LGTM!
src/native/watchdog/watchdog.cpp
#else // !TARGET_WINDOWS
// TODO: Describe what the 'ms_factor' is, and why it's being used here.
Address this TODO?
Oops forgot that. Thanks for the catch.
I actually removed it altogether. I needed it because originally, I was dealing with some C stuff that combined microseconds and milliseconds. After switching to C++'s std::chrono::milliseconds, testing with and without it yielded virtually the same wait time, so it ended up being redundant.
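To illustrate why no conversion factor is needed once everything is expressed as `std::chrono` durations, here is a minimal polling sketch. It is not the watchdog's actual loop: `wait_until` and its parameters are hypothetical names, and the completion check is abstracted into a callback.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `is_done` every `interval` until it returns
// true or `timeout` has elapsed. Returns true if the condition was met in
// time, false on timeout. All time values are typed durations, so no
// manual microsecond/millisecond factor appears anywhere.
bool wait_until(const std::function<bool()>& is_done,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds interval)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!is_done())
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false;
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

Using `steady_clock` for the deadline keeps the wait unaffected by wall-clock adjustments while the test is running.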
<HelixCorrelationPayload Include="$(XUnitLogCheckerDirectory)" />

<!-- Browser-Wasm follows a very different workflow, which is currently out of scope of the Log Checker. -->
<HelixCorrelationPayload Include="$(XUnitLogCheckerDirectory)" Condition="'$(TargetsBrowser)' != 'true'" />
Does this change the current behavior for wasm logs then?
Not at all. Wasm is out of scope of this project for the time being, hence we are excluding it. So wasm tests will remain unaffected.
It seems undesirable to hard-code yet another timeout value (300 seconds?) into a new location (and there isn't a big, obvious comment noting that's what the number is). Especially since the YAML files (?) already specify per-test timeouts. E.g., it wouldn't surprise me if some merged test cases (e.g., Hardware Intrinsics) run under GCStress=3 on Linux/arm could be very slow.
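One possible way to address this concern is to name the default in a single, commented place and let the caller override it. The sketch below is only an illustration: `kDefaultTimeoutSeconds` and `parse_timeout` are hypothetical names, the 300-second value is the figure guessed at in the comment above, and where the override text comes from (command-line argument, environment variable, per-test setting) is left open.

```cpp
#include <cerrno>
#include <chrono>
#include <cstdlib>

// Hypothetical default timeout, named and documented in one place.
// The value 300 is the reviewer's guess, not a confirmed setting.
constexpr long kDefaultTimeoutSeconds = 300;

// Parses a timeout override given as decimal seconds. Falls back to the
// default when the text is missing, malformed, or not a positive number.
std::chrono::seconds parse_timeout(const char* text)
{
    const std::chrono::seconds fallback(kDefaultTimeoutSeconds);
    if (text == nullptr || *text == '\0')
    {
        return fallback;
    }

    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0)
    {
        return fallback;
    }
    return std::chrono::seconds(value);
}
```

With this shape, a slow configuration such as the GCStress runs mentioned above could pass a larger value without touching the watchdog source.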
This PR adds a new harness to run tests by means of corerun. This is the watchdog work item defined in issue #77735.
The main purpose of adding it is to monitor potential freezes and hangs when running tests. If a specified time frame elapses and the test hasn't finished, the watcher automatically kills the process and reports accordingly.
With this mechanism in place, we will no longer have incomplete test runs that froze somewhere without any further information. This makes test failures of this kind much easier to investigate and, consequently, to fix.
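The behavior described above can be sketched for POSIX systems as follows. This is a simplified illustration, not the implementation in this PR: `run_with_timeout`, the sentinel return values, and the 50 ms polling interval are all assumptions, and the Windows path is omitted.

```cpp
#include <chrono>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical sentinel results, chosen only for this sketch.
constexpr int kTimedOut = -1;
constexpr int kError = -2;

// Launches argv[0] with the given null-terminated argv and kills it if it
// is still running after `timeout`. Returns the child's exit code,
// kTimedOut if it had to be killed, or kError on any other failure.
int run_with_timeout(char* const argv[], std::chrono::milliseconds timeout)
{
    const pid_t child = fork();
    if (child < 0)
    {
        return kError;
    }
    if (child == 0)
    {
        execv(argv[0], argv);
        _exit(127); // Only reached if execv() failed.
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    const std::chrono::milliseconds check_interval(50); // assumed interval
    int status = 0;
    for (;;)
    {
        const pid_t result = waitpid(child, &status, WNOHANG);
        if (result == child)
        {
            break; // The child finished on its own.
        }
        if (result < 0)
        {
            return kError;
        }
        if (std::chrono::steady_clock::now() >= deadline)
        {
            kill(child, SIGKILL);
            waitpid(child, &status, 0); // Reap the killed child.
            return kTimedOut;
        }
        std::this_thread::sleep_for(check_interval);
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : kError;
}
```

A real harness would also report which test was running when the timeout hit, which is the information the description says is currently lost.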
Remaining Tasks Until Completion: