-
Notifications
You must be signed in to change notification settings - Fork 353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problems with Azure Devops Reporter #7371
Comments
The error stack trace in https://github.com/dotnet/core-eng/issues/13026 doesn't have anything to do with the lock I added. This error is happening because the interpreter can't get the lock for "stdout" and none of my change affects locking of stdout. This error is just a very rare race condition where one of the worker threads (Thread 0x000070000f354000 in that stack trace) happens to be inside a _print call at the exact time that the process is trying to exit. Looking at what it was trying to print. One of the worker threads never got a chance to start, before the work was all completed and the process exited. |
Cool, I am mistaken. We should still perhaps use this or another issue to have the conversation about dealing with failure better. |
Just had a great conversation with @safern and he made a very keen insight. Specifically, if we just update the arcade reporter behavior to just return the real, not lying exit code when Azure Devops reporter fails, it's easy to implement and gives us the best of both things; when reporting fails, the work item still can pass, and when the work item fails and reporting fails, we still fail the check, e.g.:
|
Unfortunately while interesting, this idea doesn't work as simply as I'd hoped, because there's no way to get the decision made in generating RunTests.cmd (example template). Without removing these lines, there's no way to make the above pseudocode work since the exit code of the test is coerced in code not owned by the Helix Arcade SDK. I will confer with some runtime folks and get an idea of what they'd like to do. |
…ttempts) , and stop trying in the case of "run is deleted"
Going with the "just retry for 503s" approach, #7399 |
* #7371 - retry in the case of HTTP 503 (10x, 3 seconds between attempts) , and stop trying in the case of "run is deleted" * Oops
This is merged and should begin flowing to main-branch Arcade consumers soon, I'll keep this in validate until I see some uptake. |
Another hit on the threading issue: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-52793-merge-d1edb0fae6604797b1/Microsoft.Extensions.Hosting.Functional.Tests/console.f1e93568.log?sv=2019-07-07&se=2021-06-07T20%3A07%3A07Z&sr=c&sp=rl&sig=fZdoSW%2Fcu7ZkOzwiZiYlIOE6rA4GERWu6EbZ98P3QFY%3D
|
@ericstj I may take some stabs at this one tomorrow but it's much harder than simply adding retries to AzDO calls. |
…or failures if ADO fails for any reason, return 0 if we actually passed.
Added 65 unique test run instances to the IcM, since they seem to be unable to find them. |
THere's not much to do here and the mitigations seem to be helping so putting in Tracking/blocked. |
- Don't have a threading contention for stdout by only using stdout in the exception case; "starting..." adds minimal value, one can assume if the script is called it starts. - sleep the main thread 2s before accessing stdout in case the exception case wants to happen
new PR up, we can easily avoid the race condition by eliminating the race entirely. |
I accidentally made what I intended to be a 2s sleep 33 minutes, and oddly pushing an updated version didn't seem to fix this; will pick this back up on Tuesday. |
- Don't have a threading contention for stdout by only using stdout in the exception case; "starting..." adds minimal value, one can assume if the script is called it starts. - sleep() the main thread 5s before accessing stdout in case the exception case wants to happen - Remove most non-error-path logging since it adds little value and multiple threads competing for stdout can cause un-catch-able failure.
- Don't have a threading contention for stdout by only using stdout in the exception case; "starting..." adds minimal value, one can assume if the script is called it starts. - sleep() the main thread 5s before accessing stdout in case the exception case wants to happen - Remove most non-error-path logging since it adds little value and multiple threads competing for stdout can cause un-catch-able failure.
Merged the PR with attempted workaround, will leave this in Validate til it makes it to Runtime. |
Current workaround is in Runtime, and seems to be holding, closing (somehow I thought I already had) |
As I've written up in https://github.com/dotnet/core-eng/issues/13026, we introduced a threading bug in https://github.com/dotnet/arcade/pull/7310/files that can crash the reporter. It may be difficult to intentionally reproduce this problem since it kind of relies on actually doing the reporting, but we can inspect the code or just revert the
lock
part of the change and instead just do the "don't let it pass if it doesn't finish" part of the change.The text was updated successfully, but these errors were encountered: