Mutex.TryOpenExisting intermittently throws IOException #76736
Comments
Tagging subscribers to this area: @mangod9
|
The |
@kouvel This was thrown from here: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Threading/Mutex.Windows.cs#L35 . Note that the "Connection timed out" message may be bogus, because this code mixes Windows and Unix error codes. The error comes from msbuild, and msbuild always runs on CoreCLR, so this is not a Mono problem. |
Ok makes sense. I wasn't aware of anything on these paths that would cause a timeout, but may be possible depending on the setup. I'll try to see what errors may be leading to this and what may be causing it. |
adding @janvorli here too, in case this is related to some recent changes. |
The error is |
Yea looks like the Windows error codes are not converted by the PAL anymore and the message is incorrect. This error could have occurred for various reasons. Maybe I'll try to preserve more accurate errors and try to get a failure again in a PR's CI. |
Kusto shows that this issue has happened about 50 times during the last 30 days, and it always occurs in the mono wasm legs. The likely reason it occurs there is that, for wasm, each test is compiled while the test runs (when its generated .sh script is called). |
It is not clear to me why we mix in the Unix error messages there at all when the method is named |
I guess nobody noticed that the changes in #70685 have an unintended interaction with the Win32 emulator PAL used in CoreLib. It is very hard to keep in mind at all times that a few parts of CoreLib use the Win32 emulator PAL. |
Error message fix: #76768 |
If I recall there was a desire to use the newest API when available, even if the performance could be impacted. Seems like the SPCL scenario needs to be considered as well. Thanks @jkotas for rooting it out. |
Contributes to #76736 Co-authored-by: Jan Kotas <jkotas@microsoft.com>
I haven't been able to repro the error locally using the same container image. An strace would help to see which API is failing and the error code, so a possibility may be to add strace onto the msbuild command to get that output. I'm not sure if the container is set up for strace though, anyone know where these containers are set up? It may just be a matter of adding commands to install strace as root. It may also be useful to preserve more info about the errors such as the failing API, relevant parameters, and the error code, and to include that in the exception message. I can look into improving that, but I guess it wouldn't help until we have a better way of reproing the issue or until a newer version of the SDK with that change is used in the CI. |
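To make the "preserve more info about the errors" idea concrete, here is a minimal sketch of what such an exception message could carry. Everything in it is an assumption for illustration: `MutexErrorInfo`, `Create`, and the sample values in `Main` are hypothetical and are not the actual CoreLib or PAL code.

```csharp
using System;
using System.IO;

static class MutexErrorInfo
{
    // Hypothetical helper: builds an IOException whose message names the failing
    // API, the mutex name, and the raw numeric error code, instead of a message
    // looked up from a table that may not match the code's origin.
    public static IOException Create(string failingApi, string mutexName, int errorCode)
    {
        string message =
            $"{failingApi} failed for '{mutexName}' (raw error code {errorCode}, 0x{errorCode:X8})";
        return new IOException(message);
    }

    static void Main()
    {
        // Illustrative values only; they are not taken from a real failure.
        IOException ex = Create("OpenMutex", @"Global\example-mutex", 5);
        Console.WriteLine(ex.Message);
    }
}
```

Reporting the raw number rather than a translated string would avoid the misleading text discussed above, since the reader can then decide whether the value is a Windows or a Unix code.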
Inactive for a while. Let's close it. |
Description
After introducing .NET 7 rc1 SDK into Runtime CI we have started seeing intermittent exceptions
System.IO.IOException: Connection timed out : 'Global\msbuild-server-launch-{45_random-chars}'
in Runtime and Arcade on Linux CI agents or docker based Linux builds.
Reproduction Steps
I was trying hard to create minimal repro - without success.
I believe easiest repro would be to rerun some of our CI's where it was seen:
https://dev.azure.com/dnceng-public/public/_build/results?buildId=31675&view=logs&j=3fe1f0d5-61d6-5e8f-eead-4d3bcfb9dfc3&t=380e8ab7-dd79-5cad-d265-1eca160e9b82&s=526c4a30-42a9-575e-a58b-243c7c515350
https://dev.azure.com/dnceng-public/public/_build/results?buildId=41146&view=logs&j=190ad6c8-5950-568c-cadd-f2dfb7d5a79f&t=c0f6fdc1-ac5d-583c-8ae1-a18de0846552
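For reference, the two calls involved have roughly the shape below. This is only a sketch of the call pattern, not a repro: as noted, no minimal repro was found, and the mutex name used here is hypothetical (the real one is generated by msbuild).

```csharp
using System;
using System.Threading;

class MutexCallPattern
{
    static void Main()
    {
        // Hypothetical name; the failing builds use an msbuild-generated name.
        string name = @"Global\example-mutex-" + Guid.NewGuid().ToString("N");

        // Create the named mutex and request initial ownership.
        using var mutex = new Mutex(initiallyOwned: true, name: name, out bool createdNew);
        Console.WriteLine($"createdNew = {createdNew}");

        // Probe for a mutex with the same name without creating one.
        if (Mutex.TryOpenExisting(name, out var existing))
        {
            using (existing)
            {
                Console.WriteLine("opened existing mutex");
            }
        }

        // The calling thread owns the mutex only if this call created it.
        if (createdNew)
        {
            mutex.ReleaseMutex();
        }
    }
}
```

Either call can be the one that surfaces the IOException in the failing builds; the sketch itself is not expected to fail on a healthy machine.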
Expected behavior
new Mutex(initiallyOwned: true, name: "Global\UniqueName", out bool createdNew)
and Mutex.TryOpenExisting
shall never intermittently throw IOException.
Actual behavior
new Mutex(initiallyOwned: true, name: "Global\UniqueName", out bool createdNew)
and Mutex.TryOpenExisting
sometimes throw the IOException shown in the Description.
Regression?
Unknown
Known Workarounds
Unknown.
Configuration
I have seen this mostly on:
Other information
No response