
Mutex ctor will often throw IOException immediately after macOS reboot #79375

Open
darthwalsh opened this issue Dec 8, 2022 · 6 comments

@darthwalsh
Contributor

Description

When the call new System.Threading.Mutex(false, "some_text") runs concurrently in multiple processes shortly after I reboot my MacBook, it throws System.IO.IOException: The system cannot open the device or file specified. : 'some_text' roughly 30% of the time.

This seems to be the root cause of PowerShell/PSReadLine#2658, because PSReadLine, running in the pwsh process, creates a Mutex during setup.

Reproduction Steps

Create a new net6.0 console app that

  • calls Mutex ctor
  • prints an exception if it happened
  • stays open to show output

Program.cs:

try {
  using (var m = new Mutex(false, "darthwalsh_PSReadLine_issues_2658")) {
    Console.Write("created but didn't take Mutex darthwalsh_PSReadLine_issues_2658   ...");
    var line = Console.ReadLine();
    Console.WriteLine("QUITTING: " + line);
  }
} catch (Exception e) {
  Console.WriteLine(e.ToString());

  var line = Console.ReadLine();
  Console.WriteLine("QUITTING: " + line);
}

Build with dotnet publish -c Release --self-contained -a x64

Create a new iTerm2 profile that only runs the compiled output, i.e. /Users/walshca/code/temp/MutexThrow/bin/Release/net6.0/osx-x64/publish/MutexThrow, and open this profile in 10 tabs, each split into 12 instances. (Running 120 processes in parallel is likely overkill, but it seems to guarantee a repro; see the image under Actual behavior.)

Ensure iTerm2 is set up to relaunch with the same tabs.

Reboot macOS. Log in. Switch to iTerm2 and check the output in every tab.

Expected behavior

All 120 processes print created but didn't take Mutex darthwalsh_PSReadLine_issues_2658 ... and none throws an exception.

Actual behavior

Out of 120 processes, the ones at indices 0, 2, 5, and 12 failed with this message:

System.IO.IOException: The system cannot open the device or file specified. : 'darthwalsh_PSReadLine_issues_2658'
   at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
   at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
   at Program.<Main>$(String[] args) in /Users/walshca/code/temp/MutexThrow/Program.cs:line 2

image

(The screenshot shows the failures, followed by a couple of attempts to reproduce the problem by only logging out without rebooting, which did not trigger it.)

Regression?

No response

Known Workarounds

The conversation in PowerShell/PSReadLine#2658 discussed whether to catch the exception and retry, but it's unclear what causes the Mutex ctor to throw.
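A minimal sketch of that catch-and-retry idea is below. It assumes the IOException is transient and clears within a few hundred milliseconds, which this issue has not established. MutexRetry, CreateWithRetry, the attempt count, and the backoff delays are all hypothetical choices for illustration. They are not a .NET API and not a confirmed fix.

```csharp
using System;
using System.IO;
using System.Threading;

// Same mutex name as the repro; creation goes through the retry helper below.
using (var m = MutexRetry.CreateWithRetry("darthwalsh_PSReadLine_issues_2658"))
{
    Console.WriteLine("created but didn't take Mutex darthwalsh_PSReadLine_issues_2658");
}

// Hypothetical helper (not part of .NET) wrapping the Mutex ctor from this issue.
static class MutexRetry
{
    // Calls new Mutex(false, name). If it throws IOException, sleeps and retries,
    // doubling the delay each time (initialDelayMs, 2x, 4x, ...).
    // On the final attempt the filter is false, so the IOException propagates.
    public static Mutex CreateWithRetry(string name, int maxAttempts = 5, int initialDelayMs = 50)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        int delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return new Mutex(false, name);
            }
            catch (IOException) when (attempt < maxAttempts)
            {
                Thread.Sleep(delayMs);
                delayMs *= 2;
            }
        }
    }
}
```

The helper rethrows on the last attempt. This keeps a persistent failure visible instead of hiding it, such as the /tmp permission problem in #36823. Even so, retrying only works around the race and does not explain its cause.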

Configuration

  • Built with dotnet 6.0.822.36306
  • macOS Monterey 12.6.1 21G217
  • x64 2.3 GHz 8-Core Intel Core i9
  • Likely OS-specific; unknown whether architecture or build flags matter

Other information

A related problem, #36823, can happen if the /tmp permissions are incorrect, but that's not the case here.

The Mutex.ctor() API docs aren't clear about whether this exception is expected and transient. Should we catch it and retry Mutex creation?

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 8, 2022
@ghost

ghost commented Dec 8, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Author: darthwalsh
Assignees: -
Labels:

area-System.Threading

Milestone: -

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Dec 8, 2022
@mangod9 mangod9 added this to the 8.0.0 milestone Dec 8, 2022
@mangod9
Member

mangod9 commented Dec 8, 2022

@kouvel. @darthwalsh, does this repro consistently for you?

@darthwalsh
Contributor Author

@mangod9 Yes, every time I attempt the repro with at least 10 concurrent processes, about 3 of the processes fail.

@darthwalsh
Contributor Author

@mangod9 I've kept my repro running through a few more reboots. The first two reboots each had 3 thrown exceptions; the last reboot had 0. So during OS startup with dozens of dotnet processes, I'd expect to see a crash about 90% of the time.

Is there any more investigation I can do to help move this issue forward?

Also, do we have a rough idea of what the fix will be? For example, will the Mutex ctor be changed to not throw (e.g. by retrying on a file access error)? Or will the docs on MSDN be updated to say that some exceptions are expected in this situation?

@darthwalsh
Contributor Author

Continuing to update this: my MacBook has been updated to macOS 13.2 Ventura. I am monitoring with 16 processes starting in iTerm. I haven't had the issue on the two reboots since the OS upgrade, but I'm not confident about the cause. Did 13.2 fix something? Did the OS change how soon applications are started after booting? Did the race condition get lucky twice in a row?

It would still be helpful to get an update to the docs about the exception policy, whether or not this specific issue reproduces again.

@darthwalsh
Contributor Author

Continuing to update this: I've since upgraded to macOS 14.1.2 and still regularly see this, as recently as November. I'm not sure if there's a better way to look at the data, but global-find-substring shows I have seen this error a total of 136 times in my iTerm history in the last 10 months.

I'll be upgrading to an Apple Silicon macbook, and probably won't keep trying to reproduce the issue. As long as dotnet core supports macOS on Intel chips this feels worth fixing :)
