IOException running NuGet-Migrations during tests in dotnet CLI first run #80619

Closed
Tracked by #93172
akoeplinger opened this issue Jan 13, 2023 · 72 comments · Fixed by #90342 or Unity-Technologies/ml-agents#6083
Labels: area-Infrastructure, blocking-clean-ci (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'), Known Build Error (Use this to report build issues in the .NET Helix tab)

Comments

@akoeplinger
Member

akoeplinger commented Jan 13, 2023

This is affecting 6.0, 7.0 and 8.0.

In #80510 (comment) we saw a failure because some NuGet migration code failed to run during the dotnet CLI first run experience:

```
+ dotnet /datadisks/disk1/work/ADFC09B1/p/xunit/xunit.console.dll JIT/HardwareIntrinsics/JIT.HardwareIntrinsics.XUnitWrapper.dll -parallel collections -nocolor -noshadow -xml testResults.xml -trait TestGroup=JIT.HardwareIntrinsics.Arm.ArmBase
Microsoft.DotNet.XUnitConsoleRunner v2.5.0 (64-bit .NET 7.0.2)
  Discovering: JIT.HardwareIntrinsics.XUnitWrapper (method display = ClassAndMethod, method display options = None)
  Discovered:  JIT.HardwareIntrinsics.XUnitWrapper (found 6 of 362 test cases)
  Starting:    JIT.HardwareIntrinsics.XUnitWrapper (parallel test collections = on, max threads = 2)
    JIT/HardwareIntrinsics/Arm/ArmBase.Arm64/ArmBase.Arm64_ro/ArmBase.Arm64_ro.sh [FAIL]
      System.IO.IOException: The system cannot open the device or file specified. : 'NuGet-Migrations'
         at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
         at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
         at NuGet.Common.Migrations.MigrationRunner.Run()
         at Microsoft.DotNet.Configurer.DotnetFirstTimeUseConfigurer.Configure()
         at Microsoft.DotNet.Cli.Program.ConfigureDotNetForFirstTimeUse(IFirstTimeUseNoticeSentinel firstTimeUseNoticeSentinel, IAspNetCertificateSentinel aspNetCertificateSentinel, IFileSentinel toolPathSentinel, Boolean isDotnetBeingInvokedFromNativeInstaller, DotnetFirstRunConfiguration dotnetFirstRunConfiguration, IEnvironmentProvider environmentProvider, Dictionary`2 performanceMeasurements)
         at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args, TimeSpan startupTime, ITelemetry telemetryClient)
         at Microsoft.DotNet.Cli.Program.Main(String[] args)
      
      Return code:      1
```

We should look into disabling the first-run experience via the DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1 environment variable; it doesn't make sense to run it for the tests.
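A minimal sketch of what this proposal would look like in a POSIX shell test wrapper. The function name run_without_first_run is hypothetical and not part of the existing test scripts. Note also that later in this thread the variable turns out to have been removed from the SDK (dotnet/sdk#9945), so on affected SDK versions setting it may have no effect.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1
# set only in that command's environment (for an external command such as
# dotnet, the caller's shell is left unchanged).
# Caveat: an SDK that no longer reads this variable will simply ignore it,
# so this illustrates the proposal rather than a verified fix.
run_without_first_run() {
  DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1 "$@"
}

# Example, reusing the xunit invocation from the log:
# run_without_first_run dotnet /datadisks/disk1/work/ADFC09B1/p/xunit/xunit.console.dll \
#   JIT/HardwareIntrinsics/JIT.HardwareIntrinsics.XUnitWrapper.dll -parallel collections ...
```

Scoping the variable to a single command, rather than exporting it globally, keeps the rest of the job environment unchanged.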

Report

| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 368683 | dotnet/runtime | JIT.Methodical.eE.WorkItemExecution | #90241 |
| 362596 | dotnet/runtime | JIT.Methodical.eE.WorkItemExecution | #89003 |
| 361655 | dotnet/runtime | JIT.Math.WorkItemExecution | #89905 |
| 361259 | dotnet/runtime | JIT.Generics.WorkItemExecution | |
| 361075 | dotnet/runtime | JIT.Regression.WorkItemExecution | #89867 |
| 360768 | dotnet/runtime | JIT.Regression.JitBlue.WorkItemExecution | #89815 |
| 360148 | dotnet/runtime | JIT.jit64.WorkItemExecution | #89814 |
| 359522 | dotnet/runtime | JIT.Methodical.WorkItemExecution | #89805 |
| 356421 | dotnet/runtime | JIT.Methodical.a-dA-D.WorkItemExecution | |
| 355235 | dotnet/runtime | JIT.Methodical.f-iF-I.WorkItemExecution | #89009 |
| 343534 | dotnet/runtime | JIT.Methodical.WorkItemExecution | |
| 342037 | dotnet/runtime | JIT.Regression.JitBlue.WorkItemExecution | #89008 |
| 341937 | dotnet/runtime | JIT.Methodical.a-dA-D.WorkItemExecution | #89003 |
| 337982 | dotnet/runtime | JIT.HardwareIntrinsics.X86.Avx2.WorkItemExecution | #88809 |
| 337742 | dotnet/runtime | JIT.Regression.CLR-x86-JIT.V1-M12-M13.WorkItemExecution | #88769 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 1 | 15 |

Known Issue Error Message

Fill the error message using known issues guidance.

```json
{
  "ErrorPattern": "The system cannot open the device or file specified. : ('|')NuGet-Migrations('|')",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
```
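To check locally whether a console log would be caught by this known issue, an equivalent regular expression can be run with grep -E. This is a minimal sketch: the function name matches_known_issue is hypothetical, and the Build Analysis matcher may use a different regex dialect than POSIX ERE.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the log file given as $1 contains
# the NuGet-Migrations mutex failure tracked by this issue.
# Equivalent to the ErrorPattern above: ('|') just matches a single quote,
# and the "." after "specified" is escaped here so it matches literally.
matches_known_issue() {
  grep -Eq "The system cannot open the device or file specified\. : 'NuGet-Migrations'" "$1"
}

# Usage: matches_known_issue console.log && echo "known issue #80619"
```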

Known issue validation

Build: 🔎
Result validation: ⚠️ Validation could not be done without an Azure DevOps build URL on the issue. Please add it to the "Build: 🔎" line.
Validation performed at: 6/28/2023 10:04:55 PM UTC

@ghost

ghost commented Jan 13, 2023

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

@ghost added the untriaged label Jan 13, 2023
@ViktorHofer
Member

I thought that we were already setting that property at least for libraries tests. Maybe not for runtime tests? cc @jkoritzinsky

@carlossanlop
Member

Another PR that hit this: #80615

@akoeplinger removed the untriaged label Feb 7, 2023
@akoeplinger added this to the 8.0.0 milestone Feb 7, 2023
@radical
Member

radical commented Feb 7, 2023

I'm hitting this fairly often, for example on a rolling build (log).

@radical
Member

radical commented Feb 7, 2023

Another rolling build (log).

@carlossanlop added the Known Build Error label Feb 9, 2023
@carlossanlop
Member

Another 7.0 dependency flow PR affected by this: #81812

@carlossanlop
Member

> I thought that we were already setting that property at least for libraries tests. Maybe not for runtime tests? cc @jkoritzinsky

Ping @jkoritzinsky

@lewing
Member

lewing commented Feb 24, 2023

Did we do something in net8 that also needs to be done in net7 here?

@radical
Member

radical commented Feb 28, 2023

This is affecting a lot of builds.

@trylek
Member

trylek commented Feb 28, 2023

I don't see any property or variable named DOTNET_SKIP_FIRST_TIME_EXPERIENCE in the dotnet/sdk repo. @ViktorHofer / @akoeplinger, am I looking in the wrong place (can you please point me to where it's used?), or are you saying we need to build this as a new feature? I'm also somewhat puzzled by the log above: these are apparently CoreCLR tests, which normally shouldn't be executed via dotnet, so that invocation must be coming from somewhere else, such as Crossgen2 execution.

@radical
Member

radical commented Feb 28, 2023

@trylek
Member

trylek commented Feb 28, 2023

This one is interesting, but it's not the root cause; it just reacts to the fact that the variable isn't used anywhere:

dotnet/sdk@27b6661

I'll keep looking for what removed this behavior.

@trylek
Member

trylek commented Feb 28, 2023

OK, so as I now understand it, the call to MigrationRunner.Run that is causing the above issue is now here:

https://github.com/dotnet/sdk/blob/fbc089ea96f0187c31e9ae176278334c8d5defc6/src/Cli/Microsoft.DotNet.Configurer/DotnetFirstTimeUseConfigurer.cs#L62

The check ShouldPrintFirstTimeUseNotice() basically calls the Exists method on the IFirstTimeUseNoticeSentinel interface. It seems to me that it should be possible to avoid this by using the

https://github.com/dotnet/sdk/blob/main/src/Cli/Microsoft.DotNet.Configurer/NoOpFirstTimeUseNoticeSentinel.cs

class for the implementation of the sentinel; this seems to be tied to the conditional block

https://github.com/dotnet/sdk/blob/b68de63954610d17b82166de586567dd244a85ca/src/Cli/dotnet/Program.cs#L183

triggered by the command-line parameter

https://github.com/dotnet/sdk/blob/12209f087e1c0256db8b08e7e7217f7526628d8b/src/Cli/dotnet/Parser.cs#L32

which leads me to

https://github.com/dotnet/sdk/blob/2c98ff1381def385256788cfcbaefb3ac32f5778/src/Cli/dotnet/commands/dotnet-internal-reportinstallsuccess/InternalReportinstallsuccessCommandParser.cs#L10

In this light, I tend to believe that it might be possible to work around this by using the internal option internal-reportinstallsuccess. I must admit I'm seeing this logic for the first time, so it would be great to get confirmation from someone familiar with it; with this background, actually fixing this (e.g. for Crossgen2 execution in CoreCLR tests) is trivial.

@lewing
Member

lewing commented Feb 28, 2023

@trylek
Member

trylek commented Feb 28, 2023

As I said, I'm not very familiar with this era of SDK development. To reinstate the variable, I guess it would be easiest to hack around

https://github.com/dotnet/sdk/blob/b68de63954610d17b82166de586567dd244a85ca/src/Cli/dotnet/Program.cs#L168

where we're already inspecting several environment variables. Adding an extra variable representing the "no-first-time-experience" and using it in an OR clause at line

https://github.com/dotnet/sdk/blob/b68de63954610d17b82166de586567dd244a85ca/src/Cli/dotnet/Program.cs#L180

would probably go a long way towards satisfying both behaviors. Having said that, I'm not an SDK expert, so I'd love to hear from at least one person familiar with the repo whether my conclusions make sense.

@trylek
Member

trylek commented Feb 28, 2023

@marcpopMSFT - could you please comment on this or route this to someone familiar with the code in question?

@radekdoulik
Member

Looks similar to NuGet/Home#12159

@akoeplinger
Member Author

akoeplinger commented Mar 2, 2023

Hmm, interesting: it looks like DOTNET_SKIP_FIRST_TIME_EXPERIENCE was indeed removed in dotnet/sdk#9945.
Given that it is referenced in many places, and that the NuGet migration step runs during first run, I'd say it might be good to bring it back.

@pavelsavara
Member

Another occurrence in #82834 (log).

@ericstj
Member

ericstj commented Apr 6, 2023

Hit in #84355

@lewing
Member

lewing commented Apr 11, 2023

Hit in #84420, but the pattern didn't match.

@lewing
Member

lewing commented Apr 15, 2023

Is there a plan for fixing the underlying issue here? A mutex that fails for compatibility reasons is still a very broken mutex.

@ghost added the in-pr label Aug 10, 2023
@kouvel
Member

kouvel commented Aug 10, 2023

The current plan after discussion is to revert the previous changes that made session-local mutexes user-specific, and offer user-specific mutexes as a new feature in a future release.

kouvel added a commit that referenced this issue Aug 11, 2023
- A previous change that was serviced back made session-local named mutexes user-specific by restricting the permissions of the session directories and the files under them, and by adding the sticky bit to some directories. A compat issue arose from that change: the session directories have the session ID in their name, and session IDs can be reused between different users. The current plan that we have discussed is to revert the change and service back the revert, which also restores the intended behavior, and to offer user-specific mutexes as a new feature in a future .NET release that would satisfy some user scenarios in a better way.
- This PR reverts the previous change (first commit) and re-applies one part of it (second commit) to improve backward compatibility, given the differences in permissions for session directories before and after the change.
- Fixes #80619
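The commit message above notes that session directories are named after the session ID, which a different user can later reuse. For diagnosing this on a Linux CI agent, here is a minimal sketch that prints the session directory for the current shell. It assumes (this is not confirmed anywhere in this thread) that the runtime keeps session-local named-mutex files under ${TMPDIR:-/tmp}/.dotnet/shm/session<SID>; the helper name dotnet_session_shm_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the directory in which, by assumption, the .NET
# runtime stores session-local named-mutex files for a given session.
# $1: optional session ID; defaults to the SID of the current shell (Linux ps).
# The body runs in a subshell so sid/tmp do not leak into the caller.
dotnet_session_shm_dir() (
  sid=${1:-$(ps -o sid= -p $$ | tr -d ' ')}
  tmp=${TMPDIR:-/tmp}
  printf '%s/.dotnet/shm/session%s\n' "${tmp%/}" "$sid"
)

# Usage: compare the directory owner with the current user (id -un).
# ls -ld "$(dotnet_session_shm_dir)"
```

If that directory exists but is owned by another user, that is consistent with the session-ID reuse described in the commit message, under the path assumption stated above.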
@ghost removed the in-pr label Aug 11, 2023
@ghost locked as resolved and limited conversation to collaborators Sep 11, 2023
alex-mccarthy-unity added a commit to Unity-Technologies/ml-agents that referenced this issue Mar 14, 2024
The 8.x release should contain dotnet/runtime#90342 which fixes dotnet/runtime#80619.

I hope this will fix flaky failures like https://github.com/Unity-Technologies/ml-agents/actions/runs/8268945605/job/22623023348 of the form:
```
dotnet-format............................................................Failed
- hook id: dotnet-format
- exit code: 1

System.IO.IOException: The system cannot open the device or file specified. : 'NuGet-Migrations'
   at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
   at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
   at NuGet.Common.Migrations.MigrationRunner.Run(String migrationsDirectory)
   at Microsoft.DotNet.Configurer.DotnetFirstTimeUseConfigurer.Configure()
   at Microsoft.DotNet.Cli.Program.ConfigureDotNetForFirstTimeUse(IFirstTimeUseNoticeSentinel firstTimeUseNoticeSentinel, IAspNetCertificateSentinel aspNetCertificateSentinel, IFileSentinel toolPathSentinel, Boolean isDotnetBeingInvokedFromNativeInstaller, DotnetFirstRunConfiguration dotnetFirstRunConfiguration, IEnvironmentProvider environmentProvider, Dictionary`2 performanceMeasurements)
   at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args, TimeSpan startupTime, ITelemetry telemetryClient)
   at Microsoft.DotNet.Cli.Program.Main(String[] args)
```
miguelalonsojr pushed a commit to Unity-Technologies/ml-agents that referenced this issue Mar 14, 2024