-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HostModel: Retry ResourceUpdate on Win32 error #32347
Conversation
2f02216
to
fcdd338
Compare
src/installer/managed/Microsoft.NET.HostModel/AppHost/RetryUtil.cs
Outdated
Show resolved
Hide resolved
This change attempts to fix a non-deterministic customer reported failure. Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app). The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running. This indicates interference by other software while the HostWriter is updating the AppHost. The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected. However, different AVs work differently about examining files. We believe that the correct fix for this issue is to complete: To implement dotnet#3828 and dotnet#3829 Ship AppHost with an extension/permissions not indicating an executable. However the above is a fairly large change for servicing .net core 3.1. So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error, unless the failure is a knwon irrecoverable code (listed a few error codes relevent to File IO). This change may cause unnecessary retries on legitimate failures (about 50 seconds). But such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update. Fixes dotnet#3832
fcdd338
to
fa4bd3a
Compare
dotnet/runtime#3832 Building a WinExe with resources fails non-deterministically Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app). The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running. This indicates interference by other software while the HostWriter is updating the AppHost. The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected. However, different AVs work differently about examining files. We believe that the correct fix for this issue is to complete: * To implement dotnet#3828 and dotnet#3829 * Ship AppHost with an extension/permissions not indicating an executable. However the above is a fairly large change for servicing .net core 3.1. So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error. While this may cause unnecessary retries on legitimate failures (ex: file missing) such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update. Low dotnet/runtime#32347
dotnet/runtime#3832 Building a WinExe with resources fails non-deterministically Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app). The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running. This indicates interference by other software while the HostWriter is updating the AppHost. The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected. However, different AVs work differently about examining files. We believe that the correct fix for this issue is to complete: * To implement dotnet#3828 and dotnet#3829 * Ship AppHost with an extension/permissions not indicating an executable. However the above is a fairly large change for servicing .net core 3.1. So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error. While this may cause unnecessary retries on legitimate failures (ex: file missing) such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update. Low dotnet/runtime#32347
case 0x00000001: // ERROR_INVALID_FUNCTION | ||
case 0x00000002: // ERROR_FILE_NOT_FOUND | ||
case 0x00000003: // ERROR_PATH_NOT_FOUND | ||
case 0x00000006: // ERROR_INVALID_HANDLE | ||
case 0x00000008: // ERROR_NOT_ENOUGH_MEMORY | ||
case 0x0000000B: // ERROR_BAD_FORMAT | ||
case 0x0000000E: // ERROR_OUTOFMEMORY | ||
case 0x0000000F: // ERROR_INVALID_DRIVE | ||
case 0x00000012: // ERROR_NO_MORE_FILES | ||
case 0x00000035: // ERROR_BAD_NETPATH | ||
case 0x00000057: // ERROR_INVALID_PARAMETER | ||
case 0x00000071: // ERROR_NO_MORE_SEARCH_HANDLES | ||
case 0x00000072: // ERROR_INVALID_TARGET_HANDLE | ||
case 0x00000078: // ERROR_CALL_NOT_IMPLEMENTED | ||
case 0x0000007B: // ERROR_INVALID_NAME | ||
case 0x0000007C: // ERROR_INVALID_LEVEL | ||
case 0x0000007D: // ERROR_NO_VOLUME_LABEL | ||
case 0x0000009A: // ERROR_LABEL_TOO_LONG | ||
case 0x000000A0: // ERROR_BAD_ARGUMENTS | ||
case 0x000000A1: // ERROR_BAD_PATHNAME | ||
case 0x000000CE: // ERROR_FILENAME_EXCED_RANGE | ||
case 0x000000DF: // ERROR_FILE_TOO_LARGE | ||
case 0x000003ED: // ERROR_UNRECOGNIZED_VOLUME | ||
case 0x000003EE: // ERROR_FILE_INVALID | ||
case 0x00000651: // ERROR_DEVICE_REMOVED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide some insight on how you came up with this list of codes, and whether or not you anticipate more codes to be added upon further review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote down a list of error code by inspecting the windows error codes, scanning for errors that cannot succeed on retry. The list is conservative; that is, there could be more failures that shouldn't be retried. But I do not anticipate adding more codes to this list. I believe tripping into one of these cases is rare, since the file to be accessed is created by the HostModel library itself.
dotnet/runtime#3832 Building a WinExe with resources fails non-deterministically Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app). The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running. This indicates interference by other software while the HostWriter is updating the AppHost. The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected. However, different AVs work differently about examining files. We believe that the correct fix for this issue is to complete: * To implement #3828 and #3829 * Ship AppHost with an extension/permissions not indicating an executable. However the above is a fairly large change for servicing .net core 3.1. So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error. While this may cause unnecessary retries on legitimate failures (ex: file missing) such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update. Low dotnet/runtime#32347
When does this propagate to the wild? I have updated all the NuGet packages and have latest Visual Studio 2019 and it happens to me till I clean and sometimes re-clean and sometimes restart VS, and we also rely on Azure DevOps Pipelines (we haven't switched to .NET Core 3.x in the pipeline but won't till this is fixed). |
@Anipik when is this release expected to be available? Thanks. |
Is there a beta patch available? Disabling virus protection on our on-prem Azure DevOps Agent workspace folder did not help. |
This change attempts to fix a non-deterministic customer reported failure.
Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app). The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running.
This indicates interference by other software while the HostWriter is updating the AppHost.
The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected. However, different AVs work differently about examining files.
We believe that the correct fix for this issue is to complete:
However the above is a fairly large change for servicing .net core 3.1.
So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error. While this may cause unnecessary retries on legitimate failures (ex: file missing) such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update.
Fixes #3832