Infinite spin lock in Encoding.GetEncoding #33383
I couldn't add an area label to this issue. Check out this page to find out which area owner to ping, or please add exactly one area label to help train me in the future.
/cc @kouvel (not sure if the spinning in SpinLock is intended to go this high)
I cannot really confirm it yet, but I am starting to suspect that thread 22080 actually got past the lock acquisition. We have a machine where this issue seems to be triggered a couple of times a day, but not reliably; it didn't trigger by simply running the affected code. Update: the theory about thread 22080 is probably bogus. The disassembly shows the following:
which comes from this C# code line:
Hence, it didn't get past the lock acquisition.
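To be precise about what I mean by getting past the acquisition: the usual pattern around a cache-protecting SpinLock looks roughly like the sketch below. This is a simplified illustration of the general pattern, not the actual CoreLib source, and the names are made up.

```csharp
using System.Collections.Generic;
using System.Threading;

static class EncodingCacheSketch
{
    // Illustrative stand-ins for the cache and its protecting lock.
    private static SpinLock s_cacheLock = new SpinLock(enableThreadOwnerTracking: false);
    private static readonly Dictionary<string, int> s_cache = new Dictionary<string, int>();

    public static bool TryGetCodePage(string name, out int codePage)
    {
        bool lockTaken = false;
        try
        {
            // "Getting past the lock acquisition" = Enter returned with lockTaken == true.
            // A thread that is stuck spins inside Enter and never reaches the lookup below.
            s_cacheLock.Enter(ref lockTaken);
            return s_cache.TryGetValue(name, out codePage);
        }
        finally
        {
            if (lockTaken)
                s_cacheLock.Exit();
        }
    }
}
```

A thread stuck in the hang never returns from Enter, so it never reaches the lookup or the finally block.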
There are no thread aborts in the whole app; .NET Core doesn't have any way to trigger them explicitly anyway. No re-entrancy as far as I can tell. The only
As far as re-entrancy goes, I cannot completely rule it out on the UI thread. It's not happening at the particular moment the dump was captured, and there are no references to the
At the moment I don't see how it would get into that state due to reentrancy, as the spin-lock is released before any wait operation that could result in reentrancy.
Since it appears to be happening fairly frequently, an option might be to add some logging on enter/exit of the spin-lock, recording the managed thread ID, and then look for an enter without a matching exit when the hang occurs. It looks like the implementation hasn't changed significantly since 2.1 based on history. Is the scenario new, or has there been enough coverage on 2.x to indicate that this is a new problem in 3.x?
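Something along these lines is what I have in mind; a rough sketch where Log is a made-up sink and the two methods would replace the existing Enter/Exit call sites in the instrumented build:

```csharp
using System;
using System.Threading;

static class SpinLockLoggingSketch
{
    // Made-up log sink; in an instrumented CoreLib build this could be an in-memory
    // ring buffer or an ETW event so the logging itself does not change the timing much.
    private static void Log(string what) =>
        Console.Error.WriteLine($"{DateTime.UtcNow:O} tid={Environment.CurrentManagedThreadId} {what}");

    // Drop-in replacements for the Enter/Exit call sites around the cache lock.
    public static void EnterLogged(ref SpinLock spinLock, ref bool lockTaken)
    {
        Log("enter");
        spinLock.Enter(ref lockTaken);
        Log("acquired");
    }

    public static void ExitLogged(ref SpinLock spinLock)
    {
        spinLock.Exit();
        Log("exit");
    }
}
```

A thread ID that logs an acquire with no matching exit would be the interesting one.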
We migrated our application from .NET Framework, where the
I can try to add some custom logging, but it will require me to build a custom runtime, so that will definitely take a couple of days (especially for building 3.1 from source).
I think logging would at least provide more info (provided a repro is found with logs), as I don't see a way to make progress with the current info. What I have done before in a somewhat similar case is to sync to the exact commit in the dotnet/coreclr repo that matches the commit hash of coreclr.dll from the runtime being used, and then (using an xcopy-able runtime) update only System.Private.CoreLib.dll with the additional logging. If using
I'm thinking about changing
Thanks for the idea of just replacing CoreLib. I've messed that up before because I need to match the correct build configuration to the runtime (release/debug/checked), since they are not interchangeable. Hopefully I won't mess it up this time.
That would hopefully point to the thread, but by the time of the failure it would be too late and the thread's stack would be long past the issue. It would definitely be more info; it would be interesting to see stack traces for the enter without a matching exit, although that may make it more difficult to repro if it's a timing issue. I'm sure there would be ways to progress from there, like adding try/finally blocks to track when there is a missing exit and logging the stack trace when that happens.
Or maybe try/catch-all+rethrow/finally blocks to ensure that the spin-lock is not held by the current thread by the end of any entry point, with logs in the catch-all and the finally. Hopefully then there would be no way for the logs to miss a more actionable cause for the issue.
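For example, roughly like the sketch below. It assumes the lock is temporarily rebuilt with enableThreadOwnerTracking: true for the investigation, since IsHeldByCurrentThread throws when tracking is disabled; LookupCodePage and Log are placeholder names.

```csharp
using System;
using System.Threading;

static class EntryPointGuardSketch
{
    // Owner tracking enabled on purpose: IsHeldByCurrentThread throws when tracking is off.
    private static SpinLock s_cacheLock = new SpinLock(enableThreadOwnerTracking: true);

    private static void Log(string msg) =>
        Console.Error.WriteLine($"tid={Environment.CurrentManagedThreadId} {msg}");

    // Stand-in for the real lookup, which enters and exits s_cacheLock internally.
    private static int LookupCodePage(string name) => 0;

    public static int GetCodePageGuarded(string name)
    {
        try
        {
            return LookupCodePage(name);
        }
        catch (Exception ex)
        {
            Log($"exception in entry point: {ex}");
            throw;
        }
        finally
        {
            // Any exit path that still owns the lock points at the missing Exit.
            if (s_cacheLock.IsHeldByCurrentThread)
                Log($"spin-lock still held at entry-point exit: {Environment.StackTrace}");
        }
    }
}
```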
We were unable to reproduce the issue again, even on the same machine. Going to close it for now. Thanks for the help and ideas!
We've encountered a weird deadlock/livelock on .NET Core 3.1 and captured a memory dump of it. The dump may contain sensitive information, so I am not comfortable sharing it publicly, but I will be happy to share it privately or extract specific information from it.
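The call pattern itself is nothing exotic; it is essentially many threads doing name-based encoding lookups, along the lines of this hypothetical stress sketch (not our actual code):

```csharp
using System;
using System.Text;
using System.Threading.Tasks;

class EncodingStress
{
    static void Main()
    {
        // Built-in encodings only, so no extra encoding provider is needed.
        Parallel.For(0, Environment.ProcessorCount * 4, _ =>
        {
            for (int i = 0; i < 1_000_000; i++)
            {
                Encoding.GetEncoding("utf-8");
                Encoding.GetEncoding("iso-8859-1");
            }
        });
    }
}
```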
Three threads are racing on the same spin lock which is marked as locked but there doesn't seem to be any thread actually locking it.
Stack traces for the threads look like this:
- Thread 18060: looping with spinIndex = 556840848
- Thread 22080: looping with spinIndex = 0
- Thread 23232: looping with spinIndex = 552403664
The EncodingTable.s_cacheLock object looks like this:
It seems that the lock object is not held by anything, but three threads spin on it anyway.
For completeness, this is how the SpinLock object looks: