Fix class construction cycle in Lock on NativeAOT #100374
When a thread reenters class construction through accessing NativeRuntimeEventSource.Log, Log would return null. Checks on IsFullyInitialized were added to ensure that the normal path would not be taken in that case, to avoid null checks in several places and in different files. That doesn't work when a different thread sees the initialization stage as Complete, as it would try to initialize NativeRuntimeEventSource and run into a class construction cycle.
Fixed by removing the IsFullyInitialized checks, introducing a new initialization stage PartiallyComplete, and not setting the stage to Complete until it has been verified that Log does not return null. When the stage is PartiallyComplete, a thread would retry the relevant initialization. This again guarantees that there would be at most one attempt at initialization through Lock at any given time, and prevents the class construction cycle.
Fixes dotnet#99663
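The staged scheme described above can be sketched roughly as follows. This is a simplified illustration, not the code in Lock.NativeAot.cs: `StagedInit`, `TryInitialize`, `GetLog`, and `BasicInitCount` are hypothetical names, and `GetLog` merely stands in for reading NativeRuntimeEventSource.Log, which may return null while its class constructor is still running.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the Lock statics; names are illustrative only.
public static class StagedInit
{
    public enum Stage { NotStarted, Started, PartiallyComplete, Complete }

    private static int s_stage = (int)Stage.NotStarted;

    // Counts how often the one-time part of the initialization ran.
    public static int BasicInitCount;

    // Assumption: stands in for NativeRuntimeEventSource.Log, which may be null.
    public static Func<object> GetLog = () => null;

    public static Stage CurrentStage => (Stage)Volatile.Read(ref s_stage);

    // Returns true only once initialization has fully completed.
    public static bool TryInitialize()
    {
        int oldStage = Volatile.Read(ref s_stage);
        if (oldStage == (int)Stage.Complete)
            return true;
        if (oldStage == (int)Stage.Started)
            return false; // an attempt is already in progress, possibly on this thread

        // Claim the attempt; only one caller can move the stage to Started.
        if (Interlocked.CompareExchange(ref s_stage, (int)Stage.Started, oldStage) != oldStage)
            return false;

        if (oldStage == (int)Stage.NotStarted)
        {
            // One-time part; skipped when retrying from PartiallyComplete.
            BasicInitCount++;
        }

        // Publish Complete only after verifying that the log is available.
        bool logReady = GetLog() != null;
        Volatile.Write(ref s_stage, (int)(logReady ? Stage.Complete : Stage.PartiallyComplete));
        return logReady;
    }
}
```

In this sketch, a caller that finds the log still null leaves the stage at PartiallyComplete, so a later call retries only the log check and never repeats the one-time part.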
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
Just trying to fix the CI issue for now. I'm considering a different option, to have the finalizer thread initialize what is necessary for Lock in the background (not relying on a finalizable object), as this initialization is likely to run during startup anyway. The initialization scheme may change later.
@VSadov, would you be able to take a look? I think the alternative I suggested above would simplify some things but it needs more testing, I'd like to get the CI issue fixed first.
CC @mangod9
s_minSpinCount = DetermineMinSpinCount();
if (oldStage == StaticsInitializationStage.NotStarted)
{
    // If the stage is PartiallyComplete, these will have already been initialized
if they have already been initialized, why is it initializing them again?
These are initialized when the stage is changed from NotStarted. The stage doesn't go back to NotStarted once changed, so they would only be initialized once.
This is the third time we are trying to fix a reentrancy issue in the Lock. Perhaps the fourth, if counting the same issue in the PR that introduced the Lock. As long as the Lock's initialization should be trivial. Anything that is nontrivial should be optional and be done "out of line".
{
    // If the stage is PartiallyComplete, these will have already been initialized
    s_isSingleProcessor = Environment.IsSingleProcessor;
    s_maxSpinCount = DetermineMaxSpinCount();
What happens if DetermineMaxSpinCount() takes a lock?
It would attempt to take the lock and may have to spin-wait to acquire it. Not sure what you're actually asking though, can you clarify?
I was thinking of the possibility that DetermineMaxSpinCount somehow directly or indirectly takes a lock and ends up getting here.
But we are not holding anything so we are not introducing a deadlock just by reentering, so the recursive locking like this should work. Just hard to think of all the cases.
Yes the idea is that there is only one attempt at initialization at any given time, so this path would not reenter even in the same thread.
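To illustrate the "one attempt at a time" point for the same-thread case, here is a minimal sketch in which the initialization body calls back into the initializer. All names (`ReentrancyGuard`, `TryRunOnce`, `NestedAttemptsRejected`) are hypothetical, and the real code tracks more stages than the three states assumed here.

```csharp
using System;
using System.Threading;

// Hypothetical guard; not the runtime's actual implementation.
public static class ReentrancyGuard
{
    private const int Idle = 0, Running = 1, Done = 2;
    private static int s_state = Idle;

    // Number of attempts turned away because one was already in progress.
    public static int NestedAttemptsRejected;

    // Runs body at most once; returns true only when initialization is done.
    public static bool TryRunOnce(Action body)
    {
        int prev = Interlocked.CompareExchange(ref s_state, Running, Idle);
        if (prev == Done)
            return true;
        if (prev == Running)
        {
            // Another attempt is in progress, possibly further up this
            // thread's own call stack, so do not start a second one.
            Interlocked.Increment(ref NestedAttemptsRejected);
            return false;
        }

        body(); // may indirectly call TryRunOnce again
        Volatile.Write(ref s_state, Done);
        return true;
    }
}
```

Because the state is switched to Running before the body executes, a nested call made from inside the body returns false immediately rather than recursing into a second initialization.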
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Lock.NativeAot.cs
We're not seeing new fundamental issues yet, the issue being fixed here is just an unfortunate bug that was introduced in a previous change of mine.
The lock does not behave as intended until it is initialized, I don't consider the initialization optional. This is a NativeAOT-specific issue and there it seems nontrivial to do some initialization that should be trivial. I agree that the initialization is better done out-of-line, as there are other potentials for issues that could be avoided. The initialization needs to be done in a timely manner, it's ok for it to be done in a background thread (eg. finalizer). Relying on a finalizable object does not offer timeliness. It also seems nontrivial to run managed initialization on the finalizer thread on startup. I think simplest would be to use the I'd be happy to make a change to accommodate any of the above with any relevant details that may be necessary to make it happen.
Currently the "initialization" is ensuring that logging is enabled and checking for overridden spin defaults. The lock can operate without this. Especially if it is not for long.
Yes. Finalizer gets around that and also moves initialization from startup to the later time.
It operates, sure, but not as intended. I don't see a good reason to change the intention of behavior because of some artificial limitation.
All these "unintended" states will be in practice short enough to not matter in regular case. Forcing the lock into a spinlock mode while trying to initialize is not the intended behavior either. We just expect that it will resolve itself quickly. It adds many additional states though and it is hard to ensure all possible situations. Like - I am not sure we can do waits with timeouts in such mode, but we could be still ok if the duration of such mode is bounded. But if the boundedness of the initialization may somehow depend on not taking a lock with a timeout, we might still see some circularity. I can't think of an exact scenario though. Anyways. I think the current fix will address the issue that it tries to address. At least I do not see how it could fail.
LGTM.
Currently, the wait path is avoided until initialization completes. This is more a safety net than anything else - new issues can arise on the wait path and it would be difficult to solve those without that safety net. I think there are multiple good reasons to have a way in NativeAOT to do some managed initialization in the background. For instance, if we find that initialization of NativeRuntimeEventSource is too expensive, some of that could be delegated to the background.
Background initialization of
FWIW, I do not think that it would work. EventSource has to be initialized eagerly, otherwise the events fired early during startup would be lost.
Fair enough. In that case, it shouldn't cost much more to initialize NativeRuntimeEventSource.
At the very least, it should be cheap to initialize NativeRuntimeEventSource, such that IsEnabled returns false until it is fully initialized, perhaps lazily in some fashion.
These events are lock contention events. They typically do not happen until late enough that event source is initialized by someone else. That is one reason why we see these reentrancy failures only rarely. I think this is something that we could trade for reliability. In fact we already do. What we have right now already has the property of allowing events to be lost. We null-check the event source and if not yet set, we do not post events. I understand the desire to have all the events, but it seems like something we could trade for reliability, when the loss is very rare.
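The null-check trade-off mentioned above could look roughly like this. It is only a sketch under assumptions: `ContentionEvents`, `LockSlowPath`, and `TryReportContention` are hypothetical stand-ins, not the NativeRuntimeEventSource API.

```csharp
using System;

// Hypothetical stand-in for an event source that reports lock contention.
public sealed class ContentionEvents
{
    public int Posted;
    public bool IsEnabled = true;
    public void ContentionStart() => Posted++;
}

public static class LockSlowPath
{
    // Assumption: stays null until the event source is initialized elsewhere.
    public static ContentionEvents Log;

    // Returns true if the event was posted, false if it was dropped.
    public static bool TryReportContention()
    {
        ContentionEvents log = Log; // read the field once
        if (log == null || !log.IsEnabled)
            return false; // event is lost, but nothing forces initialization here
        log.ContentionStart();
        return true;
    }
}
```

The point of the design is that the slow path never triggers initialization of the event source by itself; an event raised before the source exists is simply dropped.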
It would be possible for many apps to never need to send a contention event at all or even to use the slow path of the lock. I think
On my x64 Windows machine the initialization appears to be taking about 2.3 us in total, of which about 0.2 us for initializing NativeRuntimeEventSource, so the env var lookups appear to be the more expensive parts. Also looks like the initialization is not triggered in simple console apps. Once the initialization starts though, it seems it would complete fairly quickly, so there probably wouldn't be much lost in contention events or in hiding contentions due to waiting for the initialization. So I'm thinking of going with this for now.