-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/7.0] [Mono] Race in init_method when using LLVM AOT. #93006
Merged
lambdageek
merged 4 commits into
release/7.0-staging
from
backport/pr-75088-to-release/7.0
Oct 6, 2023
Merged
[release/7.0] [Mono] Race in init_method when using LLVM AOT. #93006
lambdageek
merged 4 commits into
release/7.0-staging
from
backport/pr-75088-to-release/7.0
Oct 6, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When using LLVM AOT codegen, init_method updates two GOT slots. These slots are initialized as part of init_method, but there is a race between initialization of the two slots. Current implementation can have two threads running init_method for the same method, but as soon as: [got_slots [pindex]] = addr store is visible, it will trigger other threads to return back from init_method, but since that could happen before the corresponding LLVM AOT const slot is set, second thread will return to method calling init_method, load the LLVM aot const, and crash when trying to use it (since its still NULL). This crash is very rare but have been identified on x86/x64 CPU's, when one thread is either preempted between updating regular GOT slot and LLVM GOT slot or store into LLVM GOT slot gets delayed in store buffer. I have also been able to emulate the scenario in debugger, triggering the issue and crashing in the method loading from LLVM aot const slot. Fix change order of updates and make sure the update of LLVM aot const slot happens before memory_barrier, since: got [got_slots [pindex]] = addr; have release semantics in relation to addr and update of LLVM aot const slot. Fix also add acquire/release semantics for ji->type in init_method since it is used to guard if a thread ignores a patch or not and it should not be re-ordered with previous stores, since it can cause similar race conditions with updated slots.
github-actions
bot
requested review from
vargaz,
lambdageek and
SamMonoRT
as code owners
October 4, 2023 13:00
vargaz
approved these changes
Oct 4, 2023
lambdageek
added
Servicing-approved
Approved for servicing release
and removed
Servicing-consider
Issue for next servicing release review
labels
Oct 4, 2023
Approved by tactics in email |
Ah crud, I forgot that the backports should be aiming at the |
This was referenced Oct 5, 2023
lateralusX
approved these changes
Oct 6, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Failures don't seem related |
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of #75088 to release/7.0
Fixes #81211
/cc @lambdageek @lateralusX
Customer Impact
Customers targeting Apple platforms using LLVM AOT codegen (the default) in highly concurrent settings (such as firing off multiple simultaneous async HTTP requests) may experience unexpected behavior such as InvalidCastExceptions, NullReferenceExceptions or crashes.
Testing
Manual testing
Risk
Low. This code has been running on .NET 8
main
for over a year in CI, as well as on some other non-mobile platforms