-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Fix case where we illegally optimize away a memory barrier on ARM32 #91827
Conversation
The ARM32/ARM64 backends have an optimization where they optimize out the latter of two subsequent memory barriers if no memory store/load has been seen between them. This optimization should not be allowed to remove memory barriers when a call has been seen. Fix dotnet#91732
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThe ARM32/ARM64 backends have an optimization where they optimize out the latter of two subsequent memory barriers if no memory store/load has been seen between them. This optimization did not take calls into account, allowing the removal of a memory barrier even if a call (with potential memory operations) had happened since the last one. Fix #91732 The problem happened in Lines 293 to 300 in e0a8170
The JIT effectively turned the @@ -1,42 +1,42 @@
G_M50486_IG05: ;; offset=0x0042
ldr r6, [r4+0x94]
dmb 15
ldr r0, [r4+0x0C]
and r7, r0, r6
ldr r0, [r5+0x04]
cmp r7, r0
bhs SHORT G_M50486_IG07
movs r0, 72
mul r0, r7, r0
adds r0, 8
ldr r0, [r5+r0]
dmb 15
sub r8, r0, r6
cmp r8, 0
bne SHORT G_M50486_IG06
add r0, r4, 148
adds r1, r6, 1
mov r2, r6
movw r3, 0x221
movt r3, 0xf732
blx r3 // System.Threading.Interlocked:CompareExchange(byref,int,int):int
cmp r0, r6
bne SHORT G_M50486_IG05
movs r2, 72
mul r2, r7, r2
adds r2, 8
adds r2, r5, r2
add r0, r2, 8
add r1, sp, 40
movs r2, 64
movw r12, 0x565f
movt r12, 0xf749
blx r12 // CORINFO_HELP_MEMCPY
movs r0, 72
mul r0, r7, r0
adds r0, 8
adds r3, r6, 1
+dmb 15
str r3, [r5+r0]
movs r0, 1
b SHORT G_M50486_IG03
|
@@ -8908,6 +8908,7 @@ void emitter::emitIns_Call(EmitCallType callType, | |||
|
|||
dispIns(id); | |||
appendToCurIG(id); | |||
emitLastMemBarrier = nullptr; // Cannot optimize away future memory barriers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this entire peephole should be switched to use the new backwards traversal peephole mechanism, which I expect would have quite significant TP improvements as we currently have code in very hot appendToCurIG
to handle this peephole. I've opened #91825 to track that. However, given that this PR should be backported I've kept the fix here surgical.
Failure is #91838 |
/backport to release/8.0 |
Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/6144973534 |
The ARM32/ARM64 backends have an optimization where they optimize out the latter of two subsequent memory barriers if no memory store/load has been seen between them. This optimization did not take calls into account, allowing the removal of a memory barrier even if a call (with potential memory operations) had happened since the last one.
Fix #91732
The problem happened in
ConcurrentQueueSegment<LargeStruct>.TryEnqueue
in this code:runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Concurrent/ConcurrentQueueSegment.cs
Lines 293 to 300 in e0a8170
The block copy was turned into a helper call to
CORINFO_HELP_MEMCPY
. Then the JIT effectively turned theVolatile.Write
into a plain write without any memory barrier. The diff with this PR is: