`volatile.` prefix doesn't work on `ldobj` or `stobj` #91530
Comments
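Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Description
I'm probably the first person insane enough to try this :)
In IL code you can use the `volatile.` prefix for any of `ldind`, `stind`, `ldfld`, `stfld`, `ldobj`, `stobj`, `initblk`, `cpblk`, `ldsfld`, and `stsfld` (III.2.6).
The point of the above is that something like:
struct A
{
int a, b;
}
should work with the `volatile.`-prefixed `ldobj !!0` and `stobj !!0` in the helpers below when `A` or `int` or any other type is the generic parameter `!!0`.
Reproduction Steps
struct A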
{
public int a, b;
}
static T VolatileRead<T>(ref T reference)
{
//ldarg.0
//volatile.
//ldobj !!0
//ret
}
static void VolatileWrite<T>(ref T reference, T value)
{
//ldarg.0
//ldarg.1
//volatile.
//stobj !!0
//ret
}
static A ReadVolatile1(ref A value) => VolatileRead(ref value);
static int ReadVolatile1(ref int value) => VolatileRead(ref value);
static A ReadVolatile2(ref A value)
{
A result;
result.a = Volatile.Read(ref value.a);
result.b = Volatile.Read(ref value.b);
return result;
}
static int ReadVolatile2(ref int value) => Volatile.Read(ref value);
static void WriteVolatile1(ref A value, A value2) => VolatileWrite(ref value, value2);
static void WriteVolatile1(ref int value, int value2) => VolatileWrite(ref value, value2);
static void WriteVolatile2(ref A value, A value2)
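{
// As written, these calls store into the by-value copy value2 (the listing for WriteVolatile2(byref,A) matches this);
// the intended direction is presumably value2 -> value.
Volatile.Write(ref value2.a, value.a);
Volatile.Write(ref value2.b, value.b);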
}
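static void WriteVolatile2(ref int value, int value2) => Volatile.Write(ref value, value2);
Expected behavior
`ReadVolatile1(int&)` and `ReadVolatile2(int&)` have the same (or very similar) codegen, and similarly with `A&`. And same with Write.
Actual behavior
I get the following codegen from crossgen2:
; Assembly listing for method Program:ReadVolatile1(byref):A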
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) byref -> x0 single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M15271_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-0x10]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M15271_IG02: ;; offset=0008H
9000000B adrp x11, [HIGH RELOC #0x40000000004292b8] // function address
9100016B add x11, x11, [LOW RELOC #0x40000000004292b8]
F9400161 ldr x1, [x11]
;; size=12 bbWeight=1 PerfScore 4.00
G_M15271_IG03: ;; offset=0014H
A8C17BFD ldp fp, lr, [sp],#0x10
D61F0020 br x1
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 28, prolog size 8, PerfScore 10.30, instruction count 7, allocated bytes for code 28 (MethodHash=d3c5c458) for method Program:ReadVolatile1(byref):A
; ============================================================
; Assembly listing for method Program:ReadVolatile2(byref):A
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 4 ) byref -> x0 single-def
; V01 loc0 [V01 ] ( 3, 3 ) struct ( 8) [fp+18H] do-not-enreg[S] ld-addr-op
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
;* V03 tmp1 [V03 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V04 tmp2 [V04 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V05 tmp3 [V05,T01] ( 2, 2 ) int -> [fp+18H] do-not-enreg[] V01.a(offs=0x00) P-DEP "field V01.a (fldOffset=0x0)"
; V06 tmp4 [V06,T02] ( 2, 2 ) int -> [fp+1CH] do-not-enreg[] V01.b(offs=0x04) P-DEP "field V01.b (fldOffset=0x4)"
;
; Lcl frame size = 16
G_M36164_IG01: ;; offset=0000H
A9BE7BFD stp fp, lr, [sp,#-0x20]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M36164_IG02: ;; offset=0008H
88DFFC01 ldar w1, [x0]
B9001BA1 str w1, [fp,#0x18] // [V05 tmp3]
B9400400 ldr w0, [x0,#0x04]
D50339BF dmb ishld
B9001FA0 str w0, [fp,#0x1C] // [V06 tmp4]
F9400FA0 ldr x0, [fp,#0x18] // [V01 loc0]
;; size=24 bbWeight=1 PerfScore 20.00
G_M36164_IG03: ;; offset=0020H
A8C27BFD ldp fp, lr, [sp],#0x20
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 40, prolog size 8, PerfScore 27.50, instruction count 10, allocated bytes for code 40 (MethodHash=85ab72bb) for method Program:ReadVolatile2(byref):A
; ============================================================
; Assembly listing for method Program:ReadVolatile1(byref):int
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) byref -> x0 single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M27221_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-0x10]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M27221_IG02: ;; offset=0008H
9000000B adrp x11, [HIGH RELOC #0x40000000004292e8] // function address
9100016B add x11, x11, [LOW RELOC #0x40000000004292e8]
F9400161 ldr x1, [x11]
;; size=12 bbWeight=1 PerfScore 4.00
G_M27221_IG03: ;; offset=0014H
A8C17BFD ldp fp, lr, [sp],#0x10
D61F0020 br x1
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 28, prolog size 8, PerfScore 10.30, instruction count 7, allocated bytes for code 28 (MethodHash=dc4895aa) for method Program:ReadVolatile1(byref):int
; ============================================================
; Assembly listing for method Program:ReadVolatile2(byref):int
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) byref -> x0 single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M42934_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-0x10]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M42934_IG02: ;; offset=0008H
88DFFC00 ldar w0, [x0]
;; size=4 bbWeight=1 PerfScore 3.00
G_M42934_IG03: ;; offset=000CH
A8C17BFD ldp fp, lr, [sp],#0x10
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 20, prolog size 8, PerfScore 8.50, instruction count 5, allocated bytes for code 20 (MethodHash=9e525849) for method Program:ReadVolatile2(byref):int
; ============================================================
; Assembly listing for method Program:WriteVolatile1(byref,A)
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) byref -> x0 single-def
; V01 arg1 [V01,T01] ( 3, 3 ) struct ( 8) [fp+18H] do-not-enreg[S] single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 16
G_M5886_IG01: ;; offset=0000H
A9BE7BFD stp fp, lr, [sp,#-0x20]!
910003FD mov fp, sp
F9000FA1 str x1, [fp,#0x18]
;; size=12 bbWeight=1 PerfScore 2.50
G_M5886_IG02: ;; offset=000CH
F9400FA1 ldr x1, [fp,#0x18]
9000000B adrp x11, [HIGH RELOC #0x4000000000429358] // function address
9100016B add x11, x11, [LOW RELOC #0x4000000000429358]
F9400162 ldr x2, [x11]
;; size=16 bbWeight=1 PerfScore 6.00
G_M5886_IG03: ;; offset=001CH
A8C27BFD ldp fp, lr, [sp],#0x20
D61F0040 br x2
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 36, prolog size 12, PerfScore 14.10, instruction count 9, allocated bytes for code 36 (MethodHash=7866e901) for method Program:WriteVolatile1(byref,A)
; ============================================================
; Assembly listing for method Program:WriteVolatile2(byref,A)
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 4 ) byref -> x0 single-def
; V01 arg1 [V01 ] ( 4, 4 ) struct ( 8) [fp+18H] do-not-enreg[XS] addr-exposed ld-addr-op single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
; V03 tmp1 [V03,T01] ( 2, 4 ) int -> x1 "Inlining Arg"
; V04 tmp2 [V04,T02] ( 2, 4 ) int -> x0 "Inlining Arg"
;
; Lcl frame size = 16
G_M11869_IG01: ;; offset=0000H
A9BE7BFD stp fp, lr, [sp,#-0x20]!
910003FD mov fp, sp
F9000FA1 str x1, [fp,#0x18]
;; size=12 bbWeight=1 PerfScore 2.50
G_M11869_IG02: ;; offset=000CH
B9400001 ldr w1, [x0]
D5033BBF dmb ish
B9001BA1 str w1, [fp,#0x18]
B9400400 ldr w0, [x0,#0x04]
D5033BBF dmb ish
B9001FA0 str w0, [fp,#0x1C]
;; size=24 bbWeight=1 PerfScore 28.00
G_M11869_IG03: ;; offset=0024H
A8C27BFD ldp fp, lr, [sp],#0x20
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 44, prolog size 8, PerfScore 36.90, instruction count 11, allocated bytes for code 44 (MethodHash=597cd1a2) for method Program:WriteVolatile2(byref,A)
; ============================================================
; Assembly listing for method Program:WriteVolatile1(byref,int)
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) byref -> x0 single-def
; V01 arg1 [V01,T01] ( 3, 3 ) int -> x1 single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M16972_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-0x10]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M16972_IG02: ;; offset=0008H
9000000B adrp x11, [HIGH RELOC #0x4000000000429388] // function address
9100016B add x11, x11, [LOW RELOC #0x4000000000429388]
F9400162 ldr x2, [x11]
;; size=12 bbWeight=1 PerfScore 4.00
G_M16972_IG03: ;; offset=0014H
A8C17BFD ldp fp, lr, [sp],#0x10
D61F0040 br x2
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 28, prolog size 8, PerfScore 10.30, instruction count 7, allocated bytes for code 28 (MethodHash=2db1bdb3) for method Program:WriteVolatile1(byref,int)
; ============================================================
; Assembly listing for method Program:WriteVolatile2(byref,int)
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) byref -> x0 single-def
; V01 arg1 [V01,T01] ( 3, 3 ) int -> x1 single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M63279_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-0x10]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M63279_IG02: ;; offset=0008H
889FFC01 stlr w1, [x0]
;; size=4 bbWeight=1 PerfScore 1.00
G_M63279_IG03: ;; offset=000CH
A8C17BFD ldp fp, lr, [sp],#0x10
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=abe008d0) for method Program:WriteVolatile2(byref,int)
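; ============================================================
As can be seen above, the `1` variants don't appear to be volatile at all. But they should all have extremely similar codegen to the matching `2` variants.
Regression?
Probably not :)
Known Workarounds
For specific cases, it's obviously easy enough to work around if you have access to all the fields, or if it's simply a class type or `unmanaged`, but it's basically unsolvable in the general case since gc-refs could be anywhere in it, so you can't just loop over the memory.
Configuration
I got the codegen using disasmo. It's for .NET 7 on arm64 using crossgen2. I haven't got a locally built .NET 8, so I haven't checked it, but I doubt it's much better :) (it would be good if someone else was able to test this in .NET 8).
Other information
I haven't tested all of the other instructions, but it could easily be an issue specific to `ldobj` and `stobj` and the rest could be fine; this should be checked.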
It's probably doable with some effort across all runtimes, but since it's not exposed in C# (or any other .NET language) it's probably better to leave it as is and refer to this ECMA update: https://github.com/dotnet/runtime/blob/main/docs/design/specs/Ecma-335-Augments.md#atomic-reads-and-writes
Not sure what you're referring to here - I don't expect atomicity guarantees (except for the builtin primitive types & pointers obviously), the issue is that the volatile guarantees are not met.
Also, I don't see from your codegen what's wrong - your *1 variants have calls which may or may not provide volatile guarantees - how did you know?
Yes, I had the example on the discord earlier (am happy to post it here if you would like).
I must have missed that, do you know what function it's calling (all mine here are
public static int Method1(void* address, int a) => VolatileRead(ref *(int*)address);
public static int Method1(void* address, short a) => *(int*)address;
public static int Method1(void* address, byte a) => System.Threading.Volatile.Read(ref *(int*)address);
gives
; Assembly listing for method Program:Method1(ulong,int):int
G_M25106_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-0x10]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M25106_IG02: ;; offset=0008H
9000000B adrp x11, [HIGH RELOC #0x40000000004292a8] // function address
9100016B add x11, x11, [LOW RELOC #0x40000000004292a8]
F9400161 ldr x1, [x11]
;; size=12 bbWeight=1 PerfScore 4.00
G_M25106_IG03: ;; offset=0014H
A8C17BFD ldp fp, lr, [sp],#0x10
D61F0020 br x1
;; size=8 bbWeight=1 PerfScore 2.00
; Assembly listing for method Program:Method1(ulong,short):int
G_M46963_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-0x10]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M46963_IG02: ;; offset=0008H
B9400000 ldr w0, [x0]
;; size=4 bbWeight=1 PerfScore 3.00
G_M46963_IG03: ;; offset=000CH
A8C17BFD ldp fp, lr, [sp],#0x10
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Assembly listing for method Program:Method1(ulong,ubyte):int
G_M27038_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-0x10]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M27038_IG02: ;; offset=0008H
88DFFC00 ldar w0, [x0]
;; size=4 bbWeight=1 PerfScore 3.00
G_M27038_IG03: ;; offset=000CH
A8C17BFD ldp fp, lr, [sp],#0x10
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
why is the one which calls
Presumably, it's a cross-assembly inlining limitation in crossgen (I assume your VolatileRead is written in pure IL in a separate module).
I think you're right, it seems to generate the correct code when I use
1 other thing to check, the expected thing for VolatileCopyBlock would be
It does seem to be not right for this random type I came up with though:
; Method Program:Method1(ulong,int):System.ValueTuple`3[int,ubyte,long]
G_M62410_IG01: ;; offset=0000H
A9BE7BFD stp fp, lr, [sp,#-0x20]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M62410_IG02: ;; offset=0008H
3DC00010 ldr q16, [x0]
3D8007B0 str q16, [fp,#0x10]
F9400BA0 ldr x0, [fp,#0x10]
F9400FA1 ldr x1, [fp,#0x18]
;; size=16 bbWeight=1 PerfScore 8.00
G_M62410_IG03: ;; offset=0018H
A8C27BFD ldp fp, lr, [sp],#0x20
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code: 32
; Method Program:Method1(ulong,short):System.ValueTuple`3[int,ubyte,long]
G_M11179_IG01: ;; offset=0000H
A9BE7BFD stp fp, lr, [sp,#-0x20]!
910003FD mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M11179_IG02: ;; offset=0008H
3DC00010 ldr q16, [x0]
3D8007B0 str q16, [fp,#0x10]
F9400BA0 ldr x0, [fp,#0x10]
F9400FA1 ldr x1, [fp,#0x18]
;; size=16 bbWeight=1 PerfScore 8.00
G_M11179_IG03: ;; offset=0018H
A8C27BFD ldp fp, lr, [sp],#0x20
D65F03C0 ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code: 32
for
public static (int, byte, long) Method1(void* address, int a) => VolatileRead(ref *((int, byte, long)*)address);
public static (int, byte, long) Method1(void* address, short a) => *((int, byte, long)*)address;
(feel free to inform me that I'm wrong if I am)
Just so it's clear to anyone reading, specifically what I would expect to happen with the volatile operations on non-primitives/pointers, or when it's unaligned (since volatile can also be combined with
Implementing the above (for cases which aren't already handled, like primitive types & pointers) isn't too difficult: you need only put the applicable half barrier before or after the standard read/write code (after the read for a volatile read, before the write for a volatile write), or use the applicable load/store instruction if there's only 1 of them (or the first for write, last for read) and there's a volatile version of that operation available.
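The barrier placement described above can be sketched in C#. This is only an illustration under stated assumptions, not what the JIT emits or should emit: `ReadThenFence` and `FenceThenWrite` are hypothetical names, and `Interlocked.MemoryBarrier()` is a full fence standing in for the acquire/release half barriers, so it is stronger (and slower) than strictly needed.

```csharp
using System.Threading;

public struct A
{
    public int a, b;
}

public static class FencedAccessSketch
{
    // Hypothetical helper: plain copy, then a fence, so that later memory
    // accesses cannot be moved before the read. A full fence is used here
    // as a stand-in for the acquire half barrier.
    public static T ReadThenFence<T>(ref T location)
    {
        T result = location;
        Interlocked.MemoryBarrier();
        return result;
    }

    // Hypothetical helper: fence, then a plain copy, so that earlier memory
    // accesses cannot be moved after the write. A full fence is used here
    // as a stand-in for the release half barrier.
    public static void FenceThenWrite<T>(ref T location, T value)
    {
        Interlocked.MemoryBarrier();
        location = value;
    }
}
```

For a multi-field struct such as `A`, neither helper makes the copy atomic; they only constrain ordering, which matches the expectation stated in the issue that atomicity is limited to what §I.12.6.6 guarantees.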
For
Additional notes from when I investigated it last: also `volatile. initblk` fails when it gets inlined, and
Description
I'm probably the first person insane enough to try this :)
In IL code you can use the `volatile.` prefix for any of `ldind`, `stind`, `ldfld`, `stfld`, `ldobj`, `stobj`, `initblk`, `cpblk`, `ldsfld`, and `stsfld` (III.2.6). Notably, nowhere does it place any form of restriction on what the type can be for instructions like `ldobj` and `stobj`; in fact it even specifies behaviour for atomicity, which would not make sense if it wasn't supposed to work on non-builtin types:
"They do not provide atomicity, other than that guaranteed by the specification of §I.12.6.6." (I.12.6.7)
"I.12.6.6 Atomic reads and writes" specifies:
"A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic (see §I.12.6.2) when all the write accesses to a location are the same size."
which would mean that for custom types which are structs with multiple fields, it should just do the applicable volatile operation for each field (in an unspecified order).
The point of the above is that something like `struct A { int a, b; }` should work with the `volatile.`-prefixed `ldobj !!0` and `stobj !!0` used by the `VolatileRead<T>` and `VolatileWrite<T>` helpers in the reproduction steps when `A` or `int` or any other type is the generic parameter `!!0`; and for `A` it should generate something similar to what it would if I'd done it directly with each field instead.
To be clear, I ran into this as a result of determining that this operation would actually be useful to something I was trying to implement lock-free.
Reproduction Steps
Expected behavior
`ReadVolatile1(int&)` and `ReadVolatile2(int&)` have the same (or very similar) codegen, and similarly with `A&`. And same with Write.
Actual behavior
I get the following codegen from crossgen2
as can be seen above, the `1` variants don't appear to be volatile at all. But they should all have extremely similar codegen to the matching `2` variants.
Regression?
Probably not :)
Known Workarounds
For specific cases, it's obviously easy enough to work around if you have access to all the fields, or if it's simply a class type or `unmanaged`, but it's basically unsolvable in the general case since gc-refs could be anywhere in it, so you can't just loop over the memory.
Configuration
I got the codegen using disasmo. It's for .NET 7 on arm64 using crossgen2. I haven't got a locally built .NET 8, so I haven't checked it, but I doubt it's much better :) (it would be good if someone else was able to test this in .NET 8).
I was planning to use this on a .NET 6 project I have, so much for that idea lol. It probably won't be fixed until at least 9 at this point would be my guess (it would be great if we could fix it on 6, 7, and 8 too though 😄).
Other information
I haven't tested all of the other instructions, but it could easily be an issue specific to `ldobj` and `stobj` and the rest could be fine; this should be checked.