-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Span bounds checking not fully eliminated in some cases when index is constant and length is known #1115
Comments
Naively unblocking the constant eval G_M5540_IG01:
sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M5540_IG02:
mov rax, qword ptr [rcx+8]
movsxd rdx, edx
add rax, rdx
mov eax, dword ptr [rax]
;; bbWeight=1 PerfScore 4.50
G_M5540_IG03:
add rsp, 40
ret
;; bbWeight=1 PerfScore 1.25
G_M5540_IG04:
call CORINFO_HELP_OVERFLOW
int3 So if value numbering handled Also would be nice to get rid of the dead overflow call. |
…1115) Should match the logic in mlinfo.cpp.
Codegen today: ; Assembly listing for method C:ReadInt32(int):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 8 single block inlinees; 2 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T00] ( 3, 3 ) ref -> rcx this class-hnd single-def
; V01 arg1 [V01,T01] ( 3, 3 ) int -> rdx single-def
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;* V03 tmp1 [V03 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
;* V04 tmp2 [V04 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inline stloc first use temp"
;* V05 tmp3 [V05 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "NewObj constructor temp"
; V06 tmp4 [V06,T03] ( 2, 4 ) long -> rax "Inlining Arg"
;* V07 tmp5 [V07,T06] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
;* V08 tmp6 [V08,T07] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
;* V09 tmp7 [V09 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V10 tmp8 [V10,T08] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
;* V11 tmp9 [V11 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
;* V12 tmp10 [V12,T09] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
;* V13 tmp11 [V13 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "NewObj constructor temp"
; V14 tmp12 [V14,T02] ( 2, 4 ) byref -> rax single-def "Inlining Arg"
;* V15 tmp13 [V15 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V16 tmp14 [V16 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V17 tmp15 [V17 ] ( 0, 0 ) byref -> zero-ref V03._reference(offs=0x00) P-INDEP "field V03._reference (fldOffset=0x0)"
;* V18 tmp16 [V18 ] ( 0, 0 ) int -> zero-ref V03._length(offs=0x08) P-INDEP "field V03._length (fldOffset=0x8)"
;* V19 tmp17 [V19 ] ( 0, 0 ) byref -> zero-ref V04._reference(offs=0x00) P-INDEP "field V04._reference (fldOffset=0x0)"
;* V20 tmp18 [V20 ] ( 0, 0 ) int -> zero-ref V04._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
; V21 tmp19 [V21,T04] ( 2, 2 ) byref -> rax V05._reference(offs=0x00) P-INDEP "field V05._reference (fldOffset=0x0)"
;* V22 tmp20 [V22 ] ( 0, 0 ) int -> zero-ref V05._length(offs=0x08) P-INDEP "field V05._length (fldOffset=0x8)"
; V23 tmp21 [V23,T05] ( 2, 2 ) byref -> rax single-def V09._reference(offs=0x00) P-INDEP "field V09._reference (fldOffset=0x0)"
;* V24 tmp22 [V24 ] ( 0, 0 ) int -> zero-ref V09._length(offs=0x08) P-INDEP "field V09._length (fldOffset=0x8)"
;* V25 tmp23 [V25 ] ( 0, 0 ) byref -> zero-ref V13._reference(offs=0x00) P-INDEP "field V13._reference (fldOffset=0x0)"
;* V26 tmp24 [V26 ] ( 0, 0 ) int -> zero-ref V13._length(offs=0x08) P-INDEP "field V13._length (fldOffset=0x8)"
;
; Lcl frame size = 40
G_M5476_IG01:
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M5476_IG02:
movsxd rax, edx
add rax, qword ptr [rcx+08H]
mov eax, dword ptr [rax]
;; size=9 bbWeight=1 PerfScore 5.25
G_M5476_IG03:
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
G_M5476_IG04:
call CORINFO_HELP_OVERFLOW
int3
;; size=6 bbWeight=0 PerfScore 0.00
; Total bytes of code 24, prolog size 4, PerfScore 9.15, instruction count 8, allocated bytes for code 24 (MethodHash=5011ea9b) for method C:ReadInt32(int):int:this
; ============================================================ Looks fixed, although we still leave the dead throw helper around. |
I have some code that basically reads and writes to a given memory region, with a offset. I had one for each integer type, and also a generic one for structs etc.
I decided, with the use of spans, check if I could remove all the integer read/write methods and just use the generic method instead. So instead of:
I could just do:
After checking the code generated for both, I noticed that the one generated for the generic variant was almost as good as the specialized one, with the exception of an odd issue, the bounds check was not fully eliminated.
Here's some test code that highlights the issue:
SharpLab link here.
Code generated for the ReadInt32 method:
This is the odd part:
It's doing a comparison with constant values. As far I can tell, 1 is the length and 0 the index. Since they are constant, constant folding should evaluate 1U <= 0U == false, and so we know that the branch is never going to be taken, since the condition is always false. So the comparison and jump should be eliminated, and then the other part of the code would become unreachable. It looks like
L001f
is already unreachable aswell, so I don't know why the code is still there. It looks like that whichever optimization pass is responsible for optimizing length checks runs too late on the compiler pipeline, and so the other optimizations are not kicking in.The goal is generating the same or as efficient code for ReadInt32 and ReadInt32Fast.
category:cq
theme:value-numbering
skill-level:intermediate
cost:medium
The text was updated successfully, but these errors were encountered: