Span bounds checking not fully eliminated in some cases when index is constant and length is known #1115

gdkchan · 2019-12-21T22:05:15Z

I have some code that basically reads and writes to a given memory region, with a offset. I had one for each integer type, and also a generic one for structs etc.

I decided, with the use of spans, check if I could remove all the integer read/write methods and just use the generic method instead. So instead of:

int x = ReadInt32(offset);

I could just do:

int x = Read<int>(offset);

After checking the code generated for both, I noticed that the one generated for the generic variant was almost as good as the specialized one, with the exception of an odd issue, the bounds check was not fully eliminated.

Here's some test code that highlights the issue:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public class C
{
    private IntPtr _ptr;
    
    public C(IntPtr ptr)
    {
        _ptr = ptr;
    }
    
    public int ReadInt32(int offset)
    {
        return Read<int>(offset);
    }
    
    public unsafe int ReadInt32Fast(int offset)
    {
        return *(int*)(_ptr + offset);
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public T Read<T>(int offset) where T : struct
    {
        return MemoryMarshal.Cast<byte, T>(GetDataSpan(offset, Unsafe.SizeOf<T>()))[0];
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private unsafe Span<byte> GetDataSpan(int offset, int size)
    {
        return new Span<byte>((void*)(_ptr + offset), size);
    }
}

SharpLab link here.
Code generated for the ReadInt32 method:

C.ReadInt32(Int32)
    L0000: sub rsp, 0x28
    L0004: mov rax, [rcx+0x8]
    L0008: movsxd rdx, edx
    L000b: add rax, rdx
    L000e: mov edx, 0x1
    L0013: cmp edx, 0x0
    L0016: jbe L0025
    L0018: mov eax, [rax]
    L001a: add rsp, 0x28
    L001e: ret
    L001f: call 0x7fff231aed50
    L0024: int3
    L0025: call 0x7fff231aef00
    L002a: int3

This is the odd part:

    L000e: mov edx, 0x1
    L0013: cmp edx, 0x0
    L0016: jbe L0025

It's doing a comparison with constant values. As far I can tell, 1 is the length and 0 the index. Since they are constant, constant folding should evaluate 1U <= 0U == false, and so we know that the branch is never going to be taken, since the condition is always false. So the comparison and jump should be eliminated, and then the other part of the code would become unreachable. It looks like L001f is already unreachable aswell, so I don't know why the code is still there. It looks like that whichever optimization pass is responsible for optimizing length checks runs too late on the compiler pipeline, and so the other optimizations are not kicking in.

The goal is generating the same or as efficient code for ReadInt32 and ReadInt32Fast.

category:cq
theme:value-numbering
skill-level:intermediate
cost:medium

The text was updated successfully, but these errors were encountered:

AndyAyersMS · 2019-12-28T02:27:14Z

Naively unblocking the constant eval VNF_CastOvf paths in value numbering fixes this (a real fix would need to be more careful about handling overflow). The issue is that we have a dead overflow cast but don't optimize that path away until too late.

G_M5540_IG01:
       sub      rsp, 40
						;; bbWeight=1    PerfScore 0.25
G_M5540_IG02:
       mov      rax, qword ptr [rcx+8]
       movsxd   rdx, edx
       add      rax, rdx
       mov      eax, dword ptr [rax]
						;; bbWeight=1    PerfScore 4.50
G_M5540_IG03:
       add      rsp, 40
       ret      
						;; bbWeight=1    PerfScore 1.25
G_M5540_IG04:
       call     CORINFO_HELP_OVERFLOW
       int3

So if value numbering handled VNF_CastOvf with constant args (or if perhaps we did sparse conditional evaluation to avoid getting confused by at dead stuff), we would be able to simplify this.

Also would be nice to get rid of the dead overflow call.

…1115) Should match the logic in mlinfo.cpp.

jakobbotsch · 2022-12-16T09:44:49Z

Codegen today:

; Assembly listing for method C:ReadInt32(int):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 8 single block inlinees; 2 inlinees without PGO data
; Final local variable assignments
;
;  V00 this         [V00,T00] (  3,  3   )     ref  ->  rcx         this class-hnd single-def
;  V01 arg1         [V01,T01] (  3,  3   )     int  ->  rdx         single-def
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;* V03 tmp1         [V03    ] (  0,  0   )  struct (16) zero-ref    "struct address for call/obj"
;* V04 tmp2         [V04    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inline stloc first use temp"
;* V05 tmp3         [V05    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp"
;  V06 tmp4         [V06,T03] (  2,  4   )    long  ->  rax         "Inlining Arg"
;* V07 tmp5         [V07,T06] (  0,  0   )     int  ->  zero-ref    "Inline stloc first use temp"
;* V08 tmp6         [V08,T07] (  0,  0   )     int  ->  zero-ref    "Inline stloc first use temp"
;* V09 tmp7         [V09    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg"
;* V10 tmp8         [V10,T08] (  0,  0   )     int  ->  zero-ref    "Inline stloc first use temp"
;* V11 tmp9         [V11    ] (  0,  0   )    long  ->  zero-ref    "Inline stloc first use temp"
;* V12 tmp10        [V12,T09] (  0,  0   )     int  ->  zero-ref    "Inline stloc first use temp"
;* V13 tmp11        [V13    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp"
;  V14 tmp12        [V14,T02] (  2,  4   )   byref  ->  rax         single-def "Inlining Arg"
;* V15 tmp13        [V15    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V16 tmp14        [V16    ] (  0,  0   )    bool  ->  zero-ref    "Inlining Arg"
;* V17 tmp15        [V17    ] (  0,  0   )   byref  ->  zero-ref    V03._reference(offs=0x00) P-INDEP "field V03._reference (fldOffset=0x0)"
;* V18 tmp16        [V18    ] (  0,  0   )     int  ->  zero-ref    V03._length(offs=0x08) P-INDEP "field V03._length (fldOffset=0x8)"
;* V19 tmp17        [V19    ] (  0,  0   )   byref  ->  zero-ref    V04._reference(offs=0x00) P-INDEP "field V04._reference (fldOffset=0x0)"
;* V20 tmp18        [V20    ] (  0,  0   )     int  ->  zero-ref    V04._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;  V21 tmp19        [V21,T04] (  2,  2   )   byref  ->  rax         V05._reference(offs=0x00) P-INDEP "field V05._reference (fldOffset=0x0)"
;* V22 tmp20        [V22    ] (  0,  0   )     int  ->  zero-ref    V05._length(offs=0x08) P-INDEP "field V05._length (fldOffset=0x8)"
;  V23 tmp21        [V23,T05] (  2,  2   )   byref  ->  rax         single-def V09._reference(offs=0x00) P-INDEP "field V09._reference (fldOffset=0x0)"
;* V24 tmp22        [V24    ] (  0,  0   )     int  ->  zero-ref    V09._length(offs=0x08) P-INDEP "field V09._length (fldOffset=0x8)"
;* V25 tmp23        [V25    ] (  0,  0   )   byref  ->  zero-ref    V13._reference(offs=0x00) P-INDEP "field V13._reference (fldOffset=0x0)"
;* V26 tmp24        [V26    ] (  0,  0   )     int  ->  zero-ref    V13._length(offs=0x08) P-INDEP "field V13._length (fldOffset=0x8)"
;
; Lcl frame size = 40

G_M5476_IG01:
       sub      rsp, 40
						;; size=4 bbWeight=1    PerfScore 0.25
G_M5476_IG02:
       movsxd   rax, edx
       add      rax, qword ptr [rcx+08H]
       mov      eax, dword ptr [rax]
						;; size=9 bbWeight=1    PerfScore 5.25
G_M5476_IG03:
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1    PerfScore 1.25
G_M5476_IG04:
       call     CORINFO_HELP_OVERFLOW
       int3     
						;; size=6 bbWeight=0    PerfScore 0.00

; Total bytes of code 24, prolog size 4, PerfScore 9.15, instruction count 8, allocated bytes for code 24 (MethodHash=5011ea9b) for method C:ReadInt32(int):int:this
; ============================================================

Looks fixed, although we still leave the dead throw helper around.

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 21, 2019

jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 22, 2019

AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Dec 28, 2019

AndyAyersMS added this to the Future milestone Dec 28, 2019

AndyAyersMS mentioned this issue Jul 1, 2020

JIT doesn't properly optimize assignment to stackalloced span elements when used as 'out' parameters #38628

Open

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020

SingleAccretion mentioned this issue Mar 12, 2021

Redundant mov instructions generated #49535

Closed

SingleAccretion mentioned this issue Mar 30, 2021

Handle casts done via helpers and fold overflow operations in value numbering #50450

Merged

MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this issue Jul 1, 2021

Don't assume everything except AsAny is IUnknown marshalling (dotnet#…

1424eba

…1115) Should match the logic in mlinfo.cpp.

jakobbotsch closed this as completed Dec 16, 2022

ghost locked as resolved and limited conversation to collaborators Jan 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Span bounds checking not fully eliminated in some cases when index is constant and length is known #1115

Span bounds checking not fully eliminated in some cases when index is constant and length is known #1115

gdkchan commented Dec 21, 2019 •

edited by BruceForstall

Loading

AndyAyersMS commented Dec 28, 2019 •

edited by BruceForstall

Loading

jakobbotsch commented Dec 16, 2022

Span bounds checking not fully eliminated in some cases when index is constant and length is known #1115

Span bounds checking not fully eliminated in some cases when index is constant and length is known #1115

Comments

gdkchan commented Dec 21, 2019 • edited by BruceForstall Loading

AndyAyersMS commented Dec 28, 2019 • edited by BruceForstall Loading

jakobbotsch commented Dec 16, 2022

gdkchan commented Dec 21, 2019 •

edited by BruceForstall

Loading

AndyAyersMS commented Dec 28, 2019 •

edited by BruceForstall

Loading