Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal Array.Clear(Array), optimize some Array callers + codegen #51548

Merged
merged 1 commit into from
Apr 21, 2021

Conversation

GrabYourPitchforks
Copy link
Member

Optimized some calls to Array.Clear, including adding an internal non-generic Array.Clear(Array) method for common use cases, plus redirecting some existing calls to Array.Clear or Span<T>.Clear (see #51534) or Span<T>.Fill.

This is marked as draft because I'm soliciting feedback on whether to pursue this. It introduces some complexity to the RawArrayData and RawData types (using FieldOffset) in order to knock down the codegen. Whether complicating the C# definitions of these types in order to save some bytes of codegen is worth it I'll leave up to smarter people. :)

The codegen for Array.Clear(Array) is pretty well optimized now:

00007ffa`321721a0 4883ec28             sub     rsp, 28h
00007ffa`321721a4 4885c9               test    rcx, rcx  ; arr != null check
00007ffa`321721a7 745d                 je      System_Private_CoreLib!System.Array.Clear(System.Array)+0xffffffff`a0fc77c6 (00007ffa`32172206)
00007ffa`321721a9 488b11               mov     rdx, qword ptr [rcx]  ; rdx := pMethodTable
00007ffa`321721ac 4c8bc2               mov     r8, rdx
00007ffa`321721af 410fb700             movzx   eax, word ptr [r8]  ; rax := MethodTable::ComponentSize
00007ffa`321721b3 480faf4108           imul    rax, qword ptr [rcx+8]  ; rax := TotalByteLength (ComponentSize * NumElements)
00007ffa`321721b8 8b5204               mov     edx, dword ptr [rdx+4]  ; rdx := MethodTable::BaseSize
00007ffa`321721bb 4883c108             add     rcx, 8  ; ideally would be rolled into line immediately below, but eh, whatever
00007ffa`321721bf 488d4c11f0           lea     rcx, [rcx+rdx-10h]  ; rcx := &Array.FirstElement
00007ffa`321721c4 41f70000000001       test    dword ptr [r8], 1000000h  ; <does the method table contain gc-tracked types?>
; <snip>

jit-diff output for CoreLib below. Not sure what's going on with some methods being deleted + reintroduced. Maybe something weird in my build environment? Codegen for methods which now call Array.Clear(Array) instead of Array.Clear(Array, int, int) is smaller, as expected. Methods like Array.Copy which now rely on the internal NativeLength property instead of LongLength also saw codegen reductions.

Top method regressions (bytes):
         260 (     ∞ of base) : System.Private.CoreLib.dasm - Array:<Sort>g__GenericSort|128_0(Array,Array,int,int) (0 base, 1 diff methods)
         151 (     ∞ of base) : System.Private.CoreLib.dasm - Array:<BinarySearch>g__GenericBinarySearch|82_0(Array,int,int,Object):int (0 base, 1 diff methods)
         151 (     ∞ of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|107_0(Array,Object,int,int):int (0 base, 1 diff methods)
         151 (     ∞ of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|113_0(Array,Object,int,int):int (0 base, 1 diff methods)
         120 (     ∞ of base) : System.Private.CoreLib.dasm - SpanHelpers:ClearWithReferences(byref,long) (0 base, 1 diff methods)
         117 (     ∞ of base) : System.Private.CoreLib.dasm - Array:Clear(Array) (0 base, 1 diff methods)
           5 (     ∞ of base) : System.Private.CoreLib.dasm - Array:get_NativeLength():long:this (0 base, 1 diff methods)
           1 (25.00% of base) : System.Private.CoreLib.dasm - Array:get_LongLength():long:this

Top method improvements (bytes):
        -260 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<Sort>g__GenericSort|125_0(Array,Array,int,int) (1 base, 0 diff methods)
        -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<BinarySearch>g__GenericBinarySearch|79_0(Array,int,int,Object):int (1 base, 0 diff methods)
        -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|104_0(Array,Object,int,int):int (1 base, 0 diff methods)
        -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|110_0(Array,Object,int,int):int (1 base, 0 diff methods)
         -62 (-29.81% of base) : System.Private.CoreLib.dasm - String:Ctor(ushort,int):String:this
         -22 (-2.41% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Return(ref,bool):this
         -13 (-6.02% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,int,Array,int,int)
         -10 (-1.34% of base) : System.Private.CoreLib.dasm - Buffer:BlockCopy(Array,int,Array,int,int)
          -9 (-1.01% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,int,Array,int,int,bool)
          -6 (-2.70% of base) : System.Private.CoreLib.dasm - Array:Clear(Array,int,int)
          -6 (-0.59% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Trim():bool:this
          -5 (-1.62% of base) : System.Private.CoreLib.dasm - ConfigurableArrayPool`1:Return(ref,bool):this
          -4 (-1.71% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,Array,int)
          -2 (-1.72% of base) : System.Private.CoreLib.dasm - Array:UnsafeArrayAsSpan(Array,int,int):Span`1
          -2 (-1.13% of base) : System.Private.CoreLib.dasm - Buffer:ByteLength(Array):int
          -2 (-2.99% of base) : System.Private.CoreLib.dasm - Buffer:GetByte(Array,int):ubyte
          -2 (-2.82% of base) : System.Private.CoreLib.dasm - Buffer:SetByte(Array,int,ubyte)

@ghost
Copy link

ghost commented Apr 20, 2021

Tagging subscribers to this area: @tannergooding
See info in area-owners.md if you want to be subscribed.

Issue Details

Optimized some calls to Array.Clear, including adding an internal non-generic Array.Clear(Array) method for common use cases, plus redirecting some existing calls to Array.Clear or Span<T>.Clear (see #51534) or Span<T>.Fill.

This is marked as draft because I'm soliciting feedback on whether to pursue this. It introduces some complexity to the RawArrayData and RawData types (using FieldOffset) in order to knock down the codegen. Whether complicating the C# definitions of these types in order to save some bytes of codegen is worth it I'll leave up to smarter people. :)

The codegen for Array.Clear(Array) is pretty well optimized now:

00007ffa`321721a0 4883ec28             sub     rsp, 28h
00007ffa`321721a4 4885c9               test    rcx, rcx  ; arr != null check
00007ffa`321721a7 745d                 je      System_Private_CoreLib!System.Array.Clear(System.Array)+0xffffffff`a0fc77c6 (00007ffa`32172206)
00007ffa`321721a9 488b11               mov     rdx, qword ptr [rcx]  ; rdx := pMethodTable
00007ffa`321721ac 4c8bc2               mov     r8, rdx
00007ffa`321721af 410fb700             movzx   eax, word ptr [r8]  ; rax := MethodTable::ComponentSize
00007ffa`321721b3 480faf4108           imul    rax, qword ptr [rcx+8]  ; rax := TotalByteLength (ComponentSize * NumElements)
00007ffa`321721b8 8b5204               mov     edx, dword ptr [rdx+4]  ; rdx := MethodTable::BaseSize
00007ffa`321721bb 4883c108             add     rcx, 8  ; ideally would be rolled into line immediately below, but eh, whatever
00007ffa`321721bf 488d4c11f0           lea     rcx, [rcx+rdx-10h]  ; rcx := &Array.FirstElement
00007ffa`321721c4 41f70000000001       test    dword ptr [r8], 1000000h  ; <does the method table contain gc-tracked types?>
; <snip>

jit-diff output for CoreLib below. Not sure what's going on with some methods being deleted + reintroduced. Maybe something weird in my build environment? Codegen for methods which now call Array.Clear(Array) instead of Array.Clear(Array, int, int) is smaller, as expected. Methods like Array.Copy which now rely on the internal NativeLength property instead of LongLength also saw codegen reductions.

Top method regressions (bytes):
         260 (     ∞ of base) : System.Private.CoreLib.dasm - Array:<Sort>g__GenericSort|128_0(Array,Array,int,int) (0 base, 1 diff methods)
         151 (     ∞ of base) : System.Private.CoreLib.dasm - Array:<BinarySearch>g__GenericBinarySearch|82_0(Array,int,int,Object):int (0 base, 1 diff methods)
         151 (     ∞ of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|107_0(Array,Object,int,int):int (0 base, 1 diff methods)
         151 (     ∞ of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|113_0(Array,Object,int,int):int (0 base, 1 diff methods)
         120 (     ∞ of base) : System.Private.CoreLib.dasm - SpanHelpers:ClearWithReferences(byref,long) (0 base, 1 diff methods)
         117 (     ∞ of base) : System.Private.CoreLib.dasm - Array:Clear(Array) (0 base, 1 diff methods)
           5 (     ∞ of base) : System.Private.CoreLib.dasm - Array:get_NativeLength():long:this (0 base, 1 diff methods)
           1 (25.00% of base) : System.Private.CoreLib.dasm - Array:get_LongLength():long:this

Top method improvements (bytes):
        -260 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<Sort>g__GenericSort|125_0(Array,Array,int,int) (1 base, 0 diff methods)
        -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<BinarySearch>g__GenericBinarySearch|79_0(Array,int,int,Object):int (1 base, 0 diff methods)
        -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|104_0(Array,Object,int,int):int (1 base, 0 diff methods)
        -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|110_0(Array,Object,int,int):int (1 base, 0 diff methods)
         -62 (-29.81% of base) : System.Private.CoreLib.dasm - String:Ctor(ushort,int):String:this
         -22 (-2.41% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Return(ref,bool):this
         -13 (-6.02% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,int,Array,int,int)
         -10 (-1.34% of base) : System.Private.CoreLib.dasm - Buffer:BlockCopy(Array,int,Array,int,int)
          -9 (-1.01% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,int,Array,int,int,bool)
          -6 (-2.70% of base) : System.Private.CoreLib.dasm - Array:Clear(Array,int,int)
          -6 (-0.59% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Trim():bool:this
          -5 (-1.62% of base) : System.Private.CoreLib.dasm - ConfigurableArrayPool`1:Return(ref,bool):this
          -4 (-1.71% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,Array,int)
          -2 (-1.72% of base) : System.Private.CoreLib.dasm - Array:UnsafeArrayAsSpan(Array,int,int):Span`1
          -2 (-1.13% of base) : System.Private.CoreLib.dasm - Buffer:ByteLength(Array):int
          -2 (-2.99% of base) : System.Private.CoreLib.dasm - Buffer:GetByte(Array,int):ubyte
          -2 (-2.82% of base) : System.Private.CoreLib.dasm - Buffer:SetByte(Array,int,ubyte)
Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Runtime, optimization

Milestone: 6.0.0

@GrabYourPitchforks
Copy link
Member Author

Note to reviewers - test failures suggest that there are scenarios where the padding inside the array object is non-zero. However, browsing through the sources, I can't find any examples of where this might be the case. Thoughts?

internal sealed class RawData
{
[FieldOffset(0)]
Copy link
Member

@jkotas jkotas Apr 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see the improvement here. Is this just a micro-optimization to compensate for missing JIT optimization? I would much preffer to fix the JIT instead of this.

@jkotas
Copy link
Member

jkotas commented Apr 20, 2021

test failures suggest that there are scenarios where the padding inside the array object is non-zero.

Whatever it is, I do not think we should be introducing code that depends on this padding.

@@ -138,8 +138,8 @@ public static unsafe void Copy(Array sourceArray, Array destinationArray, int le
MethodTable* pMT = RuntimeHelpers.GetMethodTable(sourceArray);
if (pMT == RuntimeHelpers.GetMethodTable(destinationArray) &&
!pMT->IsMultiDimensionalArray &&
(uint)length <= (nuint)sourceArray.LongLength &&
(uint)length <= (nuint)destinationArray.LongLength)
(uint)length <= sourceArray.NativeLength &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the NativeLength property to make these casts go away.

Copy link
Member Author

@GrabYourPitchforks GrabYourPitchforks Apr 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this was trying to reduce codegen and save a register allocation by having a qword comparison directly against the NativeLength field rather than first requiring zero-extension from a dword field into a qword register. Might have to give up that optimization if we can't assume the padding is all-zero, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is worth it to complicate the code to save a few bytes of codegen. It is very x86/64 specific code-size micro-optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just have NativeLength be => Unsafe.As<RawArrayData>(this).Length to force the zero-extension to nuint.

That would make it easier to switch to properly be NativeLength if we ever decide to make the switch in the future and would avoid the potential problems here while keeping things as is (not requiring more complex changes to actually track nuint in the Array metadata today).

@@ -49,20 +49,17 @@ internal static partial class Sys

private static unsafe void AllocNullTerminatedArray(string[] arr, ref byte** arrPtr)
{
int arrLength = arr.Length + 1; // +1 is for null termination
int arrLength = checked(arr.Length + 1); // +1 is for null termination
Copy link
Member

@stephentoub stephentoub Apr 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't actually overflow, can it? arr.Length <= Array.MaxLength < int.MaxValue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mono isn't bound by this same constraint. It also seems fragile to depend on the runtime never allowing this value to increase in the future.


// Allocate the unmanaged array to hold each string pointer.
// It needs to have an extra element to null terminate the array.
arrPtr = (byte**)Marshal.AllocHGlobal(sizeof(IntPtr) * arrLength);
arrPtr = (byte**)Marshal.AllocHGlobal((nint)sizeof(IntPtr) * arrLength);
Debug.Assert(arrPtr != null);

// Zero the memory so that if any of the individual string allocations fails,
// we can loop through the array to free any that succeeded.
// The last element will remain null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is out of date.

{
_shortPrefixUri[i] = null;
}
Array.Clear(_shortPrefixUri, 0, _shortPrefixUri.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we open an issue to expose Array.Clear(Array) publicly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled out into #51581.

@@ -287,7 +309,7 @@ public static unsafe void Clear(Array array, int index, int length)

int offset = index - lowerBound;

if (index < lowerBound || offset < 0 || length < 0 || (uint)(offset + length) > (nuint)array.LongLength)
if (index < lowerBound || offset < 0 || length < 0 || (uint)(offset + length) > array.NativeLength)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (index < lowerBound || offset < 0 || length < 0 || (uint)(offset + length) > array.NativeLength)
if (index < lowerBound || (offset | length) < 0 || (uint)(offset + length) > array.NativeLength)

Ideally the JIT could do this optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happily, already in progress! :)
#13573

@@ -266,6 +266,28 @@ public static void ConstrainedCopy(Array sourceArray, int sourceIndex, Array des
Copy(sourceArray, sourceIndex, destinationArray, destinationIndex, length, reliable: true);
}

internal static unsafe void Clear(Array array)
{
if (array == null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be more convenient to just turn this into no-op if the array is null.

matchcount[i] = 0;
}
_ = matchcount.Length; // dereference to avoid null check on below line
matchcount.AsSpan().Clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the typical matchcount.Length? Is this going to be an improvement in typical case?

@GrabYourPitchforks
Copy link
Member Author

Latest commit explicitly zeroes out the padding in AllocateSzArray. This is experimental and per #51548 (comment) we might be well-advised to chuck the dependency anyway.

@GrabYourPitchforks
Copy link
Member Author

Force-push because I undid most of the work in this PR. The PR is now greatly slimmed:

  • Adds internal Array.Clear method
  • Redirects some internal calls of the old Array.Clear method to the new method
  • Adds unit tests verifying that IList.Clear (which previously failed for large MDArrays) now succeeds
  • Reverted all other changes w.r.t. spanifying other callers, assuming arrays are zero-padded, etc. This is now focused just on Array.Clear and its immediate callers.

@GrabYourPitchforks GrabYourPitchforks marked this pull request as ready for review April 21, 2021 19:27
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@GrabYourPitchforks GrabYourPitchforks merged commit fd2e564 into dotnet:main Apr 21, 2021
@GrabYourPitchforks GrabYourPitchforks deleted the span_calls branch April 21, 2021 22:38
@ghost ghost locked as resolved and limited conversation to collaborators May 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants