Optimize Vector128 and Vector256.Create methods #35857
Conversation
Diffs are similar to:
- mov ecx, 128
- vmovd xmm0, ecx
- vpbroadcastw xmm0, xmm0
+ vmovupd xmm0, xmmword ptr [reloc @RWD00]
...
+ RWD00 db 080h, 000h, 080h, 000h, 080h, 000h, 080h, 000h, 080h, 000h, 080h, 000h, 080h, 000h, 080h, 000h
CC @CarolEidt, @echesakovMSFT
As a separate issue, I noticed for the various intrinsics that take a …
In the case where the instruction can take a …
I think the cast should be handled as contained along with its operand.
Overall LGTM, but it's a lot of code and could use some more textual comments describing what's being done, and it would be helpful to give meaningful name to the temps (the t*s) in the pseudo-IR to make it easier to follow.
// TODO-XARCH-CQ: It may be beneficial to emit the movq
// instruction, which takes a 64-bit memory address and
// works on 32-bit x86 systems.
break;
Is there a reason you chose not to do this?
It's not something we are handling today anywhere else, and we have an identical TODO in the other places.
I believe this would just require us creating a GT_IND node, since x86 requires it be a long* or ulong*, but I'm not positive if that is the ideal/correct way to handle it.
Just noting, this is also not something we are handling in the managed implementation today.
Right, I gathered that, but was just curious, given the comment.
I'm not super familiar with the differences between the different addressing forms in the JIT and when each is appropriate to use.
Do we have a helper function that will create a T* or ref T from an arbitrary T (in this case a ulong* from a ulong) when the given operand on the stack could be a constant, local, field, byref, or other indirection?
IIUC you really want a GT_ADDR, to get the address of an operand. In the JIT, addresses are not strongly typed; they are always TYP_BYREF, so it would just be something like:
GenTree* addr = gtNewOperNode(GT_ADDR, TYP_BYREF, op);
Though I'd note that I'm not really pressing for you to do this now - I was just curious why you added the comment rather than doing it.
I'll attempt this in a follow up PR, hopefully it is that straightforward 😄
src/coreclr/src/jit/lowerxarch.cpp
// /--* t? simd32
// +--* t? simd16
// +--* t? int
// t0 = * HWINTRINSIC simd32 T InsertVector128
This comment doesn't match what's generated below (e.g. you're creating a constant of 0x01). Also, it would be easier to follow if you referred to 'op1', 'op2' and 'idx' here (and maybe add 'v' for the target of NI_Vector128_Create and 'result' for the final value), instead of using 't2' and 't?'.
I would also do something similar for the comments below.
Looks like I forgot to fill in the generated LIR for this one. I'm updating it and the other comments to be closer to the following:
// We will be constructing the following parts:
// /--* op1 T
// tmp = * HWINTRINSIC simd16 T Create
// /--* tmp simd16
// * STORE_LCL_VAR simd16
// tmp = LCL_VAR simd16
// dup = LCL_VAR simd16
// /--* dup simd16
// dup = * HWINTRINSIC simd16 T ToVector256Unsafe
// idx = CNS_INT int 1
// /--* dup simd32
// +--* tmp simd16
// +--* idx int
// node = * HWINTRINSIC simd32 T InsertVector128
// This is roughly the following managed code:
// var tmp = Vector128.Create(op1);
// var dup = tmp.ToVector256Unsafe();
// return Avx.InsertVector128(dup, tmp, 1);
src/coreclr/src/jit/lowerxarch.cpp
}
else
{
    tmp = op2->AsArgList()->Current();
I'm quite confused by this, and I suspect I'm missing something. You've handled the 4-argument case above, and this handles 2. Where do you handle 8 arguments?
This one actually handles everything that is > 4, and I can add an assert + comment to clarify.
For Vector256 (which this code path is restricted to) we will have 4, 8, 16, or 32 operands, so in all cases we have a GT_LIST.
Both paths construct the upper and lower Vector128 portions and then combine them. The for loop here ensures that op1 points to the first operand and op2 points to the 3rd, 5th, 9th, or 17th operand.
The first path needs to exist since it will be 2 operands per half, and so we need to track them in gtOp1/gtOp2 rather than in a GT_LIST, while the second path can just use the original list, split into two halves, with the 2nd, 4th, 8th, or 16th operand no longer having a gtOp2 (which normally points to the next operand in the list).
Ah yes, I see now that op1 and op2 are still lists, and the two NI_Vector128_Create intrinsics are then recursively lowered. I might ask that you create a new GenTree* variable for those rather than reusing op1. I know that the JIT reuses variables all over the place, but I don't think it's a great practice.
Added a comment explaining the split and what values are expected where, using new variables named lo and hi to track the lower and upper halves of the Vector256 being created.
I believe I addressed all the feedback given, and everything passes for the various ISAs locally, so I've kicked off the …
Resolved an issue that showed up with ARM where the …
ARM64 failure is because …
Edit: To clarify, we don't take recursion into account when some overload of the method is intrinsic. So if you have …
I've updated the calls to not be recursive on ARM64 for right now. I'm working on getting the same support done for ARM64, but it will be a separate PR to help keep the size of this one smaller.
Edit: We also need to expose a couple of additional intrinsics before I can implement the …
LGTM - thanks for the additional comments; they may seem verbose, but they really help with understanding what's going on!
Looks like there is some …
Seems superpmi ignores all …
@CarolEidt, I've updated …
e2dbef8 to 7b3b272
7b3b272 to c056b88
LGTM - thanks
This resolves #11965 and #10033 by updating the Vector128/256.Create methods to be intrinsic on x86.
They are handled entirely in lowering, where they will be replaced with the corresponding constant (as was done for GT_SIMD nodes) or lowered to the correct sequence of HWIntrinsics (which allows containment and other checks to "just work"). This should make it rather trivial to support "partial constants" as well (that is, a vector where, say, 50% of the inputs are constant and the other half are not).
It might be beneficial to eventually create a proper GenTreeVecCns node and to also try and handle this earlier (which would allow constants to be deduplicated and other features), but that is a more involved change.
jit-analyze reports (AVX2):
jit-analyze reports (SSE2):
The regression for SSE2 is because we are now inlining a Vector128.Create(long) call where we were not previously.