Improve Span.SequenceEqual for small buffers. #32364

ahsonkhan · 2020-02-15T07:35:05Z

summary:
better: 6, geomean: 1.168
worse: 1, geomean: 1.216
total diff: 7

Slower	diff/base	Base Median (ns)	Diff Median (ns)	Modality
Threshold.SequenceEqual(Length: 0)	1.22	2.59	3.15

Faster	base/diff	Base Median (ns)	Diff Median (ns)
Threshold.SequenceEqual(Length: 7)	1.24	8.29	6.70
Threshold.SequenceEqual(Length: 6)	1.22	8.05	6.62
Threshold.SequenceEqual(Length: 5)	1.17	7.45	6.35
Threshold.SequenceEqual(Length: 2)	1.14	5.15	4.52
Threshold.SequenceEqual(Length: 3)	1.14	6.08	5.36
Threshold.SequenceEqual(Length: 1)	1.11	4.10	3.69

cc @jkotas, @benaadams, @GrabYourPitchforks
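Editorial note: the geomean figures in the summary can be reproduced from the median columns, assuming "geomean" means the geometric mean of the per-benchmark ratios. The helper below is a hypothetical sketch for checking that arithmetic, not part of the tooling that produced the table.

```csharp
using System;

static class GeoMeanSketch
{
    // Hypothetical helper: geometric mean of a set of ratios,
    // computed as exp(mean(log(ratio))) to avoid a large running product.
    public static double GeometricMean(params double[] ratios)
    {
        if (ratios.Length == 0)
            throw new ArgumentException("At least one ratio is required.", nameof(ratios));

        double logSum = 0;
        foreach (double ratio in ratios)
            logSum += Math.Log(ratio);

        return Math.Exp(logSum / ratios.Length);
    }
}
```

Feeding in the six base/diff median ratios from the Faster table (8.29/6.70, 8.05/6.62, 7.45/6.35, 5.15/4.52, 6.08/5.36, 4.10/3.69) gives roughly 1.168, and the single Slower ratio 3.15/2.59 gives roughly 1.216, matching the summary.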

ahsonkhan · 2020-02-15T18:20:14Z

CI failures are unrelated:
#32377
#32378
#32367

benaadams · 2020-02-15T20:28:46Z

This is very good :)

stephentoub · 2020-02-15T20:46:06Z

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Byte.cs

@@ -1312,28 +1312,32 @@ public static unsafe int LastIndexOfAny(ref byte searchSpace, byte value0, byte
        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public static unsafe bool SequenceEqual(ref byte first, ref byte second, nuint length)
        {
-            if (Unsafe.AreSame(ref first, ref second))
-                goto Equal;
-
            IntPtr offset = (IntPtr)0; // Use IntPtr for arithmetic to avoid unnecessary 64->32->64 truncations
            IntPtr lengthToExamine = (IntPtr)(void*)length;


Seems like lengthToExamine is cast to a byte* most places it's used. Should it just be stored as one in the first place? Same for offset. (Looking at diff on my phone so maybe I'm just not seeing the reason.)

jkotas · 2020-02-15

All these should be fixed to use nuint/nint. It will be easier once Roslyn adds native support.

benaadams · 2020-02-16

Picking this up in a follow-up PR (for this method).
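Editorial note: C# 9 later added nint and nuint as language keywords, which is the native support referred to above. The sketch below shows roughly what offset arithmetic looks like with those types. NativeIntSketch and SequenceEqualScalar are hypothetical names; this is a scalar illustration, not the CoreLib implementation or the follow-up PR.

```csharp
using System;
using System.Runtime.CompilerServices;

static class NativeIntSketch
{
    // Hypothetical scalar comparison whose offset and length are
    // native-sized unsigned integers instead of IntPtr cast through byte*.
    public static bool SequenceEqualScalar(ref byte first, ref byte second, nuint length)
    {
        // nuint supports ordinary arithmetic and comparison, so no
        // (byte*) or (void*) casts and no unsafe context are needed here.
        for (nuint offset = 0; offset < length; offset++)
        {
            // nint is the same type as IntPtr, so the cast selects the
            // Unsafe.AddByteOffset(ref T, IntPtr) overload.
            if (Unsafe.AddByteOffset(ref first, (nint)offset) !=
                Unsafe.AddByteOffset(ref second, (nint)offset))
            {
                return false;
            }
        }

        return true;
    }
}
```

A caller would pass references to the first elements and the length converted to nuint, for example `NativeIntSketch.SequenceEqualScalar(ref a[0], ref b[0], (nuint)a.Length)` for two non-empty byte arrays of equal length.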

stephentoub · 2020-02-15T21:02:36Z

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Byte.cs

                    }
-                    offset += Vector<byte>.Count;
+                    return LoadVector(ref first, lengthToExamine) == LoadVector(ref second, lengthToExamine);


Do our perf tests look at comparison lengths larger than one vector and between vector lengths, e.g. 257 if the vector size is 256? I'm curious if/how alignment affects this final comparison, which would seem to generally be unaligned in such a case. Not an issue? I ask simply because in other implementations I've seen us go out of our way to try to align such operations.

ahsonkhan · 2020-02-15

> Do our perf tests look at comparison lengths larger than one vector and between vector lengths, e.g. 257 if the vector size is 256?

No, our perf tests aren't very extensive for some of the APIs. I was told to not bloat the number of test permutations too much. Right now we only test length 512.
We can certainly do one-offs locally though.

Feel free to add more here:
https://github.com/dotnet/performance/blob/1930f660f56f80b0cad5bfc749fe4c46464801f4/src/benchmarks/micro/libraries/System.Memory/Span.cs#L14-L48

Locally for me, Vector<byte>.Count = 32.

Here are the results for what's in master atm:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
  Job-BCFXLD : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000  MaxIterationCount=10  MinIterationCount=5  WarmupCount=3

Method	Length	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated
SequenceEqual	32	2.813 ns	0.0692 ns	0.0180 ns	2.817 ns	2.782 ns	2.826 ns	-	-	-	-
SequenceEqual	33	3.439 ns	0.1479 ns	0.0880 ns	3.415 ns	3.341 ns	3.556 ns	-	-	-	-
SequenceEqual	34	4.010 ns	0.0969 ns	0.0252 ns	4.001 ns	3.993 ns	4.055 ns	-	-	-	-
SequenceEqual	63	3.401 ns	0.0959 ns	0.0502 ns	3.414 ns	3.286 ns	3.443 ns	-	-	-	-
SequenceEqual	64	3.300 ns	0.0775 ns	0.0201 ns	3.305 ns	3.278 ns	3.327 ns	-	-	-	-
SequenceEqual	65	4.118 ns	0.1121 ns	0.0586 ns	4.122 ns	3.993 ns	4.201 ns	-	-	-	-
SequenceEqual	256	7.424 ns	0.2525 ns	0.1503 ns	7.441 ns	7.211 ns	7.705 ns	-	-	-	-
SequenceEqual	257	7.622 ns	0.2085 ns	0.1379 ns	7.673 ns	7.433 ns	7.848 ns	-	-	-	-
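Editorial note: the final comparison asked about in this thread loads one last vector that ends exactly at the end of the buffer, so it overlaps bytes already compared and is generally unaligned. A minimal sketch of that pattern follows, using span-based Vector<byte> loads rather than the ref-based LoadVector helper from the diff. OverlapSketch and SequenceEqualSketch are hypothetical names, and this is not the merged implementation.

```csharp
using System;
using System.Numerics;

static class OverlapSketch
{
    // Hypothetical illustration of "compare full vectors, then one
    // overlapping vector anchored at the end of the buffer".
    public static bool SequenceEqualSketch(ReadOnlySpan<byte> first, ReadOnlySpan<byte> second)
    {
        if (first.Length != second.Length)
            return false;

        int length = first.Length;
        int width = Vector<byte>.Count;

        if (length < width)
        {
            // Too short for a single vector: fall back to a scalar loop.
            for (int i = 0; i < length; i++)
            {
                if (first[i] != second[i])
                    return false;
            }
            return true;
        }

        // Offset of the last full vector that still fits in the buffer.
        int lastOffset = length - width;

        for (int offset = 0; offset < lastOffset; offset += width)
        {
            // Vector<T> != is true when any element differs.
            if (new Vector<byte>(first.Slice(offset)) != new Vector<byte>(second.Slice(offset)))
                return false;
        }

        // Final load ends at the last byte; for lengths that are not a
        // multiple of the vector width it re-reads some earlier bytes.
        return new Vector<byte>(first.Slice(lastOffset)) == new Vector<byte>(second.Slice(lastOffset));
    }
}
```

With a vector width of 32 and a length of 33, the loop compares bytes 0 to 31 and the final load compares bytes 1 to 32, so only one byte of the last comparison is new work.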

Improve Span.SequenceEqual for small buffers.

e4d1371

ahsonkhan added the area-System.Memory and tenet-performance (Performance related issue) labels Feb 15, 2020

ahsonkhan added this to the 5.0 milestone Feb 15, 2020

jkotas approved these changes Feb 15, 2020

This was referenced Feb 15, 2020

Several tests fail on mono with mono_threads_pthread_kill: pthread_kill failed with error 11 #32377

Closed

System.Drawing.Tests.GdiplusTests.IsAtLeastLibgdiplus6 fails on netcoreapp5.0-OSX-Debug-x64-Mono_release-OSX.1013.Amd64.Open #32378

Closed

ahsonkhan merged commit 8ac93bb into dotnet:master Feb 15, 2020

ahsonkhan deleted the OptimizeSeqEqualForSmall branch February 15, 2020 18:28

stephentoub reviewed Feb 15, 2020

benaadams mentioned this pull request Feb 15, 2020

Use intrinsics for SequenceEqual<byte> vectorization to emit at R2R #32371

Merged

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
