Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf -16%] System.Collections.CopyTo<String> (6) #37814

Closed
DrewScoggins opened this issue Jun 12, 2020 · 25 comments
Closed

[Perf -16%] System.Collections.CopyTo<String> (6) #37814

DrewScoggins opened this issue Jun 12, 2020 · 25 comments
Labels
arch-x64 area-System.Collections needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@DrewScoggins
Copy link
Member

DrewScoggins commented Jun 12, 2020

Run Information

Architecture x64
OS Windows 10.0.18362
Changes diff

Regressions in System.Collections.CopyTo

Benchmark Baseline Test Test/Base Modality Baseline Outlier
Span 593.19 ns 644.50 ns 1.09 False
Memory 584.93 ns 641.29 ns 1.10 False
Array 493.59 ns 592.57 ns 1.20 False
ReadOnlySpan 563.26 ns 628.13 ns 1.12 False
ReadOnlyMemory 595.64 ns 657.34 ns 1.10 False
List 487.56 ns 650.31 ns 1.33 False

graph
graph
graph
graph
graph
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Collections.CopyTo<String>*';

Histogram

System.Collections.CopyTo.Span(Size: 2048)

[524.780 ; 546.348) | @@@
[546.348 ; 573.011) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[573.011 ; 597.536) | @@@@@@
[597.536 ; 627.739) | @
[627.739 ; 654.402) | @@@@@@@@@@@@@@@@@@

System.Collections.CopyTo.Memory(Size: 2048)

[565.045 ; 589.110) | @@@@@@@@@@@@@@@@@@@@
[589.110 ; 614.506) | @@@@@@@@@@@@@@
[614.506 ; 641.785) | @@@@@@@@@
[641.785 ; 677.056) | @@@@@@@@@@@@@@@@

System.Collections.CopyTo.Array(Size: 2048)

[482.924 ; 507.846) | @@@@@@
[507.846 ; 529.187) | @@@@@@@@@@@@@@@@@@@@@@
[529.187 ; 557.054) | @@@@@@@@@@@
[557.054 ; 579.833) | @@@@@@@
[579.833 ; 601.175) | @@@@@@@@@@@@@

System.Collections.CopyTo.ReadOnlySpan(Size: 2048)

[509.340 ; 528.944) | @
[528.944 ; 558.954) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[558.954 ; 591.488) | @@@@@
[591.488 ; 619.556) | @
[619.556 ; 649.566) | @@@@@@@@@@@@@@@@@@

System.Collections.CopyTo.ReadOnlyMemory(Size: 2048)

[522.769 ; 539.054) | @
[539.054 ; 569.696) | @@@@@@@@@@@@@@@@@@@@@@@@@
[569.696 ; 598.854) | @@@@@@@@@@@@@
[598.854 ; 634.010) | @@
[634.010 ; 664.652) | @@@@@@@@@@@@@@@@@
[664.652 ; 686.400) | @

System.Collections.CopyTo.List(Size: 2048)

[463.079 ; 493.989) | @@
[493.989 ; 537.133) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[537.133 ; 560.837) | @
[560.837 ; 603.982) | 
[603.982 ; 626.781) | 
[626.781 ; 669.926) | @@@@@@@@@@@@@@@@@@@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@DrewScoggins DrewScoggins added the tenet-performance-benchmarks Issue from performance benchmark label Jun 12, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Jun 12, 2020
@ghost
Copy link

ghost commented Jun 12, 2020

Tagging subscribers to this area: @eiriktsarpalis
Notify danmosemsft if you want to be subscribed.

@DrewScoggins
Copy link
Member Author

@adamsitnik

@eiriktsarpalis eiriktsarpalis added this to the 5.0.0 milestone Jun 24, 2020
@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2020
@DrewScoggins DrewScoggins added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark os-windows arch-x64 and removed tenet-performance-benchmarks Issue from performance benchmark labels Jul 7, 2020
@danmoseley
Copy link
Member

danmoseley commented Aug 2, 2020

Commit range as best I can tell. @DrewScoggins the interactive graph will not set a commit for "before" for the runtime repo -- it just says "0". So I bounded the start by the date/time of that datapoint. Is that right, or am I reading it wrong?

C:\git\runtime>git log --color --graph --pretty=format:"%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset %cD" --abbrev-commit 7b4a9e69a6d77159fcb1462e7db320291699582a...c07be36e3ce00b06e99d02669fd658a813cdce11 src/coreclr/src/vm/** src/libraries/** src/coreclr/src/System.Private.CoreLib/**
* f1dd1d36c77 - Change location of some source files (#1376) (7 months ago) <Adeel Mujahid> Tue, 7 Jan 2020 16:27:29 -0800
* 9de4006e296 - Fix Reading Time Formats with  quotes  from ICU (#1370) (7 months ago) <Tarek Mahmoud Sayed> Tue, 7 Jan 2020 16:10:26 -0800
* 5da85d0c4b6 - Delete Tests.props (#1236) (7 months ago) <Jan Kotas> Tue, 7 Jan 2020 12:15:13 -0800
* 690542e3a9a - Another PAL cleanup (#1359) (7 months ago) <Jan Vorlicek> Tue, 7 Jan 2020 08:21:34 -0800
* 66cb5f9a1bd - New PhysicalAddress.TryParse methods taking span and string (#1057) (7 months ago) <Alexander Nikolaev> Tue, 7 Jan 2020 14:39:46 +0100
* c1a51e2c538 - Make sure default ALC is initialized before resolving satellite assembly (#1032) (7 months ago) <Elinor Fung> Mon, 6 Jan 2020 17:26:10 -0800
* e82f56f2fc4 - Fix broken docs links (#1005) (7 months ago) <Jeff Handley> Mon, 6 Jan 2020 17:17:21 -0800
* 578fe57d906 - PAL unused functions cleanup (#1313) (7 months ago) <Jan Vorlicek> Mon, 6 Jan 2020 16:48:58 -0800
* 1ffda0f0ab9 - Force Windows SDK to be 10.0 (#1331) (7 months ago) <Jan Kotas> Mon, 6 Jan 2020 16:48:03 -0800
* dfed9e24f18 - Fix compMatchedVM by using osType to detect cross target compilation in case of AOT scenario (#1243) (7 months ago) <Alexandre Mutel> Mon, 6 Jan 2020 16:03:14 -0800
* 8cc2aa35933 - Optimize AdjustMaxWorkersActive (#1103) (7 months ago) <Bill Wert> Mon, 6 Jan 2020 14:31:11 -0800
* 6425b2d1c25 - Copy change to XsdDateTime (#1319) (7 months ago) <ts2do> Mon, 6 Jan 2020 16:57:32 -0500
* 5afe409d594 - Remove STREAM_MEMORY_OVERLOADS_AVAILABLE (#1328) (7 months ago) <Eric StJohn> Mon, 6 Jan 2020 12:49:43 -0800
* de6f41e37f0 - CoreFX Renames in System.IO PathInternal.Tests (#1017) (7 months ago) <Erhan Atesoglu> Mon, 6 Jan 2020 13:34:08 -0500
* f07bf3523a2 - Fix typo: availabe -> available (#1324) (7 months ago) <Andrew Arnott> Mon, 6 Jan 2020 10:29:28 -0800
* 92603a6e11c - implement GetScopeId() on Linux (#1038) (7 months ago) <Tomas Weinfurt> Mon, 6 Jan 2020 12:17:50 -0500
* f6d14eeaf29 - Add asserts to catch unintentional use of generic overloads (#1200) (7 months ago) <Jan Kotas> Mon, 6 Jan 2020 09:15:24 -0800
* 25d316cefaf - Move Corelib shared files under Common (#1307) (7 months ago) <Jan Kotas> Mon, 6 Jan 2020 09:07:16 -0800
* 4d3ec15a0e1 - Implement HttpContent and HttpClient cancellation APIs (#686) (7 months ago) <Miha Zupan> Mon, 6 Jan 2020 12:28:15 +0100
* fd5625087f1 - Share compiler config code between libs and coreclr (#1072) (7 months ago) <Adeel Mujahid> Mon, 6 Jan 2020 10:19:58 +0100
* 5b968685ce8 - Consolidate .netcoreapp.cs files because System.Numerics.Vectors project is no longer cross-compiled (#1310) (7 months ago) <Roman Marusyk> Sun, 5 Jan 2020 18:11:45 -0800
* 2c6ec170197 - Fix Environment.UserInteractive on Windows (#1234) (7 months ago) <Dan Moseley> Fri, 3 Jan 2020 23:11:04 -0800
* 5bba5aa36a3 - Removing ArrayTypeDesc (#1201) (7 months ago) <Vladimir Sadov> Fri, 3 Jan 2020 21:53:15 -0800
* 680487ffa4d - Revert "Revert "Support GetCultureInfo with predefinedOnly flag (#654)" (#1257)" (#1261) (7 months ago) <Tarek Mahmoud Sayed> Fri, 3 Jan 2020 15:12:18 -0800
* 4bd7e3accac - Make corelib work in VS (#1284) (7 months ago) <Dan Moseley> Fri, 3 Jan 2020 14:59:58 -0800
* 5e2c61bc2a0 - Improve message for missing C++ workload (#1253) (7 months ago) <Dan Moseley> Fri, 3 Jan 2020 09:29:50 -0800
* cb1e7be56ea - Fix CompareInfo and SQL tests (#1254) (7 months ago) <Tarek Mahmoud Sayed> Fri, 3 Jan 2020 08:47:11 -0800
* f1c7ed882a9 - Move most of CoreLib interop under Common (#1247) (7 months ago) <Jan Kotas> Thu, 2 Jan 2020 20:21:27 -0800
* c3260404f41 - Arm64: Implement AbsoluteCompare* hardware intrinsics (#1223) (7 months ago) <Egor Chesakov> Thu, 2 Jan 2020 16:41:13 -0800
* 4f27799093d - Revert "Support GetCultureInfo with predefinedOnly flag (#654)" (#1257) (7 months ago) <Tarek Mahmoud Sayed> Thu, 2 Jan 2020 15:31:34 -0800
* 5272637ab8a - Support GetCultureInfo with predefinedOnly flag (#654) (7 months ago) <Tarek Mahmoud Sayed> Thu, 2 Jan 2020 15:25:42 -0800
* c4f221c9c31 - Combine pal_compiler.h definitions (#1240) (7 months ago) <Adeel Mujahid> Wed, 1 Jan 2020 17:01:57 -0800

Nothing jumps out there. I guess next step is to profile the scenario for 5.0 vs 3.1 all up. This would also eliminate a configuration/microcode change. @DrewScoggins do we have any way eg more datapoints to bound this better? Also, how can I see the full trend from 3.1 to now?

@danmoseley
Copy link
Member

Reproduces locally. Note that ImmutableArray scenario didn't regress, presumably because nothing is actually copied 😄

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.421 (2004/?/20H1)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.1.20401.3
[Host] : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
Job-LFQMVW : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
Job-GJJUUG : .NET Core 5.0.0 (CoreCLR 5.0.20.40101, CoreFX 5.0.20.40101), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method Job Runtime Toolchain Size Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Array Job-LFQMVW .NET Core 3.1 netcoreapp3.1 2048 452.9 ns 4.30 ns 4.02 ns 452.1 ns 446.5 ns 460.0 ns 1.00 0.00 - - - -
Array Job-GJJUUG .NET Core 5.0 netcoreapp5.0 2048 494.8 ns 2.72 ns 2.27 ns 493.7 ns 491.5 ns 498.6 ns 1.09 0.01 - - - -
Span Job-LFQMVW .NET Core 3.1 netcoreapp3.1 2048 496.5 ns 10.11 ns 11.65 ns 499.2 ns 468.1 ns 513.5 ns 1.00 0.00 - - - -
Span Job-GJJUUG .NET Core 5.0 netcoreapp5.0 2048 560.5 ns 3.50 ns 3.28 ns 559.5 ns 556.1 ns 568.0 ns 1.13 0.03 - - - -
ReadOnlySpan Job-LFQMVW .NET Core 3.1 netcoreapp3.1 2048 472.6 ns 5.42 ns 4.80 ns 471.7 ns 466.5 ns 482.5 ns 1.00 0.00 - - - -
ReadOnlySpan Job-GJJUUG .NET Core 5.0 netcoreapp5.0 2048 554.0 ns 2.76 ns 2.58 ns 553.3 ns 549.5 ns 557.8 ns 1.17 0.01 - - - -
Memory Job-LFQMVW .NET Core 3.1 netcoreapp3.1 2048 520.6 ns 6.19 ns 5.79 ns 522.0 ns 504.5 ns 528.1 ns 1.00 0.00 - - - -
Memory Job-GJJUUG .NET Core 5.0 netcoreapp5.0 2048 573.6 ns 6.42 ns 5.69 ns 572.9 ns 563.9 ns 587.1 ns 1.10 0.02 - - - -
ReadOnlyMemory Job-LFQMVW .NET Core 3.1 netcoreapp3.1 2048 489.6 ns 9.34 ns 10.76 ns 492.8 ns 455.5 ns 500.1 ns 1.00 0.00 - - - -
ReadOnlyMemory Job-GJJUUG .NET Core 5.0 netcoreapp5.0 2048 574.8 ns 3.51 ns 3.28 ns 575.3 ns 569.5 ns 581.8 ns 1.18 0.03 - - - -
List Job-LFQMVW .NET Core 3.1 netcoreapp3.1 2048 444.1 ns 4.83 ns 4.52 ns 442.8 ns 437.9 ns 455.7 ns 1.00 0.00 - - - -
List Job-GJJUUG .NET Core 5.0 netcoreapp5.0 2048 541.4 ns 4.29 ns 3.58 ns 540.7 ns 536.9 ns 548.4 ns 1.22 0.02 - - - -
ImmutableArray Job-LFQMVW .NET Core 3.1 netcoreapp3.1 2048 656.6 ns 3.69 ns 3.45 ns 655.6 ns 651.0 ns 662.6 ns 1.00 0.00 - - - -
ImmutableArray Job-GJJUUG .NET Core 5.0 netcoreapp5.0 2048 442.7 ns 3.16 ns 2.95 ns 442.3 ns 437.2 ns 447.5 ns 0.67 0.00 - - - -

@danmoseley
Copy link
Member

Note the Array.Copy regression was in November (#39112)

*Memory, *Span use Buffer.Memmove
List, Array use Array.Copy which uses Buffer.BulkMoveWithWriteBarrier not Memmove (since I assume pMT->ContainsGCPointers is true)

@jkotas thoughts about this one, since we were discussing earlier Array.Copy change? Also, why is it OK for Span to always use Memmove?

@danmoseley
Copy link
Member

Also #37808 is about a regression (on Ubuntu) for 1000 element Buffer.BlockCopy which seems to be some validation then a call to Buffer.Memmove.

@jkotas
Copy link
Member

jkotas commented Aug 3, 2020

why is it OK for Span to always use Memmove?

Span uses generic version of Memmove that calls non-generic Memmove or BulkMoveWithWriteBarrier for CoreCLR. The logic is same for both Span and Array:


if (!RuntimeHelpers.IsReferenceOrContainsReferences<T>())

@jkotas
Copy link
Member

jkotas commented Aug 3, 2020

@jkotas thoughts about this one

Memory copying performance depends a lot on alignment of the source and destination arrays. In the case of object references that we have here, it also depends on the generation that the destination array lives in. I do not see the benchmarks controlling for these variables. It makes the benchmark many modal by construction.

Notice that the trend changed in January, but it has several flips to baseline in February. It confirms the hypothesis.

@jkotas
Copy link
Member

jkotas commented Aug 3, 2020

Here is a small program that demonstrates the many modal nature of memory copying:

using System;
using System.Diagnostics;

class Program
{
    static void Work(ReadOnlySpan<string> from, Span<string> to)
    {
        for (int i = 0; i < 1000000; i++) from.CopyTo(to);
    }

    static void Main(string[] args)
    {
        Random r = new Random();
        for (;;)
        {
            var sw = new Stopwatch();
            GC.KeepAlive(new byte[r.Next(32)]);
            var from = new string[2048];
            GC.KeepAlive(new byte[r.Next(32)]);
            var to = new string[2048];
            if (r.Next(10) == 0) GC.Collect();
            sw.Start();
            Work(from, to);       
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}

On my machine, it shows that the performance is oscilating between ~460 and ~800.

@danmoseley
Copy link
Member

What do the GC.KeepAlive(new byte[r.Next(32)]); lines achieve? Is your sample causing either buffer to get promoted?

@jkotas
Copy link
Member

jkotas commented Aug 3, 2020

GC.KeepAlive(new byte[r.Next(32)]); will make string arrays aligned randomly. It assumes that objects are allocated in memory next to each other (internal implementation details of the current allocators).

if (r.Next(10) == 0) GC.Collect(); should move the buffers out of Gen0. Maybe that line can be if (r.Next(10) == 0) for (int i = 0; i < 5; i++) GC.Collect(); to make sure that these buffers are really moved to Gen2.

@danmoseley
Copy link
Member

OK thanks I think next action here for us is to make changes like that in the benchmarks and remeasure 3.1 and 5.0

@DrewScoggins
Copy link
Member Author

I agree. This looks like a place where the test is not measuring the right thing.

Also to answer your questions Dan:

These 3.1->5.0 comparisons are weird because they span the coreclr/corefx->runtime switch. As a result I never really built in a way to do commit level comparisons across the repos, but I would say that what you did was correct. It would be hard to go bak and get any more data points to try and bound this, but it would be possible. Going forward we should not have this problem as we have more reliable ways of triggering runs out the installer repo, and running perf tests in the runtime repo directly.

@adamsitnik
Copy link
Member

GC.KeepAlive(new byte[r.Next(32)]); will make string arrays aligned randomly.

Nice "trick"! I wonder if BenchmarkDotNet should do something like this between the iterations to try to ensure more random memory alignments (cc @AndreyAkinshin @billwert )

@billwert
Copy link
Member

billwert commented Aug 4, 2020

Nice "trick"! I wonder if BenchmarkDotNet should do something like this between the iterations to try to ensure more random memory alignments

Would be an interesting experiment! Perhaps we should take some random sample of benchmarks (1/type?) and see what stability this gives us over repeated runs compared to not doing it?

@danmoseley
Copy link
Member

danmoseley commented Aug 4, 2020

Does BDN have an opportunity to do this? In this case it needs to influence the alignment of arrays that are allocated within the setup method:

        [GlobalSetup]
        public void Setup()
        {
            _array = ValuesGenerator.ArrayOfUniqueValues<T>(Size);
            _list = new List<T>(_array);
            _immutablearray = Immutable.ImmutableArray.CreateRange(_array);
            _destination = new T[Size];
        }

If I'm understanding correctly, the GC.KeepAlive(new byte[r.Next(32)]); needs to be sprinkled within there. But the allocations are within the benchmark itself perhaps BDN could help as you propose.

@jeffhandley
Copy link
Member

@layomia @eiriktsarpalis Should we move this to 6.0.0?

@eiriktsarpalis
Copy link
Member

@jeffhandley Based on the comments above I think it makes sense to move.

@eiriktsarpalis eiriktsarpalis modified the milestones: 5.0.0, 6.0.0 Sep 21, 2020
@danmoseley danmoseley modified the milestones: 6.0.0, Future Jul 14, 2021
@eiriktsarpalis
Copy link
Member

We might need to check if the regression remains after dotnet/BenchmarkDotNet#1513 got addressed. I tried finding more recent benchmark results for the particular benchmark but I can't seem to be able to navigate to that via the reporting system.

@ghost
Copy link

ghost commented Nov 12, 2021

This issue has been automatically marked no recent activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no recent activity.

@DrewScoggins
Copy link
Member Author

@eiriktsarpalis If you ever want to look at historical data, the best way to do that is to go to https://aka.ms/HistoryIndex. From there you can click on the link to the configuration you want to look at, and the search in the index page for the test of interest.

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs more info no-recent-activity labels Nov 16, 2021
@eiriktsarpalis
Copy link
Member

Thanks @DrewScoggins, I was not aware of that link. Is it documented somewhere, or is it possible to include as a link in regression reports?

@eiriktsarpalis
Copy link
Member

Assuming the benchmark hardware is the same, it would seem like the regressions have been resolved. That being said, the List case seems to be displaying frequent spikes in performance since July 2021:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Collections.CopyTo(String).List(Size%3a%202048).html

It would seem to me though that this is unrelated to the root cause that prompting the filing of this original issue, so wondering if we should just close it?

@adamsitnik
Copy link
Member

The CopyTo benchmarks are a perfect example of benchmarks heavily dependent on the memory alignment. We have 4 options:

  • both buffers are aligned
  • source buffer is aligned, destination is not
  • destination buffer is aligned, source is not
  • none of the buffers are aligned

Which gives us 3 modes visible in the charts: all are aligned, only one is aligned, none are aligned.

This is why I've used it as an example when introducing the Memory Randomization feature to BDN.

so wondering if we should just close it?

I agree. 👍

@DrewScoggins
Copy link
Member Author

Thanks @DrewScoggins, I was not aware of that link. Is it documented somewhere, or is it possible to include as a link in regression reports?

It has not been documented as of yet, as it is kind of hacky :) However, I can start putting that link into the reports, with a little description of how to use it.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-x64 area-System.Collections needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Projects
None yet
Development

No branches or pull requests

9 participants