This repository has been archived by the owner on Aug 8, 2024. It is now read-only.

[mono] limit DegreeOfParallelism to 16 #369

Merged 1 commit on Nov 4, 2019

@lewurm lewurm commented Oct 31, 2019

We started seeing the System.Core-xunit step on CI hit its 15-minute timeout on Linux/ARM64. That was odd, because the step used to complete in around two minutes. I couldn't reproduce it on my local device (a Jetson board) either; it took around 100s there. We then realized the problem is specific to the new taishan CI machines, which have 64 cores. Hardcoding mono_cpu_count to return 16 restored the performance, but that isn't a viable fix.

Limiting DefaultDegreeOfParallelism to 16 also fixes it. That is less extreme than limiting mono_cpu_count (), though still not ideal. It seems to boil down to our non-netcore threadpool implementation not handling a large number of cores well.
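To illustrate the idea (this is a minimal sketch, not the actual patch), the cap amounts to taking the smaller of the processor count and 16. The names `DopCap`, `MaxDop` and `Capped` below are hypothetical and do not appear in the change; `WithDegreeOfParallelism` is the public PLINQ operator, which user code could presumably use as a per-query workaround on affected runtimes.

```csharp
using System;
using System.Linq;

public static class DopCap {
    // Hypothetical constant mirroring the limit chosen in this PR.
    const int MaxDop = 16;

    // Returns the processor count, capped at MaxDop.
    public static int Capped (int processorCount) {
        return Math.Min (processorCount, MaxDop);
    }

    public static void Main () {
        int dop = Capped (Environment.ProcessorCount);

        // Apply the cap to a single query instead of changing the runtime default.
        int evens = ParallelEnumerable.Range (0, 2048)
            .WithDegreeOfParallelism (dop)
            .Where (x => x % 2 == 0)
            .Count ();

        Console.WriteLine ("dop=" + dop + ", evens=" + evens);
    }
}
```

On a 64-core machine `Capped (64)` yields 16, while machines with fewer than 16 cores keep their actual core count, so smaller devices should see no behavior change.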

repro.cs, extracted from https://github.com/dotnet/corefx/blob/a9b91e205a8794327a028cb4b29953127f0f194c/src/System.Linq.Parallel/tests/QueryOperators/ConcatTests.cs#L145-L154

using System;
using System.Linq;
using System.Collections.Generic;
using System.Threading;

public class Repro {
    public static void Main (string []args) {
        const int ElementCount = 2048;
        // Each side is the union of two adjacent quarter-ranges; together they cover [0, ElementCount).
        ParallelQuery<int> leftQuery = ParallelEnumerable.Range(0, ElementCount / 4).Union(ParallelEnumerable.Range(ElementCount / 4, ElementCount / 4));
        ParallelQuery<int> rightQuery = ParallelEnumerable.Range(2 * ElementCount / 4, ElementCount / 4).Union(ParallelEnumerable.Range(3 * ElementCount / 4, ElementCount / 4));

        // Enumerating the concatenated query into the set is what triggers the parallel execution.
        var results = new HashSet<int>(leftQuery.Concat(rightQuery));
        Console.WriteLine ("results.Count=" + results.Count + ", ElementCount=" + ElementCount);
    }
}

Before fix:

$ time ./mono/mini/mono-sgen repro.exe
results.Count=2048, ElementCount=2048

real    0m5.846s
user    0m0.344s
sys     0m1.929s
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
   net_4_x_System.Core_xunit-test  Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 536.005s

With this fix:

$ time ./mono/mini/mono-sgen repro.exe
results.Count=2048, ElementCount=2048

real    0m1.247s
user    0m0.206s
sys     0m0.225s
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
   net_4_x_System.Core_xunit-test  Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 131.143s

/cc @akoeplinger @lambdageek @filipnavara

lewurm added a commit to lewurm/mono that referenced this pull request Oct 31, 2019
On top of mono/corefx#369 this improves the execution time of `System.Core-xunit` on Linux/ARM64 by almost 2x, from:

```console
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
   net_4_x_System.Core_xunit-test  Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 131.143s
```
to
```console
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
   net_4_x_System.Core_xunit-test  Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 74.636s
```

This is only relevant for non-netcore. The CoreCLR folks just recently fixed something similar (thanks to Marek for sharing this link): dotnet/coreclr#27543
akoeplinger pushed a commit to mono/mono that referenced this pull request Oct 31, 2019
@akoeplinger akoeplinger merged commit c44efe7 into mono:master Nov 4, 2019