ConcurrentQueueSegment allows spinning threads to sleep. #44265

Merged 4 commits on Nov 5, 2020

alexcovington
Contributor

Proposal to fix this issue. The SpinWait instances in ConcurrentQueueSegment do not allow enqueuers/dequeuers to sleep when there is contention, so a lot of time is spent busy-waiting.

This change would essentially undo this merge and reimplement a threshold value for the spinners. Originally I used Thread.OptimalMaxSpinWaitsPerSpinIteration as the threshold value, which improved throughput on my EPYC machine significantly, but I've found that a constant value of 8, which Thread.OptimalMaxSpinWaitsPerSpinIteration usually evaluates to anyway, gives very similar performance.
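
For readers unfamiliar with the idea, here is a toy Python model of a spin-wait with a sleep threshold. It is only a sketch of the concept, not the .NET SpinWait implementation: the `SpinWaiter` class, its `sleep_threshold` parameter, and the convention that `-1` means "never sleep" are assumptions made for illustration.

```python
import time


class SpinWaiter:
    """Toy model of a spin-wait with a sleep threshold (hypothetical, not .NET's code)."""

    def __init__(self, sleep_threshold=8, sleep_fn=time.sleep):
        # Assumption: sleep_threshold = -1 models "never sleep, always busy-wait".
        self.sleep_threshold = sleep_threshold
        self.sleep_fn = sleep_fn
        self.count = 0   # spins performed so far
        self.sleeps = 0  # how many of those spins gave up the CPU

    def spin_once(self):
        if self.sleep_threshold >= 0 and self.count >= self.sleep_threshold:
            # Past the threshold: stop burning CPU and let other threads run.
            self.sleep_fn(0.001)
            self.sleeps += 1
        else:
            # Below the threshold: a short busy loop stands in for a CPU pause.
            for _ in range(50):
                pass
        self.count += 1


def wait_until(condition, waiter):
    """Spin (and, past the threshold, sleep) until condition() returns True."""
    while not condition():
        waiter.spin_once()
    return waiter
```

With a threshold of 8, the first eight calls to `spin_once` busy-wait and every later call sleeps; with `-1`, the waiter never sleeps, which models the contended busy-waiting described above.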

I ran both the Microbenchmark and TechEmpower benchmarks to evaluate the impact of the change. Here are the results for my Ryzen and Skylake machines:

Microbenchmarks

Base - Current master branch
Diff - This change

Ryzen

summary:
better: 2, geomean: 5.838
worse: 1, geomean: 1.020
total diff: 3

| Slower                                                              | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Collections.IterateForEach<Int32>.ConcurrentQueue(Size: 512) |      1.02 |          4394.97 |          4483.57 |         |

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Collections.Concurrent.AddRemoveFromSameThreads<String>.ConcurrentQueue(S |      6.82 |     676441708.50 |      99189562.00 |         |
| System.Collections.Concurrent.AddRemoveFromSameThreads<Int32>.ConcurrentQueue(Si |      5.00 |     608823740.50 |     121816517.50 |         |
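
As a sanity check on how to read these tables, the sketch below recomputes the Ryzen base/diff ratios and the "better" geomean from the medians above. It assumes the summary geomean is the geometric mean of the per-benchmark ratios; the helper name `geomean` is hypothetical.

```python
import math


def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (base median ns, diff median ns) for the two "Faster" Ryzen rows.
ryzen_faster = [
    (676441708.50, 99189562.00),    # AddRemoveFromSameThreads<String>
    (608823740.50, 121816517.50),   # AddRemoveFromSameThreads<Int32>
]

# base/diff > 1 means the diff build is faster by that factor.
ratios = [base / diff for base, diff in ryzen_faster]

for ratio in ratios:
    print(f"base/diff = {ratio:.2f}")
print(f"geomean   = {geomean(ratios):.3f}")
```

The two ratios round to the 6.82 and 5.00 shown in the table, and their geometric mean rounds to the 5.838 reported in the Ryzen summary.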

Skylake

summary:
better: 2, geomean: 2.179
worse: 1, geomean: 1.045
total diff: 3

| Slower                                                               | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Collections.IterateForEach<String>.ConcurrentQueue(Size: 512) |      1.05 |          5536.67 |          5787.78 |         |

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Collections.Concurrent.AddRemoveFromSameThreads<Int32>.ConcurrentQueue(Si |      2.23 |     188272565.50 |      84290311.00 |         |
| System.Collections.Concurrent.AddRemoveFromSameThreads<String>.ConcurrentQueue(S |      2.13 |     191395707.50 |      90050370.50 |         |

TechEmpower

Default - Current master branch
x.x.8 - Sleep threshold value of 8

Plaintext

Ryzen

| application           | zen2-lin.plaintext.default | zen2-lin.plaintext.8 |        |
| --------------------- | -------------------------- | -------------------- | ------ |
| CPU Usage (%)         |                         51 |                   50 | -1.96% |
| Raw CPU Usage (%)     |                     606.53 |               601.99 | -0.75% |
| Working Set (MB)      |                        109 |                  112 | +2.75% |
| Build Time (ms)       |                      3,790 |                3,812 | +0.58% |
| Start Time (ms)       |                        218 |                  207 | -5.05% |
| Published Size (KB)   |                     94,124 |               94,124 |  0.00% |
| .NET Core SDK Version |        5.0.100-rtm.20509.5 |  5.0.100-rtm.20509.5 |        |


| load                   | zen2-lin.plaintext.default | zen2-lin.plaintext.8 |         |
| ---------------------- | -------------------------- | -------------------- | ------- |
| CPU Usage (%)          |                         51 |                   50 |  -1.96% |
| Raw CPU Usage (%)      |                     607.74 |               597.67 |  -1.66% |
| Working Set (MB)       |                         51 |                   52 |  +1.96% |
| Build Time (ms)        |                      3,673 |                3,829 |  +4.25% |
| Start Time (ms)        |                          0 |                    0 |         |
| Published Size (KB)    |                     76,404 |               76,404 |   0.00% |
| .NET Core SDK Version  |                    3.1.403 |              3.1.403 |         |
| Requests/sec           |                  5,837,436 |            5,840,076 |  +0.05% |
| Requests               |                 88,120,000 |           88,179,648 |  +0.07% |
| Mean latency (ms)      |                       6.72 |                 5.84 | -13.10% |
| Max latency (ms)       |                     354.35 |                83.95 | -76.31% |
| Bad responses          |                          0 |                    0 |         |
| Socket errors          |                          0 |                    0 |         |
| Read throughput (MB/s) |                     701.44 |               701.76 |  +0.05% |
| Latency 50th (ms)      |                       3.31 |                 3.32 |  +0.30% |
| Latency 75th (ms)      |                       8.46 |                 8.34 |  -1.42% |
| Latency 90th (ms)      |                      15.52 |                14.74 |  -5.03% |
| Latency 99th (ms)      |                      48.95 |                30.92 | -36.83% |
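
The unlabeled last column in these tables appears to be the relative change of the `.8` run against the `default` run. The sketch below works under that assumption and reproduces a few of the Ryzen plaintext load values; the function name `pct_change` is hypothetical.

```python
def pct_change(base, diff):
    """Relative change of diff versus base, formatted as a signed percentage."""
    return f"{(diff - base) / base * 100:+.2f}%"


# Values taken from the Ryzen plaintext "load" table above.
rows = {
    "Requests/sec": (5_837_436, 5_840_076),
    "Mean latency (ms)": (6.72, 5.84),
    "Max latency (ms)": (354.35, 83.95),
    "Latency 99th (ms)": (48.95, 30.92),
}

for name, (base, diff) in rows.items():
    print(f"{name:<20} {pct_change(base, diff)}")
```

For latency rows a negative percentage is an improvement, while for Requests/sec a positive percentage is an improvement.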

Skylake

| application         | skylake-lin.plaintext.default | skylake-lin.plaintext.8 |         |
| ------------------- | ----------------------------- | ----------------------- | ------- |
| CPU Usage (%)       |                            52 |                      52 |   0.00% |
| Raw CPU Usage (%)   |                        413.76 |                  414.74 |  +0.24% |
| Working Set (MB)    |                            93 |                     113 | +21.51% |
| Build Time (ms)     |                         2,816 |                   2,843 |  +0.96% |
| Start Time (ms)     |                           250 |                     247 |  -1.20% |
| Published Size (KB) |                        94,187 |                  94,187 |   0.00% |


| load                   | skylake-lin.plaintext.default | skylake-lin.plaintext.8 |         |
| ---------------------- | ----------------------------- | ----------------------- | ------- |
| CPU Usage (%)          |                            49 |                      49 |   0.00% |
| Raw CPU Usage (%)      |                        393.67 |                  391.69 |  -0.50% |
| Working Set (MB)       |                            51 |                      51 |   0.00% |
| Build Time (ms)        |                         3,186 |                   3,185 |  -0.03% |
| Start Time (ms)        |                             0 |                       0 |         |
| Published Size (KB)    |                        76,404 |                  76,404 |   0.00% |
| First Request (ms)     |                            66 |                      62 |  -6.06% |
| Requests/sec           |                     2,735,736 |               2,721,966 |  -0.50% |
| Requests               |                    41,299,696 |              41,096,432 |  -0.49% |
| Mean latency (ms)      |                          9.21 |                   10.83 | +17.59% |
| Max latency (ms)       |                        220.10 |                  293.38 | +33.29% |
| Bad responses          |                             0 |                       0 |         |
| Socket errors          |                             0 |                       0 |         |
| Read throughput (MB/s) |                        328.73 |                  327.08 |  -0.50% |
| Latency 50th (ms)      |                          5.22 |                    5.84 | +11.88% |
| Latency 75th (ms)      |                         12.70 |                   13.91 |  +9.53% |
| Latency 90th (ms)      |                         21.52 |                   25.28 | +17.47% |
| Latency 99th (ms)      |                         53.23 |                   75.93 | +42.65% |

Fortunes

Ryzen

| db                | zen2-lin.fortunes.default | zen2-lin.fortunes.8 |         |
| ----------------- | ------------------------- | ------------------- | ------- |
| CPU Usage (%)     |                        26 |                  27 |  +3.85% |
| Raw CPU Usage (%) |                    314.99 |              319.26 |  +1.36% |
| Working Set (MB)  |                       103 |                  98 |  -4.85% |
| Build Time (ms)   |                     4,794 |               4,305 | -10.20% |
| Start Time (ms)   |                     2,297 |               1,655 | -27.95% |


| application           | zen2-lin.fortunes.default | zen2-lin.fortunes.8 |        |
| --------------------- | ------------------------- | ------------------- | ------ |
| CPU Usage (%)         |                        51 |                  51 |  0.00% |
| Raw CPU Usage (%)     |                    609.16 |              607.95 | -0.20% |
| Working Set (MB)      |                       346 |                 348 | +0.58% |
| Build Time (ms)       |                     3,732 |               3,752 | +0.54% |
| Start Time (ms)       |                    11,869 |              11,879 | +0.08% |
| Published Size (KB)   |                    94,126 |              94,126 |  0.00% |
| .NET Core SDK Version |       5.0.100-rtm.20509.5 | 5.0.100-rtm.20509.5 |        |


| load                   | zen2-lin.fortunes.default | zen2-lin.fortunes.8 |        |
| ---------------------- | ------------------------- | ------------------- | ------ |
| CPU Usage (%)          |                        16 |                  16 |  0.00% |
| Raw CPU Usage (%)      |                    190.99 |              193.67 | +1.41% |
| Working Set (MB)       |                        51 |                  52 | +1.96% |
| Build Time (ms)        |                     3,783 |               3,683 | -2.64% |
| Start Time (ms)        |                         0 |                   0 |        |
| Published Size (KB)    |                    76,404 |              76,404 |  0.00% |
| .NET Core SDK Version  |                   3.1.403 |             3.1.403 |        |
| First Request (ms)     |                       132 |                 124 | -6.06% |
| Requests/sec           |                   145,481 |             146,284 | +0.55% |
| Requests               |                 2,193,818 |           2,207,425 | +0.62% |
| Mean latency (ms)      |                      3.55 |                3.52 | -0.85% |
| Max latency (ms)       |                     27.79 |               26.16 | -5.87% |
| Bad responses          |                         0 |                   0 |        |
| Socket errors          |                         0 |                   0 |        |
| Read throughput (MB/s) |                    188.83 |              189.87 | +0.55% |
| Latency 50th (ms)      |                      3.34 |                3.32 | -0.60% |
| Latency 75th (ms)      |                      4.13 |                4.10 | -0.73% |
| Latency 90th (ms)      |                      5.06 |                5.03 | -0.59% |
| Latency 99th (ms)      |                      8.28 |                8.10 | -2.17% |

Skylake

| db                | skylake-lin.fortunes.default | skylake-lin.fortunes.8 |        |
| ----------------- | ---------------------------- | ---------------------- | ------ |
| CPU Usage (%)     |                           30 |                     30 |  0.00% |
| Raw CPU Usage (%) |                       237.31 |                 238.85 | +0.65% |
| Working Set (MB)  |                           83 |                     78 | -6.02% |
| Build Time (ms)   |                        1,672 |                  1,688 | +0.96% |
| Start Time (ms)   |                          325 |                    302 | -7.08% |


| application         | skylake-lin.fortunes.default | skylake-lin.fortunes.8 |        |
| ------------------- | ---------------------------- | ---------------------- | ------ |
| CPU Usage (%)       |                           48 |                     48 |  0.00% |
| Raw CPU Usage (%)   |                       382.07 |                 381.20 | -0.23% |
| Working Set (MB)    |                          186 |                    180 | -3.23% |
| Build Time (ms)     |                        2,807 |                  2,822 | +0.53% |
| Start Time (ms)     |                       11,321 |                 11,310 | -0.10% |
| Published Size (KB) |                       94,189 |                 94,189 |  0.00% |


| load                   | skylake-lin.fortunes.default | skylake-lin.fortunes.8 |        |
| ---------------------- | ---------------------------- | ---------------------- | ------ |
| CPU Usage (%)          |                           18 |                     19 | +5.56% |
| Raw CPU Usage (%)      |                       146.53 |                 148.22 | +1.15% |
| Working Set (MB)       |                           51 |                     51 |  0.00% |
| Build Time (ms)        |                        3,163 |                  3,182 | +0.60% |
| Start Time (ms)        |                            0 |                      0 |        |
| Published Size (KB)    |                       76,404 |                 76,404 |  0.00% |
| First Request (ms)     |                           75 |                     76 | +1.33% |
| Requests/sec           |                       79,003 |                 78,921 | -0.10% |
| Requests               |                    1,192,522 |              1,191,434 | -0.09% |
| Mean latency (ms)      |                         6.49 |                   6.51 | +0.31% |
| Max latency (ms)       |                        43.32 |                  46.11 | +6.44% |
| Bad responses          |                            0 |                      0 |        |
| Socket errors          |                            0 |                      0 |        |
| Read throughput (MB/s) |                       102.54 |                 102.44 | -0.10% |
| Latency 50th (ms)      |                         6.17 |                   6.15 | -0.32% |
| Latency 75th (ms)      |                         7.55 |                   7.58 | +0.40% |
| Latency 90th (ms)      |                         9.09 |                   9.18 | +0.99% |
| Latency 99th (ms)      |                        13.71 |                  14.16 | +3.28% |

Json

Ryzen

| application           | zen2-lin.json.default | zen2-lin.json.8     |        |
| --------------------- | --------------------- | ------------------- | ------ |
| CPU Usage (%)         |                    59 |                  58 | -1.69% |
| Raw CPU Usage (%)     |                703.46 |              691.37 | -1.72% |
| Working Set (MB)      |                   267 |                 273 | +2.25% |
| Build Time (ms)       |                 3,742 |               3,761 | +0.51% |
| Start Time (ms)       |                   208 |                 208 |  0.00% |
| Published Size (KB)   |                94,124 |              94,124 |  0.00% |
| .NET Core SDK Version |   5.0.100-rtm.20509.5 | 5.0.100-rtm.20509.5 |        |


| load                   | zen2-lin.json.default | zen2-lin.json.8 |         |
| ---------------------- | --------------------- | --------------- | ------- |
| CPU Usage (%)          |                    42 |              42 |   0.00% |
| Raw CPU Usage (%)      |                504.94 |          501.70 |  -0.64% |
| Working Set (MB)       |                    51 |              51 |   0.00% |
| Build Time (ms)        |                 3,788 |           3,729 |  -1.56% |
| Start Time (ms)        |                     0 |               0 |         |
| Published Size (KB)    |                76,404 |          76,404 |   0.00% |
| .NET Core SDK Version  |               3.1.403 |         3.1.403 |         |
| Requests/sec           |               558,740 |         549,529 |  -1.65% |
| Requests               |             8,436,538 |       8,296,629 |  -1.66% |
| Mean latency (ms)      |                  4.63 |            2.96 | -36.07% |
| Max latency (ms)       |                412.12 |           91.91 | -77.70% |
| Bad responses          |                     0 |               0 |         |
| Socket errors          |                     0 |               0 |         |
| Read throughput (MB/s) |                 77.80 |           76.51 |  -1.66% |
| Latency 50th (ms)      |                  0.54 |            0.51 |  -4.81% |
| Latency 75th (ms)      |                  3.30 |            3.29 |  -0.30% |
| Latency 90th (ms)      |                 10.52 |            9.00 | -14.45% |
| Latency 99th (ms)      |                 71.37 |           25.65 | -64.06% |

Skylake

| application         | skylake-lin.json.default | skylake-lin.json.8 |        |
| ------------------- | ------------------------ | ------------------ | ------ |
| CPU Usage (%)       |                       58 |                 59 | +1.72% |
| Raw CPU Usage (%)   |                   467.18 |             468.79 | +0.34% |
| Working Set (MB)    |                      166 |                167 | +0.60% |
| Build Time (ms)     |                    2,820 |              2,775 | -1.60% |
| Start Time (ms)     |                      247 |                247 |  0.00% |
| Published Size (KB) |                   94,187 |             94,187 |  0.00% |


| load                   | skylake-lin.json.default | skylake-lin.json.8 |         |
| ---------------------- | ------------------------ | ------------------ | ------- |
| CPU Usage (%)          |                       51 |                 44 | -13.73% |
| Raw CPU Usage (%)      |                   410.26 |             353.84 | -13.75% |
| Working Set (MB)       |                       50 |                 51 |  +2.00% |
| Build Time (ms)        |                    3,314 |              3,147 |  -5.04% |
| Start Time (ms)        |                        0 |                  0 |         |
| Published Size (KB)    |                   76,404 |             76,404 |   0.00% |
| First Request (ms)     |                       60 |                 60 |   0.00% |
| Requests/sec           |                  243,422 |            250,537 |  +2.92% |
| Requests               |                3,671,284 |          3,779,742 |  +2.95% |
| Mean latency (ms)      |                    51.96 |              10.66 | -79.48% |
| Max latency (ms)       |                 1,990.00 |             550.29 | -72.35% |
| Bad responses          |                        0 |                  0 |         |
| Socket errors          |                       74 |                  0 |         |
| Read throughput (MB/s) |                    33.89 |              34.88 |  +2.92% |
| Latency 50th (ms)      |                     1.14 |               1.20 |  +5.26% |
| Latency 75th (ms)      |                    17.00 |               7.48 | -56.00% |
| Latency 90th (ms)      |                   151.25 |              22.24 | -85.30% |
| Latency 99th (ms)      |                   777.46 |             163.01 | -79.03% |

This change is more significant on high-core-count CPUs and impacts my EPYC machine the most, but I cannot post those numbers publicly. If anyone would like to review them, please let me know and I can send the results by internal email.

Please let me know if I can clarify or expand on any of the above.

@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@stephentoub
Member

stephentoub commented Nov 5, 2020

> Originally I used the Thread.OptimalMaxSpinWaitsPerSpinIteration as the threshold value, which improved throughput on my EPYC machine significantly, but I've found that using a constant value of 8 results in very similar performance, which Thread.OptimalMaxSpinWaitsPerSpinIteration usually evaluates to anyways.

What about just using SpinWait() with no argument? If it's in the same ballpark, my preference would be to not introduce another magic number. Or maybe subsequently even tweak the number it uses internally.

@alexcovington
Contributor Author

> What about just using SpinWait() with no argument? If it's in the same ballpark, my preference would be to not introduce another magic number. Or maybe subsequently even tweak the number it uses internally.

That's a good point.

Using no argument is still an improvement and is in the same ballpark for Skylake and Ryzen, but on EPYC the improvement is not as large as when passing a threshold. EPYC sees ~150-200% more requests/sec with a threshold and ~80% more requests/sec with no argument. So I'd prefer to pass the threshold value, if possible.

Here are the Skylake and Ryzen numbers comparing current release (default), sleep threshold of 8, and no sleep threshold argument (noarg):

Ryzen


| db                | zen2-lin.fortunes.default | zen2-lin.fortunes.8 |         | zen2-lin.fortunes.noarg |         |
| ----------------- | ------------------------- | ------------------- | ------- | ----------------------- | ------- |
| CPU Usage (%)     |                        26 |                  26 |   0.00% |                      26 |   0.00% |
| Raw CPU Usage (%) |                    314.09 |              314.35 |  +0.08% |                  312.37 |  -0.55% |
| Working Set (MB)  |                       102 |                  99 |  -2.94% |                     105 |  +2.94% |
| Build Time (ms)   |                    14,016 |               4,729 | -66.26% |                   5,685 | -59.44% |
| Start Time (ms)   |                     1,656 |               1,848 | +11.59% |                   1,981 | +19.63% |


| application           | zen2-lin.fortunes.default | zen2-lin.fortunes.8 |        | zen2-lin.fortunes.noarg |        |
| --------------------- | ------------------------- | ------------------- | ------ | ----------------------- | ------ |
| CPU Usage (%)         |                        51 |                  51 |  0.00% |                      51 |  0.00% |
| Raw CPU Usage (%)     |                    609.21 |              611.24 | +0.33% |                  609.80 | +0.10% |
| Working Set (MB)      |                       346 |                 346 |  0.00% |                     343 | -0.87% |
| Build Time (ms)       |                     4,011 |               4,121 | +2.74% |                   4,152 | +3.52% |
| Start Time (ms)       |                    11,830 |              11,900 | +0.59% |                  11,908 | +0.66% |
| Published Size (KB)   |                    94,126 |              94,126 |  0.00% |                  94,126 |  0.00% |
| .NET Core SDK Version |       5.0.100-rtm.20509.5 | 5.0.100-rtm.20509.5 |        |     5.0.100-rtm.20509.5 |        |


| load                   | zen2-lin.fortunes.default | zen2-lin.fortunes.8 |         | zen2-lin.fortunes.noarg |         |
| ---------------------- | ------------------------- | ------------------- | ------- | ----------------------- | ------- |
| CPU Usage (%)          |                        16 |                  17 |  +6.25% |                      16 |   0.00% |
| Raw CPU Usage (%)      |                    191.23 |              198.93 |  +4.03% |                  192.87 |  +0.86% |
| Working Set (MB)       |                        51 |                  51 |   0.00% |                      51 |   0.00% |
| Build Time (ms)        |                     4,016 |               3,964 |  -1.29% |                   3,940 |  -1.89% |
| Start Time (ms)        |                         0 |                   0 |         |                       0 |         |
| Published Size (KB)    |                    76,404 |              76,404 |   0.00% |                  76,404 |   0.00% |
| .NET Core SDK Version  |                   3.1.403 |             3.1.403 |         |                 3.1.403 |         |
| First Request (ms)     |                       240 |                 211 | -12.08% |                     193 | -19.58% |
| Requests/sec           |                   146,123 |             146,327 |  +0.14% |                 145,995 |  -0.09% |
| Requests               |                 2,206,261 |           2,208,632 |  +0.11% |               2,202,596 |  -0.17% |
| Mean latency (ms)      |                      3.52 |                3.52 |   0.00% |                    3.52 |   0.00% |
| Max latency (ms)       |                     30.08 |               24.87 | -17.32% |                   27.15 |  -9.74% |
| Bad responses          |                         0 |                   0 |         |                       0 |         |
| Socket errors          |                         0 |                   0 |         |                       0 |         |
| Read throughput (MB/s) |                    189.66 |              189.93 |  +0.14% |                  189.49 |  -0.09% |
| Latency 50th (ms)      |                      3.32 |                3.31 |  -0.30% |                    3.32 |   0.00% |
| Latency 75th (ms)      |                      4.11 |                4.10 |  -0.24% |                    4.10 |  -0.24% |
| Latency 90th (ms)      |                      5.04 |                5.02 |  -0.40% |                    5.02 |  -0.40% |
| Latency 99th (ms)      |                      8.15 |                8.00 |  -1.84% |                    8.14 |  -0.12% |


| application           | zen2-lin.json.default | zen2-lin.json.8     |        | zen2-lin.json.noarg |        |
| --------------------- | --------------------- | ------------------- | ------ | ------------------- | ------ |
| CPU Usage (%)         |                    59 |                  57 | -3.39% |                  58 | -1.69% |
| Raw CPU Usage (%)     |                704.26 |              689.67 | -2.07% |              698.22 | -0.86% |
| Working Set (MB)      |                   276 |                 261 | -5.43% |                 263 | -4.71% |
| Build Time (ms)       |                 4,121 |               4,073 | -1.16% |               4,058 | -1.53% |
| Start Time (ms)       |                   214 |                 207 | -3.27% |                 209 | -2.34% |
| Published Size (KB)   |                94,124 |              94,124 |  0.00% |              94,124 |  0.00% |
| .NET Core SDK Version |   5.0.100-rtm.20509.5 | 5.0.100-rtm.20509.5 |        | 5.0.100-rtm.20509.5 |        |


| load                   | zen2-lin.json.default | zen2-lin.json.8 |         | zen2-lin.json.noarg |         |
| ---------------------- | --------------------- | --------------- | ------- | ------------------- | ------- |
| CPU Usage (%)          |                    42 |              43 |  +2.38% |                  43 |  +2.38% |
| Raw CPU Usage (%)      |                509.52 |          511.49 |  +0.39% |              515.00 |  +1.07% |
| Working Set (MB)       |                    51 |              51 |   0.00% |                  51 |   0.00% |
| Build Time (ms)        |                 4,371 |           3,968 |  -9.22% |               3,844 | -12.06% |
| Start Time (ms)        |                     0 |               0 |         |                   0 |         |
| Published Size (KB)    |                76,404 |          76,404 |   0.00% |              76,404 |   0.00% |
| .NET Core SDK Version  |               3.1.403 |         3.1.403 |         |             3.1.403 |         |
| Requests/sec           |               542,750 |         536,089 |  -1.23% |             553,632 |  +2.01% |
| Requests               |             8,193,820 |       8,093,817 |  -1.22% |           8,358,526 |  +2.01% |
| Mean latency (ms)      |                  5.14 |            6.03 | +17.32% |                5.10 |  -0.78% |
| Max latency (ms)       |                318.43 |          337.03 |  +5.84% |              370.67 | +16.41% |
| Bad responses          |                     0 |               0 |         |                   0 |         |
| Socket errors          |                     0 |               0 |         |                   0 |         |
| Read throughput (MB/s) |                 75.57 |           74.64 |  -1.23% |               77.09 |  +2.01% |
| Latency 50th (ms)      |                  0.56 |            0.54 |  -3.22% |                0.52 |  -6.98% |
| Latency 75th (ms)      |                  3.60 |            4.14 | +15.00% |                3.38 |  -6.11% |
| Latency 90th (ms)      |                 11.57 |           14.26 | +23.25% |                9.69 | -16.25% |
| Latency 99th (ms)      |                 77.42 |           87.64 | +13.20% |               87.93 | +13.58% |


| application           | zen2-lin.plaintext.default | zen2-lin.plaintext.8 |        | zen2-lin.plaintext.noarg |        |
| --------------------- | -------------------------- | -------------------- | ------ | ------------------------ | ------ |
| CPU Usage (%)         |                         51 |                   50 | -1.96% |                       51 |  0.00% |
| Raw CPU Usage (%)     |                     607.45 |               605.76 | -0.28% |                   617.90 | +1.72% |
| Working Set (MB)      |                        111 |                  113 | +1.80% |                      111 |  0.00% |
| Build Time (ms)       |                      4,159 |                4,115 | -1.06% |                    4,051 | -2.60% |
| Start Time (ms)       |                        215 |                  210 | -2.33% |                      216 | +0.47% |
| Published Size (KB)   |                     94,124 |               94,124 |  0.00% |                   94,124 |  0.00% |
| .NET Core SDK Version |        5.0.100-rtm.20509.5 |  5.0.100-rtm.20509.5 |        |      5.0.100-rtm.20509.5 |        |


| load                   | zen2-lin.plaintext.default | zen2-lin.plaintext.8 |         | zen2-lin.plaintext.noarg |         |
| ---------------------- | -------------------------- | -------------------- | ------- | ------------------------ | ------- |
| CPU Usage (%)          |                         50 |                   50 |   0.00% |                       50 |   0.00% |
| Raw CPU Usage (%)      |                     595.72 |               596.92 |  +0.20% |                   596.46 |  +0.12% |
| Working Set (MB)       |                         51 |                   51 |   0.00% |                       51 |   0.00% |
| Build Time (ms)        |                      3,980 |                4,229 |  +6.26% |                    3,807 |  -4.35% |
| Start Time (ms)        |                          0 |                    0 |         |                        0 |         |
| Published Size (KB)    |                     76,404 |               76,404 |   0.00% |                   76,404 |   0.00% |
| .NET Core SDK Version  |                    3.1.403 |              3.1.403 |         |                  3.1.403 |         |
| Requests/sec           |                  5,949,761 |            5,820,989 |  -2.16% |                5,850,003 |  -1.68% |
| Requests               |                 89,824,864 |           87,896,496 |  -2.15% |               88,332,048 |  -1.66% |
| Mean latency (ms)      |                       5.90 |                 5.98 |  +1.36% |                     5.72 |  -3.05% |
| Max latency (ms)       |                     177.43 |                89.76 | -49.41% |                    81.83 | -53.88% |
| Bad responses          |                          0 |                    0 |         |                        0 |         |
| Socket errors          |                          0 |                    0 |         |                        0 |         |
| Read throughput (MB/s) |                     714.94 |               699.47 |  -2.16% |                   702.95 |  -1.68% |
| Latency 50th (ms)      |                       3.15 |                 3.39 |  +7.62% |                     3.31 |  +5.08% |
| Latency 75th (ms)      |                       8.00 |                 8.51 |  +6.37% |                     8.16 |  +2.00% |
| Latency 90th (ms)      |                      14.31 |                15.12 |  +5.66% |                    14.28 |  -0.21% |
| Latency 99th (ms)      |                      35.06 |                31.93 |  -8.93% |                    30.17 | -13.95% |

Skylake


| db                | skylake-lin.fortunes.default | skylake-lin.fortunes.8 |         | skylake-lin.fortunes.noarg |         |
| ----------------- | ---------------------------- | ---------------------- | ------- | -------------------------- | ------- |
| CPU Usage (%)     |                           30 |                     30 |   0.00% |                         30 |   0.00% |
| Raw CPU Usage (%) |                       240.73 |                 240.35 |  -0.16% |                     238.67 |  -0.86% |
| Working Set (MB)  |                           94 |                     79 | -15.96% |                         75 | -20.21% |
| Build Time (ms)   |                       10,805 |                  1,970 | -81.77% |                      2,530 | -76.58% |
| Start Time (ms)   |                          284 |                    290 |  +2.11% |                        327 | +15.14% |


| application         | skylake-lin.fortunes.default | skylake-lin.fortunes.8 |         | skylake-lin.fortunes.noarg |         |
| ------------------- | ---------------------------- | ---------------------- | ------- | -------------------------- | ------- |
| CPU Usage (%)       |                           53 |                     48 |  -9.43% |                         50 |  -5.66% |
| Raw CPU Usage (%)   |                       424.63 |                 381.04 | -10.27% |                     397.74 |  -6.33% |
| Working Set (MB)    |                          232 |                    158 | -31.90% |                        157 | -32.33% |
| Build Time (ms)     |                        2,831 |                  2,836 |  +0.18% |                      2,875 |  +1.55% |
| Start Time (ms)     |                       11,313 |                 11,332 |  +0.17% |                     11,335 |  +0.19% |
| Published Size (KB) |                       94,189 |                 94,189 |   0.00% |                     94,189 |   0.00% |


| load                   | skylake-lin.fortunes.default | skylake-lin.fortunes.8 |         | skylake-lin.fortunes.noarg |         |
| ---------------------- | ---------------------------- | ---------------------- | ------- | -------------------------- | ------- |
| CPU Usage (%)          |                           18 |                     19 |  +5.56% |                         18 |   0.00% |
| Raw CPU Usage (%)      |                       145.01 |                 148.26 |  +2.24% |                     145.70 |  +0.47% |
| Working Set (MB)       |                           51 |                     51 |   0.00% |                         51 |   0.00% |
| Build Time (ms)        |                        3,290 |                  3,940 | +19.76% |                      3,191 |  -3.01% |
| Start Time (ms)        |                            0 |                      0 |         |                          0 |         |
| Published Size (KB)    |                       76,404 |                 76,404 |   0.00% |                     76,404 |   0.00% |
| First Request (ms)     |                           78 |                     78 |   0.00% |                         72 |  -7.69% |
| Requests/sec           |                       78,988 |                 78,507 |  -0.61% |                     78,661 |  -0.41% |
| Requests               |                    1,192,482 |              1,185,236 |  -0.61% |                  1,187,889 |  -0.39% |
| Mean latency (ms)      |                         6.49 |                   6.55 |  +0.92% |                       6.52 |  +0.46% |
| Max latency (ms)       |                        50.48 |                  43.40 | -14.03% |                      36.68 | -27.34% |
| Bad responses          |                            0 |                      0 |         |                          0 |         |
| Socket errors          |                            0 |                      0 |         |                          0 |         |
| Read throughput (MB/s) |                       102.52 |                 101.90 |  -0.60% |                     102.10 |  -0.41% |
| Latency 50th (ms)      |                         6.14 |                   6.20 |  +0.98% |                       6.17 |  +0.49% |
| Latency 75th (ms)      |                         7.59 |                   7.61 |  +0.26% |                       7.60 |  +0.13% |
| Latency 90th (ms)      |                         9.17 |                   9.22 |  +0.55% |                       9.20 |  +0.33% |
| Latency 99th (ms)      |                        14.03 |                  14.04 |  +0.07% |                      14.15 |  +0.86% |


| application         | skylake-lin.json.default | skylake-lin.json.8 |         | skylake-lin.json.noarg |        |
| ------------------- | ------------------------ | ------------------ | ------- | ---------------------- | ------ |
| CPU Usage (%)       |                       58 |                 60 |  +3.45% |                     59 | +1.72% |
| Raw CPU Usage (%)   |                   465.45 |             476.77 |  +2.43% |                 471.27 | +1.25% |
| Working Set (MB)    |                      158 |                175 | +10.76% |                    162 | +2.53% |
| Build Time (ms)     |                    2,830 |              2,857 |  +0.95% |                  2,848 | +0.64% |
| Start Time (ms)     |                      247 |                248 |  +0.40% |                    247 |  0.00% |
| Published Size (KB) |                   94,187 |             94,187 |   0.00% |                 94,187 |  0.00% |


| load                   | skylake-lin.json.default | skylake-lin.json.8 |         | skylake-lin.json.noarg |         |
| ---------------------- | ------------------------ | ------------------ | ------- | ---------------------- | ------- |
| CPU Usage (%)          |                       44 |                 44 |   0.00% |                     44 |   0.00% |
| Raw CPU Usage (%)      |                   349.05 |             352.25 |  +0.92% |                 355.11 |  +1.74% |
| Working Set (MB)       |                       51 |                 51 |   0.00% |                     51 |   0.00% |
| Build Time (ms)        |                    3,242 |              3,173 |  -2.13% |                  3,148 |  -2.90% |
| Start Time (ms)        |                        0 |                  0 |         |                      0 |         |
| Published Size (KB)    |                   76,404 |             76,404 |   0.00% |                 76,404 |   0.00% |
| First Request (ms)     |                       60 |                 60 |   0.00% |                     60 |   0.00% |
| Requests/sec           |                  255,745 |            252,760 |  -1.17% |                256,222 |  +0.19% |
| Requests               |                3,857,226 |          3,814,477 |  -1.11% |              3,867,955 |  +0.28% |
| Mean latency (ms)      |                    10.92 |               8.83 | -19.14% |                   9.10 | -16.67% |
| Max latency (ms)       |                   714.80 |             763.29 |  +6.78% |                 447.89 | -37.34% |
| Bad responses          |                        0 |                  0 |         |                      0 |         |
| Socket errors          |                        0 |                  0 |         |                      0 |         |
| Read throughput (MB/s) |                    35.61 |              35.19 |  -1.18% |                  35.68 |  +0.20% |
| Latency 50th (ms)      |                     1.15 |               1.17 |  +1.74% |                   1.17 |  +1.74% |
| Latency 75th (ms)      |                     7.32 |               6.81 |  -6.97% |                   7.24 |  -1.09% |
| Latency 90th (ms)      |                    21.23 |              17.30 | -18.51% |                  19.73 |  -7.07% |
| Latency 99th (ms)      |                   185.57 |             135.06 | -27.22% |                 136.09 | -26.66% |


| application         | skylake-lin.plaintext.default | skylake-lin.plaintext.8 |        | skylake-lin.plaintext.noarg |        |
| ------------------- | ----------------------------- | ----------------------- | ------ | --------------------------- | ------ |
| CPU Usage (%)       |                            53 |                      53 |  0.00% |                          51 | -3.77% |
| Raw CPU Usage (%)   |                        422.68 |                  420.34 | -0.55% |                      409.07 | -3.22% |
| Working Set (MB)    |                            94 |                      91 | -3.19% |                          87 | -7.45% |
| Build Time (ms)     |                         2,826 |                   2,843 | +0.60% |                       2,816 | -0.35% |
| Start Time (ms)     |                           248 |                     250 | +0.81% |                         250 | +0.81% |
| Published Size (KB) |                        94,187 |                  94,187 |  0.00% |                      94,187 |  0.00% |


| load                   | skylake-lin.plaintext.default | skylake-lin.plaintext.8 |         | skylake-lin.plaintext.noarg |          |
| ---------------------- | ----------------------------- | ----------------------- | ------- | --------------------------- | -------- |
| CPU Usage (%)          |                            55 |                      50 |  -9.09% |                          56 |   +1.82% |
| Raw CPU Usage (%)      |                        440.85 |                  396.26 | -10.12% |                      446.75 |   +1.34% |
| Working Set (MB)       |                            51 |                      51 |   0.00% |                          51 |    0.00% |
| Build Time (ms)        |                         3,222 |                   3,186 |  -1.12% |                       3,190 |   -0.99% |
| Start Time (ms)        |                             0 |                       0 |         |                           0 |          |
| Published Size (KB)    |                        76,404 |                  76,404 |   0.00% |                      76,404 |    0.00% |
| First Request (ms)     |                            59 |                      60 |  +1.69% |                          60 |   +1.69% |
| Requests/sec           |                     2,664,402 |               2,716,003 |  +1.94% |                   2,650,739 |   -0.51% |
| Requests               |                    40,224,336 |              41,009,872 |  +1.95% |                  40,023,744 |   -0.50% |
| Mean latency (ms)      |                         29.26 |                   10.45 | -64.29% |                      128.09 | +337.76% |
| Max latency (ms)       |                      1,170.00 |                  222.80 | -80.96% |                    1,700.00 |  +45.30% |
| Bad responses          |                             0 |                       0 |         |                           0 |          |
| Socket errors          |                             0 |                       0 |         |                           0 |          |
| Read throughput (MB/s) |                        320.16 |                  326.36 |  +1.94% |                      318.52 |   -0.51% |
| Latency 50th (ms)      |                          6.92 |                    5.73 | -17.20% |                       15.22 | +119.94% |
| Latency 75th (ms)      |                         17.34 |                   13.77 | -20.59% |                      146.55 | +745.16% |
| Latency 90th (ms)      |                         49.47 |                   24.38 | -50.72% |                      455.16 | +820.07% |
| Latency 99th (ms)      |                        460.41 |                   68.21 | -85.18% |                      991.58 | +115.37% |

alexcovington and others added 2 commits November 5, 2020 09:52
…ncurrent/ConcurrentQueueSegment.cs

Co-authored-by: Stephen Toub <stoub@microsoft.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>
@stephentoub
Member

Thanks, @alexcovington. @kouvel, this makes me wonder if we need to revisit

internal const int DefaultSleep1Threshold = 20; // After how many yields should we Sleep(1) frequently?

e.g. whether it should be the same "8" that @alexcovington has landed on here...? We can certainly check in the 8 that's used here, but such magic values being thrown around do make me a little nervous.

@kouvel
Member

kouvel commented Nov 5, 2020

this makes me wonder if we need to revisit

internal const int DefaultSleep1Threshold = 20; // After how many yields should we Sleep(1) frequently?

e.g. whether it should be the same "8" that @alexcovington has landed on here...?

Possibly, but that would involve a lot more testing on anything that uses the SpinWait struct on large-core-count and multi-NUMA-node systems. The current number was determined at the same time the spin-wait scheme was changed; it's quite possible that it could use adjusting.

@kouvel
Member

kouvel commented Nov 5, 2020

We can certainly check in the 8 that's used here, but such magic values being thrown around do make me a little nervous.

I'm not sure that one number would work best for everything. I have had to tweak spin counts for different cases before, depending on how expensive the following wait is. It could also vary based on how the data structure would be used and how much it would contend.

@stephentoub
Member

stephentoub commented Nov 5, 2020

Ok. Let's get this in but then follow up. We use -1 as well in ConcurrentStack, BlockingCollection, ManualResetEventSlim, SemaphoreSlim, SpinLock, Barrier, CountdownEvent, and Task:
https://source.dot.net/#System.Private.CoreLib/SpinWait.cs,e030659599d0fa3f,references
It seems likely that if -1 was inappropriate in ConcurrentQueue that it's also inappropriate in one or more of those.

@alexcovington, is this something you'd be interested in helping with? If not, totally fine, just figured I'd ask :)

@kouvel
Member

kouvel commented Nov 5, 2020

Mainly, if a proper wait follows the spin-wait, then there wouldn't be much benefit in doing Sleep(1). I think some of those fall into that category, where avoiding the sleep is probably OK.

@stephentoub stephentoub merged commit ce4772d into dotnet:master Nov 5, 2020
@alexcovington
Contributor Author

@stephentoub I'd be happy to help 😄. Just let me know how I can contribute.

@stephentoub
Member

stephentoub commented Nov 5, 2020

Thanks, @alexcovington. I think the work would "just" be to look at the other uses of SpinWait.SpinOnce(-1) (you can see all of them here: https://source.dot.net/#System.Private.CoreLib/SpinWait.cs,e030659599d0fa3f,references) and decide if any should be changed to either SpinOnce() or SpinOnce(someOtherValue). We could also look at existing uses of the parameterless SpinOnce (https://source.dot.net/#System.Private.CoreLib/SpinWait.cs,39bd72970cc926fe,references), though that seems less important given how much closer that was to the ideal throughput in your tests of ConcurrentQueue. As @kouvel says, some of them are probably fine as is, but I expect some might warrant a change, e.g. ConcurrentStack's usage is similar to ConcurrentQueue's. It's also fine to decide everything is good the way it is, or the difference is negligible enough to not be worth the effort. I just see us changing one usage and want to make sure we've at least thought about the others and whether they're relevant.
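
For readers comparing the three call shapes mentioned above, here is a minimal sketch of how the `sleep1Threshold` argument is passed to `SpinWait.SpinOnce`. The helper `SpinUntilDone`, the class name, and the polling lambda are hypothetical and exist only for illustration; they are not taken from the PR or from `ConcurrentQueueSegment`. The thresholds shown are the ones discussed in this thread: -1 (never `Sleep(1)`), 8 (the value this PR settled on), and 20 (the `DefaultSleep1Threshold` quoted earlier, which the parameterless `SpinOnce()` is understood to use).

```csharp
using System;
using System.Threading;

static class SpinOnceSketch
{
    // Hypothetical helper (not from the PR): spin until `isDone` returns true,
    // passing `sleep1Threshold` through to SpinWait.SpinOnce, and return the
    // number of SpinOnce calls that were needed.
    public static int SpinUntilDone(Func<bool> isDone, int sleep1Threshold)
    {
        var spinner = new SpinWait();
        int calls = 0;
        while (!isDone())
        {
            // -1 disables Thread.Sleep(1); a non-negative value is the spin
            // count after which the spinner may start using Thread.Sleep(1).
            spinner.SpinOnce(sleep1Threshold);
            calls++;
        }
        return calls;
    }

    static void Main()
    {
        foreach (int threshold in new[] { -1, 8, 20 })
        {
            int remaining = 50; // pretend the contended slot frees up after 50 polls
            int calls = SpinUntilDone(() => remaining-- == 0, threshold);
            Console.WriteLine($"sleep1Threshold={threshold}: {calls} SpinOnce calls");
        }
    }
}
```

The number of `SpinOnce` calls is the same for every threshold here; what changes is how the waiting thread spends that time (busy spinning and yielding versus sleeping), which is the trade-off the benchmarks in this PR measure. Whether a given call site benefits from sleeping would still need to be measured case by case.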

@alexcovington
Contributor Author

@stephentoub Makes sense. I'll start poking around and will post a new issue if I find anything.

@adamsitnik adamsitnik added this to the 6.0.0 milestone Nov 6, 2020
tqiu8 pushed a commit to tqiu8/runtime that referenced this pull request Nov 9, 2020
ConcurrentQueueSegment allows spinning threads to sleep. (dotnet#44265)

* Allow threads to sleep when ConcurrentQueue has many enqueuers/dequeuers.

* Update src/libraries/System.Private.CoreLib/src/System/Collections/Concurrent/ConcurrentQueueSegment.cs

Co-authored-by: Stephen Toub <stoub@microsoft.com>

* Apply suggestions from code review

Co-authored-by: Stephen Toub <stoub@microsoft.com>

Co-authored-by: AMD DAYTONA EPYC <amd@amd-DAYTONA-X0.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>

kouvel added a commit to kouvel/runtime that referenced this pull request Dec 16, 2020
- dotnet#44265 seems to have caused large regressions on Windows and Linux-arm64. During that change we had tested adding the `Sleep(1)` to some `ConcurrentQueue` operations in contending cases, and not spin-waiting at all in forward-progressing cases. Not spin-waiting at all where possible in contending cases seemed to be better or equal for the most part (compared with spin-waiting without `Sleep(1)`), so I have removed spin-waiting in forward-progressing cases in `ConcurrentQueue`.
- There were some regressions from the portable thread pool on Windows. I have moved and tweaked a slight delay that I had added early on. After later changes it had lost its original purpose; with these changes it serves that purpose again and seems to close some of the gap, though maybe not all of it in some tests. We'll check the graphs after this change and see if there is more to investigate. There are also other things to improve on Windows. Many of those may be separate from the portable thread pool, but some may be relevant to the changes in perf characteristics.
kouvel added a commit that referenced this pull request Dec 16, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 20, 2020
Successfully merging this pull request may close these issues.

ConcurrentQueue spending excess time in SpinWait