[wasm-mt] Disable and fix remaining failing tests on Browser with multi-threading and perf tracing enabled #75286

simonrozsival · 2022-09-08T17:31:09Z

This PR

- disables failing tests [wasm-mt] WasmTestOnBrowser-System.Net.Http.Functional.Tests fail #74411, [wasm-mt] WasmTestOnBrowser-System.Net.WebSockets.Client.Tests fails #74413, and [wasm-mt] The System.Threading.Tasks.Dataflow.Tests TransformManyBlockTests.TestProducerConsumerAsyncEnumerable fails on Browser when multi-threading is enabled #75389
- fixes [wasm-ep] Wasm.Browser.EventPipe.Sample fails #74487
- increases pthread pool size for CI from the default 4 to 16
- updates the browser-eventpipe sample to make it CI friendly

ghost · 2022-09-08T17:31:17Z

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR

- disables failing tests [wasm-mt] WasmTestOnBrowser-System.Net.Http.Functional.Tests fail #74411 and [wasm-mt] WasmTestOnBrowser-System.Net.WebSockets.Client.Tests fails #74413
- fixes [wasm-ep] Wasm.Browser.EventPipe.Sample fails #74487

Author:	simonrozsival
Assignees:	-
Labels:	`arch-wasm`, `area-Build-mono`
Milestone:	-

simonrozsival · 2022-09-08T17:31:19Z

/azp run runtime-wasm

azure-pipelines · 2022-09-08T17:31:35Z

Azure Pipelines successfully started running 1 pipeline(s).

simonrozsival · 2022-09-09T08:02:44Z

Tests that previously passed are now failing:

WasmTestOnBrowser-Microsoft.Extensions.DependencyInjection.Tests
WasmTestOnBrowser-System.Security.Cryptography.Tests
WasmTestOnBrowser-System.Threading.Tasks.Dataflow.Tests

This PR will need some additional work to fix all the remaining failing tests in the multi-threaded lane. I wonder why these tests started failing now and how that's connected to the disabled tests.

simonrozsival · 2022-09-09T08:05:03Z

Sample stack trace of some failing System.Threading.Tasks.Dataflow.Tests:

fail: [FAIL] System.Threading.Tasks.Dataflow.Tests.JoinBlockTests.TestTree

[19:08:25] fail: [FAIL] System.Threading.Tasks.Dataflow.Tests.JoinBlockTests.TestTree
[19:08:25] info: System.Threading.Tasks.TaskSchedulerException : An exception was thrown by a TaskScheduler.
[19:08:25] info: ---- System.Threading.ThreadStartException : Thread failed to start.
[19:08:25] info: -------- System.ExecutionEngineException : mono_thread_platform_create_thread() failed
[19:08:25] info:    at System.Threading.Tasks.Task.ScheduleAndStart(Boolean needsProtection)
[19:08:25] info:    at System.Threading.Tasks.Task.Start(TaskScheduler scheduler)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.Common.StartTaskSafe(Task task, TaskScheduler scheduler)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.SourceCore`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].OfferAsyncIfNecessary_Slow(Boolean isReplacementReplica, Boolean outgoingLockKnownAcquired)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.SourceCore`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].OfferAsyncIfNecessary(Boolean isReplacementReplica, Boolean outgoingLockKnownAcquired)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.SourceCore`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].OfferAsyncIfNecessaryWithValueLock()
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.SourceCore`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].AddMessage(String item)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.BufferBlock`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].System.Threading.Tasks.Dataflow.ITargetBlock.OfferMessage(DataflowMessageHeader messageHeader, String messageValue, ISourceBlock`1 source, Boolean consumeToAccept)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.DataflowBlock.Post[String](ITargetBlock`1 target, String item)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Tests.DataflowTestHelpers.PostRange[String](ITargetBlock`1 target, Int32 lowerBoundInclusive, Int32 upperBoundExclusive, Func`2 selector)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Tests.JoinBlockTests.CreateFillLink[String](Int32 messages, ITargetBlock`1 target)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Tests.JoinBlockTests.TestTree()
[19:08:25] info: --- End of stack trace from previous location ---
[19:08:25] info: ----- Inner Stack Trace -----
[19:08:25] info:    at System.Threading.Thread.ThrowThreadStartException(Exception ex)
[19:08:25] info:    at System.Threading.Thread.StartCore()
[19:08:25] info:    at System.Threading.Thread.Start(Boolean captureContext, Boolean internalThread)
[19:08:25] info:    at System.Threading.Thread.UnsafeStart()
[19:08:25] info:    at System.Threading.PortableThreadPool.WorkerThread.CreateWorkerThread()
[19:08:25] info:    at System.Threading.PortableThreadPool.WorkerThread.MaybeAddWorkingWorker(PortableThreadPool threadPoolInstance)
[19:08:25] info:    at System.Threading.PortableThreadPool.RequestWorker()
[19:08:25] info:    at System.Threading.ThreadPool.RequestWorkerThread()
[19:08:25] info:    at System.Threading.ThreadPoolWorkQueue.Enqueue(Object callback, Boolean forceGlobal)
[19:08:25] info:    at System.Threading.ThreadPool.UnsafeQueueUserWorkItemInternal(Object callBack, Boolean preferLocal)
[19:08:25] info:    at System.Threading.Tasks.ThreadPoolTaskScheduler.QueueTask(Task task)
[19:08:25] info:    at System.Threading.Tasks.TaskScheduler.InternalQueueTask(Task task)
[19:08:25] info:    at System.Threading.Tasks.Task.ScheduleAndStart(Boolean needsProtection)
[19:08:25] info: ----- Inner Stack Trace -----
[19:08:25] info: 
[19:08:25] warn: 
[19:08:25] warn: Unhandled Exception:
[19:08:25] warn: System.Threading.ThreadStartException: Thread failed to start.
[19:08:25] warn:  ---> System.ExecutionEngineException: mono_thread_platform_create_thread() failed
[19:08:25] warn:    --- End of inner exception stack trace ---
[19:08:25] warn:    at System.Threading.Thread.ThrowThreadStartException(Exception ex)
[19:08:25] warn:    at System.Threading.Thread.StartCore()
[19:08:25] warn:    at System.Threading.Thread.Start(Boolean captureContext, Boolean internalThread)
[19:08:25] warn:    at System.Threading.Thread.UnsafeStart()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.CreateWorkerThread()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.MaybeAddWorkingWorker(PortableThreadPool threadPoolInstance)
[19:08:25] warn:    at System.Threading.PortableThreadPool.RequestWorker()
[19:08:25] warn:    at System.Threading.ThreadPool.RequestWorkerThread()
[19:08:25] warn:    at System.Threading.ThreadPoolWorkQueue.Dispatch()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
[19:08:25] warn:    at System.Threading.Thread.StartCallback()
[19:08:25] warn: [ERROR] FATAL UNHANDLED EXCEPTION: System.Threading.ThreadStartException: Thread failed to start.
[19:08:25] warn:  ---> System.ExecutionEngineException: mono_thread_platform_create_thread() failed
[19:08:25] warn:    --- End of inner exception stack trace ---
[19:08:25] warn:    at System.Threading.Thread.ThrowThreadStartException(Exception ex)
[19:08:25] warn:    at System.Threading.Thread.StartCore()
[19:08:25] warn:    at System.Threading.Thread.Start(Boolean captureContext, Boolean internalThread)
[19:08:25] warn:    at System.Threading.Thread.UnsafeStart()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.CreateWorkerThread()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.MaybeAddWorkingWorker(PortableThreadPool threadPoolInstance)
[19:08:25] warn:    at System.Threading.PortableThreadPool.RequestWorker()
[19:08:25] warn:    at System.Threading.ThreadPool.RequestWorkerThread()
[19:08:25] warn:    at System.Threading.ThreadPoolWorkQueue.Dispatch()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
[19:08:25] warn:    at System.Threading.Thread.StartCallback()

This is the same failure I get locally when I set "pthreadPoolSize": 0 in mono-config.json (which is what happens if the test project isn't published with -p:WasmEnableThreads=true). It also happens when the pthread pool is too small and we run out of workers in the pool. We use the default of 4 pthread pool workers, and my local testing suggests System.Threading.Tasks.Dataflow.Tests needs at least 8. I'll try bumping up the pool size and see if that's enough to solve the issue for now.
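The failure mode described above can be pictured with a toy model, sketched here in Python: a pool whose workers are all allocated up front, where creating a thread simply fails once every worker is taken. This all-or-nothing behaviour is an assumption for illustration and does not reproduce what Emscripten or Mono actually do; the class and function names are hypothetical, and the numbers (a default of 4, a demand of at least 8 for the Dataflow tests) come from the comment above.

```python
class PthreadPoolExhausted(RuntimeError):
    """Stands in for 'mono_thread_platform_create_thread() failed' in this toy model."""


class FixedPthreadPool:
    """Toy model: every worker is allocated up front and none are created on demand.

    Assumption for illustration only; the real Emscripten/Mono behaviour may differ.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.in_use = 0

    def create_thread(self) -> int:
        # Hand out a free preallocated worker, or fail once all of them are taken.
        if self.in_use >= self.size:
            raise PthreadPoolExhausted(f"all {self.size} pool workers are in use")
        self.in_use += 1
        return self.in_use  # hypothetical thread id

    def release(self) -> None:
        # A finished thread returns its worker to the pool.
        if self.in_use > 0:
            self.in_use -= 1


def run_with_demand(pool_size: int, live_threads: int) -> bool:
    """Return True if `live_threads` threads can be alive at the same time."""
    pool = FixedPthreadPool(pool_size)
    try:
        for _ in range(live_threads):
            pool.create_thread()
    except PthreadPoolExhausted:
        return False
    return True


# Pool sizes from the discussion against a demand of 8 simultaneously live threads.
for size in (0, 4, 8):
    print(size, run_with_demand(size, 8))  # prints: 0 False / 4 False / 8 True
```

In this model a pool of 0 (threads disabled at publish time) and the default pool of 4 both fail the same way, while 8 is just enough; real thread demand is not a fixed number, so this only illustrates why "too small" and "zero" produce the same error.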

Also, the process times out without completing properly, even with the JS synchronization context we have:

[19:23:18] fail: Tests timed out. Killing driver service pid 25854
[19:23:18] fail: Application has finished with exit code TIMED_OUT but 0 was expected
XHarness exit code: 71 (GENERAL_FAILURE)

simonrozsival · 2022-09-09T11:35:25Z

/azp run runtime-wasm

azure-pipelines · 2022-09-09T11:35:36Z

Azure Pipelines successfully started running 1 pipeline(s).

simonrozsival · 2022-09-09T13:33:48Z

/azp run runtime-wasm

azure-pipelines · 2022-09-09T13:34:03Z

Azure Pipelines successfully started running 1 pipeline(s).

simonrozsival · 2022-09-10T07:23:16Z

/azp run runtime-wasm

azure-pipelines · 2022-09-10T07:23:29Z

Azure Pipelines successfully started running 1 pipeline(s).

simonrozsival · 2022-09-10T10:46:04Z

/azp run runtime-wasm

simonrozsival · 2022-09-12T10:47:05Z

/azp run runtime-wasm

azure-pipelines · 2022-09-12T10:47:17Z

Azure Pipelines successfully started running 1 pipeline(s).

simonrozsival · 2022-09-12T12:36:40Z

/azp run runtime-wasm

azure-pipelines · 2022-09-12T12:36:56Z

Azure Pipelines successfully started running 1 pipeline(s).

lambdageek · 2022-09-12T14:04:21Z

@kg is it reasonable to use 16 web workers for CI? Are we testing something that will never happen in real code?

kg · 2022-09-12T19:41:34Z

16 is definitely too much. In practice, I don't think you'll see more than 8 real threads available on regular user machines anytime soon, and you could end up limited to far fewer depending on core count. (I don't know how the browser decides on the limit.)

It makes sense to run some tests with high and low counts to test those scenarios though.

lambdageek · 2022-09-13T02:32:57Z

@simonrozsival It would be good to understand the maximum degree of parallelism those dataflow tests actually require, i.e., do they really need at least 8 threads running in parallel, or is the threadpool just spinning up as many threads as it can because there are a lot of async tasks?

If the dataflow tests really require a high degree of parallelism, we should open an issue to write simplified tests for threaded wasm. If it's the threadpool, we should open an issue to make it understand that thread creation on wasm can sometimes fail, and to try to recover.

I don't think we should keep throwing workers at a test until it passes if that goes beyond what a regular desktop browser would support.
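The distinction raised here can be illustrated with a small model. The sketch below is written in Python, with `concurrent.futures.ThreadPoolExecutor` standing in for the .NET thread pool; it is not the dataflow test code, and the helper names are hypothetical. Independent work items finish under any worker cap, so their observed peak concurrency only mirrors the cap. Items that rendezvous (modelled here with `threading.Barrier`) genuinely need that many threads at once and fail when the cap is smaller.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Records how many instrumented work items are running at the same moment."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyProbe":
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc) -> None:
        with self._lock:
            self.current -= 1


def peak_for_independent_items(workers: int, items: int) -> int:
    """Independent items: any cap finishes them; the peak only reflects the cap."""
    probe = ConcurrencyProbe()

    def work() -> None:
        with probe:
            time.sleep(0.01)  # stand-in for a short step that needs no other item

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for future in [pool.submit(work) for _ in range(items)]:
            future.result()
    return probe.peak


def completes_with_rendezvous(workers: int, parties: int, timeout: float = 1.0) -> bool:
    """Items that wait for each other: they need `parties` threads at once to finish."""
    barrier = threading.Barrier(parties)

    def work() -> None:
        # Raises BrokenBarrierError if not every party arrives within the timeout.
        barrier.wait(timeout=timeout)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work) for _ in range(parties)]
        try:
            for future in futures:
                future.result()
        except threading.BrokenBarrierError:
            return False
    return True
```

Running each kind of workload under a shrinking cap is one way to tell the two cases apart: `completes_with_rendezvous(8, 8)` succeeds while `completes_with_rendezvous(4, 8)` fails, whereas independent items complete at every cap.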

simonrozsival · 2022-09-13T06:01:35Z

@lambdageek OK, I'll look deeper into the codebase

simonrozsival · 2022-09-13T12:18:58Z

/azp run runtime-wasm

azure-pipelines · 2022-09-13T12:19:15Z

Azure Pipelines successfully started running 1 pipeline(s).

simonrozsival · 2022-09-13T12:26:39Z

@lambdageek The problem seems to be JoinBlockTests. I tried disabling parallelization for that class, but that wasn't reliable and the tests still failed very often. Bumping the pthread pool size to just 8 instead of 16 should be enough. Would that be acceptable? If not, I think we would have to disable JoinBlockTests (or at least some of its tests).

lambdageek · 2022-09-13T12:57:56Z

@simonrozsival TestTree looks pretty busy; does the stability of the rest of the test suite improve if you disable just that one?

Let's go with 8 and create an issue to look at these tests again.

simonrozsival · 2022-09-13T15:50:19Z

@lambdageek when that particular test is disabled, it sometimes succeeds, and sometimes it gets stuck. I tried finding a subset of JoinBlockTests that we could disable to get the tests passing reliably, but I wasn't successful. It might not be the only test class that is causing the problem after all.

BTW In the last test run I noticed that the --web-server-use-cop xharness flag disappeared from the Wasm.Browser.Threads.Sample and Wasm.Browser.EventPipe.Sample and I have no idea why it is missing. It seems as if the tests ran with the $(RunScriptCommand) from the basic Browser sample.

lambdageek · 2022-09-13T20:00:02Z

> @lambdageek when that particular test is disabled, it sometimes succeeds, and sometimes it gets stuck. I tried finding a subset of JoinBlockTests that we could disable to get the tests passing reliably, but I wasn't successful. It might not be the only test class that is causing the problem after all.

Ok, fair enough. Let's go with 8 workers for now.

> BTW In the last test run I noticed that the --web-server-use-cop xharness flag disappeared from the Wasm.Browser.Threads.Sample and Wasm.Browser.EventPipe.Sample and I have no idea why it is missing. It seems as if the tests ran with the $(RunScriptCommand) from the basic Browser sample.

I think the tests got rewritten using the new template, which uses the wasm app host; I think the app host knows when to add the additional headers on its own.

radical · 2022-09-14T00:43:14Z

> BTW In the last test run I noticed that the --web-server-use-cop xharness flag disappeared from the Wasm.Browser.Threads.Sample and Wasm.Browser.EventPipe.Sample and I have no idea why it is missing. It seems as if the tests ran with the $(RunScriptCommand) from the basic Browser sample.

This got dropped in 4486805 when it moved to building a custom RunScriptCommand, but it missed referencing WasmXHarnessArgs, which carries that argument.

radical · 2022-09-14T00:44:31Z

Once we feel confident about these, the tests can be enabled outside runtime-wasm, and we can stop ignoring the test failures.

simonrozsival · 2022-09-14T07:01:11Z

/azp run runtime-wasm

azure-pipelines · 2022-09-14T07:01:30Z

Azure Pipelines successfully started running 1 pipeline(s).

simonrozsival · 2022-09-14T11:47:34Z

The Build Browser wasm Linux Release LibraryTests_Threading and Build Browser wasm Linux Release LibraryTests_Threading_PerfTracing legs are now green.

simonrozsival added 3 commits September 8, 2022 18:00

Disable failing tests that make HTTP requests

538b12e

Fix eventpipe sample test

b2c2e48

Add missing issue links

e5352d1

simonrozsival added arch-wasm WebAssembly architecture area-Build-mono labels Sep 8, 2022

ghost assigned simonrozsival Sep 8, 2022

simonrozsival changed the title ~~[wasm-mt] Disable and fix remaining failing tests on Browser with multi-threading perf tracing enabled~~ [wasm-mt] Disable and fix remaining failing tests on Browser with multi-threading and perf tracing enabled Sep 8, 2022

simonrozsival added 3 commits September 9, 2022 11:58

Merge branch 'main' of https://github.com/dotnet/runtime into wasm-mt…

5e3e4b9

…-ep-fix-remaining-failing-tests

Bump pthread pool size for tests

b8682fe

Disable failing test

e620901

Increase pthread pool size for CI

9e2b82e

Try using diagnostics mock for the browser-eventpipe sample in Releas…

a410b00

…e mode on CI

Update active issue link

865fdd0

simonrozsival marked this pull request as ready for review September 10, 2022 10:46

simonrozsival requested review from radical, lewing and pavelsavara as code owners September 10, 2022 10:46

TMP add debugging information for CI

1e5c88e

Revert "TMP add debugging information for CI"

a867cca

This reverts commit 1e5c88e.

lambdageek approved these changes Sep 12, 2022

Decrease pthread pool size to just 8

e5690d8

radical added 2 commits September 14, 2022 00:34

Merge remote-tracking branch 'origin/main' into wasm-mt-ep-fix-remain…

d3fcc47

…ing-failing-tests

[wasm] samples: use WasmXHarnessArgs so we can get the --web-server-u…

a40d262

…se-cop argument

simonrozsival mentioned this pull request Sep 14, 2022

[wasm-mt] Revisit the size of pthread pool for CI #75602

Closed

simonrozsival merged commit 9d8be44 into dotnet:main Sep 14, 2022

simonrozsival deleted the wasm-mt-ep-fix-remaining-failing-tests branch September 14, 2022 11:47

ghost locked as resolved and limited conversation to collaborators Oct 14, 2022
