[wasm-mt] Disable and fix remaining failing tests on Browser with multi-threading and perf tracing enabled #75286

Merged


@simonrozsival simonrozsival added arch-wasm WebAssembly architecture area-Build-mono labels Sep 8, 2022
@ghost

ghost commented Sep 8, 2022

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR

Author: simonrozsival
Assignees: -
Labels:

arch-wasm, area-Build-mono

Milestone: -

@simonrozsival
Member Author

/azp run runtime-wasm

@ghost ghost assigned simonrozsival Sep 8, 2022
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simonrozsival simonrozsival changed the title from "[wasm-mt] Disable and fix remaining failing tests on Browser with multi-threading perf tracing enabled" to "[wasm-mt] Disable and fix remaining failing tests on Browser with multi-threading and perf tracing enabled" Sep 8, 2022
@simonrozsival
Member Author

Tests that previously passed are now failing:

  • WasmTestOnBrowser-Microsoft.Extensions.DependencyInjection.Tests
  • WasmTestOnBrowser-System.Security.Cryptography.Tests
  • WasmTestOnBrowser-System.Threading.Tasks.Dataflow.Tests

This PR will need some additional work to fix all the remaining failing tests in the multi-threaded lane. I wonder why it started failing now and how it's connected to the disabled tests.

@simonrozsival
Member Author

simonrozsival commented Sep 9, 2022

Sample stack trace of some failing System.Threading.Tasks.Dataflow.Tests:

fail: [FAIL] System.Threading.Tasks.Dataflow.Tests.JoinBlockTests.TestTree
[19:08:25] fail: [FAIL] System.Threading.Tasks.Dataflow.Tests.JoinBlockTests.TestTree
[19:08:25] info: System.Threading.Tasks.TaskSchedulerException : An exception was thrown by a TaskScheduler.
[19:08:25] info: ---- System.Threading.ThreadStartException : Thread failed to start.
[19:08:25] info: -------- System.ExecutionEngineException : mono_thread_platform_create_thread() failed
[19:08:25] info:    at System.Threading.Tasks.Task.ScheduleAndStart(Boolean needsProtection)
[19:08:25] info:    at System.Threading.Tasks.Task.Start(TaskScheduler scheduler)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.Common.StartTaskSafe(Task task, TaskScheduler scheduler)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.SourceCore`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].OfferAsyncIfNecessary_Slow(Boolean isReplacementReplica, Boolean outgoingLockKnownAcquired)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.SourceCore`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].OfferAsyncIfNecessary(Boolean isReplacementReplica, Boolean outgoingLockKnownAcquired)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.SourceCore`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].OfferAsyncIfNecessaryWithValueLock()
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Internal.SourceCore`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].AddMessage(String item)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.BufferBlock`1[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].System.Threading.Tasks.Dataflow.ITargetBlock.OfferMessage(DataflowMessageHeader messageHeader, String messageValue, ISourceBlock`1 source, Boolean consumeToAccept)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.DataflowBlock.Post[String](ITargetBlock`1 target, String item)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Tests.DataflowTestHelpers.PostRange[String](ITargetBlock`1 target, Int32 lowerBoundInclusive, Int32 upperBoundExclusive, Func`2 selector)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Tests.JoinBlockTests.CreateFillLink[String](Int32 messages, ITargetBlock`1 target)
[19:08:25] info:    at System.Threading.Tasks.Dataflow.Tests.JoinBlockTests.TestTree()
[19:08:25] info: --- End of stack trace from previous location ---
[19:08:25] info: ----- Inner Stack Trace -----
[19:08:25] info:    at System.Threading.Thread.ThrowThreadStartException(Exception ex)
[19:08:25] info:    at System.Threading.Thread.StartCore()
[19:08:25] info:    at System.Threading.Thread.Start(Boolean captureContext, Boolean internalThread)
[19:08:25] info:    at System.Threading.Thread.UnsafeStart()
[19:08:25] info:    at System.Threading.PortableThreadPool.WorkerThread.CreateWorkerThread()
[19:08:25] info:    at System.Threading.PortableThreadPool.WorkerThread.MaybeAddWorkingWorker(PortableThreadPool threadPoolInstance)
[19:08:25] info:    at System.Threading.PortableThreadPool.RequestWorker()
[19:08:25] info:    at System.Threading.ThreadPool.RequestWorkerThread()
[19:08:25] info:    at System.Threading.ThreadPoolWorkQueue.Enqueue(Object callback, Boolean forceGlobal)
[19:08:25] info:    at System.Threading.ThreadPool.UnsafeQueueUserWorkItemInternal(Object callBack, Boolean preferLocal)
[19:08:25] info:    at System.Threading.Tasks.ThreadPoolTaskScheduler.QueueTask(Task task)
[19:08:25] info:    at System.Threading.Tasks.TaskScheduler.InternalQueueTask(Task task)
[19:08:25] info:    at System.Threading.Tasks.Task.ScheduleAndStart(Boolean needsProtection)
[19:08:25] info: ----- Inner Stack Trace -----
[19:08:25] info: 
[19:08:25] warn: 
[19:08:25] warn: Unhandled Exception:
[19:08:25] warn: System.Threading.ThreadStartException: Thread failed to start.
[19:08:25] warn:  ---> System.ExecutionEngineException: mono_thread_platform_create_thread() failed
[19:08:25] warn:    --- End of inner exception stack trace ---
[19:08:25] warn:    at System.Threading.Thread.ThrowThreadStartException(Exception ex)
[19:08:25] warn:    at System.Threading.Thread.StartCore()
[19:08:25] warn:    at System.Threading.Thread.Start(Boolean captureContext, Boolean internalThread)
[19:08:25] warn:    at System.Threading.Thread.UnsafeStart()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.CreateWorkerThread()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.MaybeAddWorkingWorker(PortableThreadPool threadPoolInstance)
[19:08:25] warn:    at System.Threading.PortableThreadPool.RequestWorker()
[19:08:25] warn:    at System.Threading.ThreadPool.RequestWorkerThread()
[19:08:25] warn:    at System.Threading.ThreadPoolWorkQueue.Dispatch()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
[19:08:25] warn:    at System.Threading.Thread.StartCallback()
[19:08:25] warn: [ERROR] FATAL UNHANDLED EXCEPTION: System.Threading.ThreadStartException: Thread failed to start.
[19:08:25] warn:  ---> System.ExecutionEngineException: mono_thread_platform_create_thread() failed
[19:08:25] warn:    --- End of inner exception stack trace ---
[19:08:25] warn:    at System.Threading.Thread.ThrowThreadStartException(Exception ex)
[19:08:25] warn:    at System.Threading.Thread.StartCore()
[19:08:25] warn:    at System.Threading.Thread.Start(Boolean captureContext, Boolean internalThread)
[19:08:25] warn:    at System.Threading.Thread.UnsafeStart()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.CreateWorkerThread()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.MaybeAddWorkingWorker(PortableThreadPool threadPoolInstance)
[19:08:25] warn:    at System.Threading.PortableThreadPool.RequestWorker()
[19:08:25] warn:    at System.Threading.ThreadPool.RequestWorkerThread()
[19:08:25] warn:    at System.Threading.ThreadPoolWorkQueue.Dispatch()
[19:08:25] warn:    at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
[19:08:25] warn:    at System.Threading.Thread.StartCallback()

This is the same failure I get locally when I set "pthreadPoolSize": 0 in mono-config.json (which happens if the test project isn't published with -p:WasmEnableThreads=true). It also happens when the pthread pool is too small and we run out of workers in the pool. We use the default of 4 pthread pool workers, and in my local testing System.Threading.Tasks.Dataflow.Tests needs at least 8. I'll try bumping up the pool size and see if that's enough to solve the issue for now.

Also, the process times out without completing properly, even with the JS synchronization context we have:

[19:23:18] fail: Tests timed out. Killing driver service pid 25854
[19:23:18] fail: Application has finished with exit code TIMED_OUT but 0 was expected
XHarness exit code: 71 (GENERAL_FAILURE)
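
The pool exhaustion described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FixedWorkerPool` and `startConcurrently` are hypothetical names invented for this example, and the code does not reproduce the Emscripten or Mono implementation. It only shows why a fixed pool of 0 or 4 workers cannot satisfy a test that needs 8 threads alive at once.

```typescript
// Toy model of a fixed-size worker pool (hypothetical; not the Emscripten or Mono code).
class FixedWorkerPool {
  private idle: number;

  constructor(poolSize: number) {
    this.idle = poolSize;
  }

  // Claims an idle worker, or throws when the pool is exhausted.
  // The throw stands in for mono_thread_platform_create_thread() failing.
  startThread(name: string): void {
    if (this.idle === 0) {
      throw new Error(`Thread failed to start: ${name}`);
    }
    this.idle -= 1;
  }

  // Returns a worker to the pool when its thread exits.
  exitThread(): void {
    this.idle += 1;
  }
}

// Tries to keep `needed` threads alive at the same time and
// reports how many of them actually started.
function startConcurrently(poolSize: number, needed: number): number {
  const pool = new FixedWorkerPool(poolSize);
  let started = 0;
  try {
    for (let i = 0; i < needed; i++) {
      pool.startThread(`worker-${i}`);
      started++;
    }
  } catch {
    // Pool exhausted: the remaining threads never start.
  }
  return started;
}

console.log(startConcurrently(0, 1)); // 0
console.log(startConcurrently(4, 8)); // 4
console.log(startConcurrently(8, 8)); // 8
```

In this model a pool size of 0 fails on the very first thread, which matches the observation that an unthreaded publish produces the same error as an undersized pool.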

@simonrozsival
Member Author

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simonrozsival
Member Author

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simonrozsival
Member Author

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simonrozsival
Member Author

/azp run runtime-wasm

@simonrozsival simonrozsival marked this pull request as ready for review September 10, 2022 10:46
@simonrozsival
Member Author

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simonrozsival
Member Author

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@lambdageek
Member

@kg is it reasonable to use 16 web workers for CI? Are we testing something that will never happen in real code?

@kg
Member

kg commented Sep 12, 2022

16 is definitely too much. In practice I don't think you'll see more than 8 real threads available on regular user machines anytime soon, and you could end up being limited to way less based on core count. (I don't know how the browser decides on the limit).

It makes sense to run some tests with high and low counts to test those scenarios though.

@lambdageek
Member

lambdageek commented Sep 13, 2022

@simonrozsival It would be good to understand what the maximum required degree of parallelism is for those dataflow tests, i.e., do they really need to run at least 8 threads in parallel, or is the threadpool just spinning up as many threads as it can because there are a lot of async tasks?

If the dataflow tests really have a high degree of required parallelism, we should make an issue to make simplified tests for threaded wasm. If it's the threadpool, we should make an issue to make it understand that thread creation on wasm can fail sometimes and try to recover.

I don't think we should throw as many workers as we can at a test until it is passing, if we're past what a regular desktop browser would support.
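
The second option above, a thread pool that treats a failed thread creation as recoverable, could look roughly like the following. This is a minimal single-threaded sketch, not the PortableThreadPool code: `RecoveringScheduler`, `simulate`, and the `tryCreateWorker` callback are hypothetical names, and the "workers" are only counted, never run in parallel.

```typescript
type WorkItem = () => void;

// Hypothetical scheduler: a failed worker start is recorded, not thrown.
class RecoveringScheduler {
  private queue: WorkItem[] = [];
  private tryCreateWorker: () => boolean;
  workers = 0;
  failedStarts = 0;

  constructor(tryCreateWorker: () => boolean) {
    this.tryCreateWorker = tryCreateWorker;
  }

  // Queues the item first, then asks for one more worker.
  // If the platform refuses, the item stays queued for existing workers.
  enqueue(item: WorkItem): void {
    this.queue.push(item);
    if (this.tryCreateWorker()) {
      this.workers++;
    } else {
      this.failedStarts++;
    }
  }

  // Runs queued items with whatever workers exist; returns how many completed.
  // With zero workers nothing can run, so nothing completes.
  drain(): number {
    if (this.workers === 0) {
      return 0;
    }
    let completed = 0;
    let item = this.queue.shift();
    while (item !== undefined) {
      item();
      completed++;
      item = this.queue.shift();
    }
    return completed;
  }
}

// Simulates a platform that can hand out at most `workerLimit` workers.
function simulate(workerLimit: number, items: number) {
  let available = workerLimit;
  const scheduler = new RecoveringScheduler(() => {
    if (available === 0) return false;
    available--;
    return true;
  });
  for (let i = 0; i < items; i++) {
    scheduler.enqueue(() => {});
  }
  const completed = scheduler.drain();
  return { workers: scheduler.workers, failedStarts: scheduler.failedStarts, completed };
}

console.log(simulate(4, 8)); // { workers: 4, failedStarts: 4, completed: 8 }
```

The design choice here is that queued work is never dropped when a worker fails to start. A pool with no workers at all still cannot make progress, which is the case that would need simplified tests instead.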

@simonrozsival
Member Author

@lambdageek OK, I'll look deeper into the codebase

@simonrozsival
Member Author

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simonrozsival
Member Author

@lambdageek The problem seems to be JoinBlockTests. I tried disabling parallelization for that class but that wasn't reliable and the tests still fail very often. It should be enough to bump the pthread pool size to just 8 instead of 16. Would that be acceptable? If not, I think we would have to disable JoinBlockTests (or at least part of it).

@lambdageek
Member

@simonrozsival TestTree looks pretty busy. Does the stability of the rest of the test suite improve if you just disable that one?

Let's go with 8 and create an issue to look at these tests again.

@simonrozsival
Member Author

@lambdageek when that particular test is disabled, it sometimes succeeds, and sometimes it gets stuck. I tried finding a subset of JoinBlockTests that we could disable to get the tests passing reliably, but I wasn't successful. It might not be the only test class that is causing the problem after all.

BTW In the last test run I noticed that the --web-server-use-cop xharness flag disappeared from the Wasm.Browser.Threads.Sample and Wasm.Browser.EventPipe.Sample and I have no idea why it is missing. It seems as if the tests ran with the $(RunScriptCommand) from the basic Browser sample.

@lambdageek
Member

> @lambdageek when that particular test is disabled, it sometimes succeeds, and sometimes it gets stuck. I tried finding a subset of JoinBlockTests that we could disable to get the tests passing reliably, but I wasn't successful. It might not be the only test class that is causing the problem after all.

Ok, fair enough. Let's go with 8 workers for now.

> BTW In the last test run I noticed that the --web-server-use-cop xharness flag disappeared from the Wasm.Browser.Threads.Sample and Wasm.Browser.EventPipe.Sample and I have no idea why it is missing. It seems as if the tests ran with the $(RunScriptCommand) from the basic Browser sample.

I think the tests got rewritten using the new template, which uses the wasm app host, and I think the app host knows when to add the additional headers on its own.

@radical
Member

radical commented Sep 14, 2022

> BTW In the last test run I noticed that the --web-server-use-cop xharness flag disappeared from the Wasm.Browser.Threads.Sample and Wasm.Browser.EventPipe.Sample and I have no idea why it is missing. It seems as if the tests ran with the $(RunScriptCommand) from the basic Browser sample.

This got dropped in 4486805 as it moved to building a custom RunScriptCommand, but missed referencing WasmXHarnessArgs which is getting that argument.
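
For context on why a dropped header flag matters to a threaded build: browsers expose SharedArrayBuffer (which wasm threads depend on) only to cross-origin isolated pages, and isolation is requested through two response headers. The sketch below assumes that the "additional headers" discussed above are these two. The helper names `withIsolationHeaders` and `isCrossOriginIsolated` are hypothetical and are not part of xharness or the wasm app host.

```typescript
// Response headers that request cross-origin isolation from the browser.
const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Returns a copy of the given headers with the isolation headers added.
function withIsolationHeaders(headers: Record<string, string>): Record<string, string> {
  return { ...headers, ...ISOLATION_HEADERS };
}

// Checks whether a set of response headers would make the page isolated.
// Header names are compared case-insensitively.
function isCrossOriginIsolated(headers: Record<string, string>): boolean {
  const lower: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    lower[name.toLowerCase()] = headers[name].trim().toLowerCase();
  }
  const coop = lower["cross-origin-opener-policy"];
  const coep = lower["cross-origin-embedder-policy"];
  return coop === "same-origin" && (coep === "require-corp" || coep === "credentialless");
}

const plain = { "Content-Type": "text/html" };
console.log(isCrossOriginIsolated(plain)); // false
console.log(isCrossOriginIsolated(withIsolationHeaders(plain))); // true
```

A test server that serves the threaded samples without these headers would leave the page unable to create shared memory, whichever layer is responsible for adding them.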

@radical
Member

radical commented Sep 14, 2022

When we feel confident about these, the tests can be enabled outside runtime-wasm. And we can stop ignoring the test failures.

@simonrozsival
Member Author

/azp run runtime-wasm

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simonrozsival
Member Author

The Build Browser wasm Linux Release LibraryTests_Threading and Build Browser wasm Linux Release LibraryTests_Threading_PerfTracing legs are now green.

@simonrozsival simonrozsival merged commit 9d8be44 into dotnet:main Sep 14, 2022
@simonrozsival simonrozsival deleted the wasm-mt-ep-fix-remaining-failing-tests branch September 14, 2022 11:47
@ghost ghost locked as resolved and limited conversation to collaborators Oct 14, 2022
Labels
arch-wasm WebAssembly architecture area-Build-mono
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[wasm-ep] Wasm.Browser.EventPipe.Sample fails
4 participants