Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[testing] Update TraceEvent package version and bypass corner case in EventPipe #1794

Merged
merged 3 commits into from
Jan 18, 2020

Conversation

josalem
Copy link
Contributor

@josalem josalem commented Jan 16, 2020

The fix (microsoft/perfview#1047) that resolves dotnet/coreclr#26241 is in version 2.0.48 of TraceEvent, so the failure can still occur randomly. This change updates the version used in testing to include the fix.

resolves dotnet/coreclr#26241

CC @tommcdon

Copy link
Member

@safern safern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for fixing.

@safern
Copy link
Member

safern commented Jan 16, 2020

@safern
Copy link
Member

safern commented Jan 16, 2020

@josalem this is still happening and a lot of PRs are hitting it more frequently, are you looking into why?

@josalem
Copy link
Contributor Author

josalem commented Jan 16, 2020

Yes, I'm looking into it. Any idea when the frequency spiked? I'm trying to see if anything might have changed that is causing this to resurface. It may be a different issue since it isn't hitting the infrastructure in place to light up when this particular bug was happening.

@safern
Copy link
Member

safern commented Jan 16, 2020

The frequency spiked after I merged my PR to use a single pipeline yesterday. The only difference I can see is that now those tests are using a Debug build of the libraries IL and before it was using a Release build.

@josalem
Copy link
Contributor Author

josalem commented Jan 16, 2020

I think I've diagnosed the issue. The issue being hit now is new and not the same as the issue this PR will fix.

The new issue is happening because of changes in Regex Caching that now use ConcurrentDictionaries (#542). EventPipeEventSource uses Regex in its pipe reading logic. At some point the Regex cache is updated which creates a ConcurrentDictionary and the System.Collections.Concurrent.ConcurrentCollectionsEventSource provider gets created. This routes through to EventPipe in native code to create the provider. This requires the EventPipeThread lock to be taken. During session shutdown, this same lock is taken and buffers get flushed to the pipe. The reader thread stops reading while waiting to enter the critical section, and the writer thread continues filling the pipe's buffer until that's full and it blocks writing. Now both the writer and reader are blocked and we deadlock.

@safern
Copy link
Member

safern commented Jan 16, 2020

So this seems like a regression on: #542, right? or should EventPipe react to those changes?

cc: @stephentoub who did those changes.

@stephentoub
Copy link
Member

Can you point me to where it's being used?

@josalem
Copy link
Contributor Author

josalem commented Jan 17, 2020

I don't believe this is an issue with #542, but rather an issue in EventPipe that's appears as an unfortunate byproduct of those changes. A fix would happen in the EventPipe code.

@stephentoub
Copy link
Member

Maybe. Looking at it again, I think the dictionary in Regex should be lazily created when it's first needed; that way anyone using the instance rather than static Regex methods won't need to create it. And if I change that, and EventPipe is using the instance methods, this shouldn't happen. (Of course it'd be even better if EventPipe were reliable in the face of such a usage.)

@josalem
Copy link
Contributor Author

josalem commented Jan 17, 2020

@stephentoub I don't think it's the fault of your changes, but here are the deadlocked callstacks:

Reader (managed):

        Child SP               IP Call Site
000000821F87C698 00007fffed3c9284 [InlinedCallFrame: 000000821f87c698] 
000000821F87C698 00007fffbecf3328 [InlinedCallFrame: 000000821f87c698] 
000000821F87C660 00007fffbecf3328 System.Diagnostics.Tracing.EventPipeInternal.CreateProvider(System.String, EtwEnableCallback)
000000821F87C750 00007fffbef82d2e System.Diagnostics.Tracing.EventPipeEventProvider.System.Diagnostics.Tracing.IEventProvider.EventRegister(System.Diagnostics.Tracing.EventSource, EtwEnableCallback, Void*, Int64 ByRef)
000000821F87C790 00007fffbef9349d System.Diagnostics.Tracing.EventProvider.EventRegister(System.Diagnostics.Tracing.EventSource, EtwEnableCallback)
000000821F87C7E0 00007fffbef90def System.Diagnostics.Tracing.EventProvider.Register(System.Diagnostics.Tracing.EventSource)
000000821F87C820 00007fffbef95f0e System.Diagnostics.Tracing.EventSource.Initialize(System.Guid, System.String, System.String[])
000000821F87C8D0 00007fffbef941a9 System.Diagnostics.Tracing.EventSource..ctor(System.Diagnostics.Tracing.EventSourceSettings, System.String[])
000000821F87C960 00007fffbef94042 System.Diagnostics.Tracing.EventSource..ctor()
000000821F87C990 00007fff62f61fb5 System.Collections.Concurrent.CDSCollectionETWBCLProvider..ctor()
000000821F87C9C0 00007fff62f61f5a System.Collections.Concurrent.CDSCollectionETWBCLProvider..cctor()
000000821F87DCF8 00007fffc2162443 [HelperMethodFrame: 000000821f87dcf8] 
000000821F87DE20 00007fff62f61e3b System.Collections.Concurrent.ConcurrentDictionary`2[[System.Text.RegularExpressions.RegexCache+Key, System.Text.RegularExpressions],[System.__Canon, System.Private.CoreLib]].AcquireAllLocks(Int32 ByRef)
000000821F87DE60 00007fff62f61d84 System.Collections.Concurrent.ConcurrentDictionary`2[[System.Text.RegularExpressions.RegexCache+Key, System.Text.RegularExpressions],[System.__Canon, System.Private.CoreLib]].get_Count()
000000821F87DEB0 00007fff62f6179b System.Text.RegularExpressions.RegexCache.Add(Key, System.Text.RegularExpressions.Regex)
000000821F87E070 00007fff62f493d9 System.Text.RegularExpressions.RegexCache.GetOrAdd(System.String)
000000821F87E150 00007fff62f49277 System.Text.RegularExpressions.Regex.Replace(System.String, System.String, System.String)
000000821F87E190 00007fff62f48741 Microsoft.Diagnostics.Tracing.Parsers.RegisteredTraceEventParser+TdhEventParser.ParseFields(Int32, Int32)
000000821F87E2F0 00007fff62f47fbd Microsoft.Diagnostics.Tracing.Parsers.RegisteredTraceEventParser+TdhEventParser.ParseEventMetaData()
000000821F87E380 00007fff62f44d68 Microsoft.Diagnostics.Tracing.Parsers.RegisteredTraceEventParser.TryLookupWorker(Microsoft.Diagnostics.Tracing.TraceEvent, System.Collections.Generic.Dictionary`2>)
000000821F87E450 00007fff62f449eb Microsoft.Diagnostics.Tracing.Parsers.ExternalTraceEventParser.b__0_0(Microsoft.Diagnostics.Tracing.TraceEvent)
000000821F87E4A0 00007fff62f444bf Microsoft.Diagnostics.Tracing.TraceEventDispatcher.Lookup(EVENT_RECORD*)
000000821F87E4F0 00007fff62f4431c Microsoft.Diagnostics.Tracing.EventPipeEventSource.DispatchEventRecord(EVENT_RECORD*)
000000821F87E530 00007fff62f43d59 Microsoft.Diagnostics.Tracing.EventPipeEventSource.EventCache_OnEvent(Microsoft.Diagnostics.Tracing.EventPipeEventHeader ByRef)
000000821F87E580 00007fff62f4126f Microsoft.Diagnostics.Tracing.EventPipe.EventCache.ProcessEventBlock(Byte[])
000000821F87E630 00007fff62f376fb Microsoft.Diagnostics.Tracing.EventPipeBlock.FromStream(FastSerialization.Deserializer)
000000821F87E680 00007fff62f33545 FastSerialization.Deserializer.ReadObjectDefinition(FastSerialization.Tags, FastSerialization.StreamLabel)
000000821F87E6E0 00007fff62f33c7b FastSerialization.Deserializer.ReadObject()
000000821F87E730 00007fff62f37471 Microsoft.Diagnostics.Tracing.EventPipeEventSource.Process()
000000821F87E770 00007fff62f2459c Tracing.Tests.Common.IpcTraceTest+c__DisplayClass8_0.b__1() [F:\workspace\_work\1\s\src\coreclr\tests\src\tracing\eventpipe\common\IpcTraceTest.cs @ 258]
000000821F87E820 00007fffbee25cfd System.Threading.Tasks.Task.InnerInvoke()
000000821F87E860 00007fffbee2c767 System.Threading.Tasks.Task+c.b__274_0(System.Object)
000000821F87E8A0 00007fffbee0af70 System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000000821F87E910 00007fffbee25b0f System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
000000821F87E9B0 00007fffbee258f3 System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread)
000000821F87E9F0 00007fffbee2589a System.Threading.Tasks.Task.ExecuteFromThreadPool(System.Threading.Thread)
000000821F87EA20 00007fffbee16d94 System.Threading.ThreadPoolWorkQueue.Dispatch()
000000821F87EAE0 00007fffbee01f4a System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Read (Native):

 # Child-SP          RetAddr           Call Site
00 00000082`1f87c158 00007fff`ed3857ff ntdll!ZwWaitForAlertByThreadId+0x14 [d:\rs1.obj.amd64fre\minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 3731] 
01 00000082`1f87c160 00007fff`ed385706 ntdll!RtlpWaitOnAddressWithTimeout+0x43 [d:\rs1\minkernel\ntos\rtl\waitaddr.c @ 817] 
02 00000082`1f87c190 00007fff`ed385597 ntdll!RtlpWaitOnAddress+0xb2 [d:\rs1\minkernel\ntos\rtl\waitaddr.c @ 1060] 
03 00000082`1f87c200 00007fff`ed344bf4 ntdll!RtlpWaitOnCriticalSection+0xdb [d:\rs1\minkernel\ntdll\resource.c @ 1471] 
04 00000082`1f87c2d0 00007fff`ed344b20 ntdll!RtlpEnterCriticalSectionContended+0xcc [d:\rs1\minkernel\ntdll\resource.c @ 2147] 
05 00000082`1f87c300 00007fff`c1b0ac96 ntdll!RtlEnterCriticalSection+0x40 [d:\rs1\minkernel\ntdll\resource.c @ 1766] 
06 00000082`1f87c330 00007fff`c1dcc48f coreclr!CrstBase::Enter+0x272 [F:\workspace\_work\1\s\src\coreclr\src\vm\crst.cpp @ 316] 
07 (Inline Function) --------`-------- coreclr!CrstBase::CrstHolder::{ctor}+0x14 [F:\workspace\_work\1\s\src\coreclr\src\vm\crst.h @ 388] 
08 (Inline Function) --------`-------- coreclr!EventPipe::RunWithCallbackPostponed+0x2d [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipe.h @ 128] 
09 00000082`1f87c3a0 00007fff`c21f6e32 coreclr!EventPipe::CreateProvider+0xd7 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipe.cpp @ 443] 
0a 00000082`1f87c540 00007fff`becf3332 coreclr!EventPipeInternal::CreateProvider+0x152 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipeinternal.cpp @ 100] 
0b 00000082`1f87c660 00007fff`bef82d2e System.Diagnostics.Tracing.EventPipeInternal.CreateProvider(System.String, EtwEnableCallback)+0x82
0c 00000082`1f87c750 00007fff`bef9349d System_Private_CoreLib!System.Diagnostics.Tracing.EventPipeEventProvider.System.Diagnostics.Tracing.IEventProvider.EventRegister(System.Diagnostics.Tracing.EventSource, EtwEnableCallback, Void*, Int64 ByRef)+0x1e
0d 00000082`1f87c790 00007fff`bef90def System_Private_CoreLib!System.Diagnostics.Tracing.EventProvider.EventRegister(System.Diagnostics.Tracing.EventSource, EtwEnableCallback)+0x5d
0e 00000082`1f87c7e0 00007fff`bef95f0e System_Private_CoreLib!System.Diagnostics.Tracing.EventProvider.Register(System.Diagnostics.Tracing.EventSource)+0x3f
0f 00000082`1f87c820 00007fff`bef941a9 System_Private_CoreLib!System.Diagnostics.Tracing.EventSource.Initialize(System.Guid, System.String, System.String[])+0x26e
10 00000082`1f87c8d0 00007fff`bef94042 System_Private_CoreLib!System.Diagnostics.Tracing.EventSource..ctor(System.Diagnostics.Tracing.EventSourceSettings, System.String[])+0x109
11 00000082`1f87c960 00007fff`62f61fb5 System_Private_CoreLib!System.Diagnostics.Tracing.EventSource..ctor()+0x12
12 00000082`1f87c990 00000272`00700100 system_collections_concurrent!System.Collections.Concurrent.CDSCollectionETWBCLProvider..ctor()+0x25*** WARNING: Unable to verify checksum for system.collections.concurrent.dll

...

Writer:

 # Child-SP          RetAddr           Call Site
00 00000082`1f0fe6f8 00007fff`e9844116 ntdll!ZwWriteFile+0x14 [d:\rs1.obj.amd64fre\minkernel\ntdll\daytona\objfre\amd64\usrstubs.asm @ 243] 
01 00000082`1f0fe700 00007fff`c249311a KERNELBASE!WriteFile+0x76 [d:\rs1\minkernel\kernelbase\filehops.c @ 1558] 
02 00000082`1f0fe770 00007fff`c22b2c55 coreclr!IpcStream::Write+0x5e [F:\workspace\_work\1\s\src\coreclr\src\debug\debug-pal\win\diagnosticsipc.cpp @ 144] 
03 00000082`1f0fe7b0 00007fff`c22b30b2 coreclr!IpcStreamWriter::Write+0x355 [F:\workspace\_work\1\s\src\coreclr\src\vm\fastserializer.cpp @ 61] 
04 00000082`1f0fe8e0 00007fff`c22d75a7 coreclr!FastSerializer::WriteBuffer+0x3da [F:\workspace\_work\1\s\src\coreclr\src\vm\fastserializer.cpp @ 187] 
05 00000082`1f0fea60 00007fff`c22b3348 coreclr!EventPipeBlock::FastSerialize+0x2b7 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipeblock.h @ 77] 
06 00000082`1f0feb80 00007fff`c22c4c57 coreclr!FastSerializer::WriteObject+0xe4 [F:\workspace\_work\1\s\src\coreclr\src\vm\fastserializer.cpp @ 164] 
07 00000082`1f0fec90 00007fff`c22cc629 coreclr!EventPipeFile::Flush+0x647 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipefile.cpp @ 304] 
08 00000082`1f0fede0 00007fff`c22cc36c coreclr!EventPipeFile::WriteEventToBlock+0x109 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipefile.cpp @ 363] 
09 00000082`1f0fef10 00007fff`c22be129 coreclr!EventPipeFile::WriteEvent+0x19c [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipefile.cpp @ 239] 
0a 00000082`1f0ff040 00007fff`c22bda54 coreclr!EventPipeBufferManager::WriteAllBuffersToFileV4+0x181 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipebuffermanager.cpp @ 652] 
0b 00000082`1f0ff1a0 00007fff`c215aa84 coreclr!EventPipeBufferManager::WriteAllBuffersToFile+0x358 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipebuffermanager.cpp @ 520] 
0c 00000082`1f0ff2d0 00007fff`c1dcd1ef coreclr!EventPipeSession::WriteAllBuffersToFile+0x118 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipesession.cpp @ 305] 
0d 00000082`1f0ff390 00007fff`c1dccbff coreclr!EventPipe::DisableInternal+0x55f [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipe.cpp @ 394] 
0e (Inline Function) --------`-------- coreclr!EventPipe::Disable::__l2::<lambda_bdfbdf0d3cafb5823bb854972735cdb5>::operator()+0x1b [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipe.cpp @ 315] 
0f (Inline Function) --------`-------- coreclr!EventPipe::RunWithCallbackPostponed+0x4c [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipe.h @ 129] 
10 00000082`1f0ff520 00007fff`c21519fa coreclr!EventPipe::Disable+0x147 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipe.cpp @ 313] 
11 00000082`1f0ff6e0 00007fff`c2151720 coreclr!EventPipeProtocolHelper::StopTracing+0x286 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipeprotocolhelper.cpp @ 202] 
12 00000082`1f0ff850 00007fff`c1dcb34c coreclr!EventPipeProtocolHelper::HandleIpcMessage+0x2d0 [F:\workspace\_work\1\s\src\coreclr\src\vm\eventpipeprotocolhelper.cpp @ 131] 
13 00000082`1f0ff960 00007fff`ed2784d4 coreclr!DiagnosticServer::DiagnosticsServerThread+0x44c [F:\workspace\_work\1\s\src\coreclr\src\vm\diagnosticserver.cpp @ 97] 
14 00000082`1f0ffad0 00007fff`ed38e851 kernel32!BaseThreadInitThunk+0x14 [d:\rs1\base\win32\client\thread.c @ 64] 
15 00000082`1f0ffb00 00000000`00000000 ntdll!RtlUserThreadStart+0x21 [d:\rs1\minkernel\ntdll\rtlstrt.c @ 997] 

@josalem
Copy link
Contributor Author

josalem commented Jan 17, 2020

After chatting with @sywhang there may be a quick band-aid change we can make that should clear up CI. We would need to change

https://github.com/microsoft/perfview/blob/2fb635bcde289b2995570001334c08934371003c/src/TraceEvent/RegisteredTraceEventParser.cs#L375

https://github.com/microsoft/perfview/blob/2fb635bcde289b2995570001334c08934371003c/src/TraceEvent/RegisteredTraceEventParser.cs#L429-L430

https://github.com/microsoft/perfview/blob/2fb635bcde289b2995570001334c08934371003c/src/TraceEvent/RegisteredTraceEventParser.cs#L753

to use the instance methods. That should bypass the issue for now, and let us get CI back to green.

An alternative that I'm going to try with this PR is to just put a ConcurrentDictionary in the test itself so we bypass the provider creation during session disable.

Neither fixes the underlying issue, but would hopefully get CI green again while I engineer a broader fix. We could also disable the tests while I engineer the fix.

* Remove this change once the fix has be made
@stephentoub
Copy link
Member

We would need to change to use the instance methods

This is what I was alluding to in #1794 (comment). However, right now I'm not sure that would actually help. We currently construct the ConcurrentDictionary as part of initializing a readonly static field, so it's going to be created at some point before first use, and it's possible using the instance methods could trigger it. To address that, we would need to lazily initialize the CD the first time code actually wants to add to the cache, which we could do, and maybe should.

@josalem josalem force-pushed the dev/josalem/update-traceevent-ver branch from 73b0e5b to 47daa07 Compare January 17, 2020 21:21
@josalem
Copy link
Contributor Author

josalem commented Jan 17, 2020

Creating the ConcurrentDictionary and using it in the test seems to bypass the issue. I'll create a GitHub issue to track the underlying problem, and we can merge this PR in to get CI green again.

@josalem josalem changed the title [testing] Update TraceEvent package version [testing] Update TraceEvent package version and bypass corner case in EventPipe Jan 17, 2020
@safern
Copy link
Member

safern commented Jan 18, 2020

Merging to unblock other PRs as this is green and approved.

@safern safern merged commit 522e9d9 into dotnet:master Jan 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants