Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in EventPipe when reader tries to take EventPipe config lock while tracing self #1892

Open
josalem opened this issue Jan 17, 2020 · 4 comments
Labels
area-Diagnostics-coreclr enhancement Product code improvement that does NOT require public API changes/additions EventPipe
Milestone

Comments

@josalem
Copy link
Contributor

josalem commented Jan 17, 2020

As detailed in #1794, when an application is tracing itself there is the potential for a deadlock to occur if the reader tries to take the EventPipe configuration lock while EventPipe::Disable (specifically EventPipe::RunWithCallbackPostponed) is in progress.

This occurs when EventPipeEventSource is reading because the Trace Event library uses Regex in its logic and Regex now has a lazily initialized ConcurrentDictionary cache that generates events when used. This causes the static EventSource provider for concurrent collections to get created (unless its been instantiated earlier in the process) which will call into native code to create the provider. This requires the config lock in EventPipe. If this occurs during EventPipe::RunWithCallbackPostponed, you may deadlock because the reader is waiting on the lock, but the writer is holding it and is blocked because the pipe's buffer is full.

See the stacks in #1794 for details.

A couple potential paths to fixing this:

  • break up the logic of EventPipe::Disable so that it isn't holding the config lock for the entire duration. Rundown and other parts of disable can run for minutes in big applications, so this shouldn't be holding a global lock like that the entire time.
  • find ways to defer the registration of new providers that get registered while EventPipe::Disable is in flight.

This issue breaks our EventPipe tests, but #1794 added a bypass that needs to be removed once this issue is closed.

CC - @tommcdon @noahfalk @sywhang

@josalem josalem added this to the 5.0 milestone Jan 17, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jan 17, 2020
@tommcdon tommcdon removed the untriaged New issue has not been triaged by the area owner label Jan 27, 2020
@mikem8361 mikem8361 added the p1 label Mar 7, 2020
@tommcdon
Copy link
Member

@davmason @josalem is there any outstanding 5.0 work for this issue?

@josalem
Copy link
Contributor Author

josalem commented Jul 27, 2020

The bandaid fix applied in #1794 has kept this from popping up on my radar, so I believe this underlying issue is still present. We could still go and make the changes I alluded to in my comment on the PR in EventPipeEventSource to fix the only provided, live reader that I am aware of (note that this would be out of band work in Microsoft/PerfView). As for fixing the underlying issue, I think we need to find a way to do rundown without holding the EventPipeConfig lock, or at least adding a deferral mechanism that does provider creation/removal after a rundown has concluded.

@tommcdon
Copy link
Member

@josalem Thanks! To clarify, is this a test issue only and not a product bug?

@davmason
Copy link
Member

This is a product issue that we added a workaround for in our tests

@tommcdon tommcdon modified the milestones: 5.0.0, 6.0.0 Jul 30, 2020
@tommcdon tommcdon removed the p1 label Oct 13, 2020
@tommcdon tommcdon added the enhancement Product code improvement that does NOT require public API changes/additions label Mar 3, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 28, 2021
@tommcdon tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 16, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 1, 2021
@tommcdon tommcdon modified the milestones: 7.0.0, Future Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Diagnostics-coreclr enhancement Product code improvement that does NOT require public API changes/additions EventPipe
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants