Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[release/2.2][Port] Fix EventPipe EventHandle Caching for TraceLogging (#18355) #23661

Merged
merged 1 commit into from
Apr 10, 2019
Merged

[release/2.2][Port] Fix EventPipe EventHandle Caching for TraceLogging (#18355) #23661

merged 1 commit into from
Apr 10, 2019

Conversation

jorive
Copy link
Member

@jorive jorive commented Apr 2, 2019

Description

Enabling ETW or EventPipe tracing results in a unbound memory usage by the runtime.

Customer Impact

It impacts anyone using Azure customers using ASP.NET Core on App Service with Application Insights Profiler when ETW/EventPipe listeners are enabled.

Regression?

Yes, this is a regression. ETW scenario now shares some of the code of EventPipe, and this shared code introduced the bug where events were not properly cached. This results in a redundant allocation for the same event for as long as a session was recorded.

Risk

The risk is low. We checked this in a year ago and there have not been any issues reported with it. In addition, the code is only active when tracing is enabled.
I have manually tested the ETW and EventPipe scenarios reported and I can confirm that the bug is fixed (it is not on master branch neither), and I have stepped through the debugger to verify we do not keep allocating memory for the same event.

Issue

Fixes https://github.com/dotnet/coreclr/issues/23562
Originally at microsoft/ApplicationInsights-dotnet#1102, and dotnet/aspnetcore#8648

@jorive jorive added bug Product bug (most likely) * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) area-Tracing labels Apr 2, 2019
@jorive jorive requested review from brianrob, noahfalk and vancem April 2, 2019 18:48
@jorive
Copy link
Member Author

jorive commented Apr 2, 2019

This port fixes the ETW scenario, but not the EventPipe (I'll look more into it, but it might not become part of this PR).

@brianrob
Copy link
Member

brianrob commented Apr 2, 2019

Can you clarify what you mean by that it doesn't fix EventPipe?

Copy link
Member

@brianrob brianrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It would be good to do a quick validation on this fix against 2.2 to make sure that the leak is gone.

@jorive
Copy link
Member Author

jorive commented Apr 2, 2019

@brianrob With these changes, the process memory does not keep increasing due to redundant allocation of the same event object. It fixes the ETW scenario, but not the EventPipe scenario. When I enable EventPipe (by dropping the .eventpipeconfig), you can see that allocations do not stop.

@brianrob
Copy link
Member

brianrob commented Apr 2, 2019

Oh, interesting. I'm surprised that this doesn't fix EvenPipe as well. That's worth profiling to understand.

@jorive
Copy link
Member Author

jorive commented Apr 2, 2019

@brianrob Sorry, I miss spoke before. I just look at EventPipe, and it seems to behave as expected. Its memory usage seems bound to the CircularBufferMB. The grow I saw was bound by the default it 1GB... I tried different values, and the total allocation was bound by it.

@brianrob
Copy link
Member

brianrob commented Apr 2, 2019

Ok, that's good news. That's just because events are being stored in-memory.

@pharring
Copy link

pharring commented Apr 3, 2019

Tested private fix on my local machine. Looks good.

@jorive jorive added this to the 2.2.x milestone Apr 3, 2019
@jorive jorive added Servicing-consider Issue for next servicing release review and removed * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) labels Apr 3, 2019
@danmoseley
Copy link
Member

@jorive @brianrob

It impacts anyone using Azure customers using ASP.NET Core on App Service with Application Insights Profiler when ETW/EventPipe listeners are enabled.

Is that true for customers on 2.1? Should it be ported there (LTS) ?

@vivmishra vivmishra added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Apr 4, 2019
@vivmishra vivmishra modified the milestones: 2.2.x, 2.2.5 Apr 4, 2019
@vivmishra
Copy link

Approved for 2.2.5
Wait for branch to open.

Port this to 3.0 too.

@jorive
Copy link
Member Author

jorive commented Apr 4, 2019

@danmosemsft This bug was introduced in 2.2. I could not repro it in 2.1.

@jorive
Copy link
Member Author

jorive commented Apr 4, 2019

@vivmishra These changes are already in master and should be in release/3.0 too.

@jorive
Copy link
Member Author

jorive commented Apr 5, 2019

@vivmishra When will the branch open? When will users receive the fix?

@vivmishra
Copy link

We open after 4/9 and 2.2.5 is planned for May.

@cdavid
Copy link

cdavid commented Apr 9, 2019

Our services are currently impacted by this as all our logging goes over ETW (we are running inside WebApp). Since this is planned for May, with global rollout to WebApp in 4-6 weeks, I think we are looking at June.

Are there any workarounds available until then? At this point, the only option seems to be to roll back to 2.1...

Thanks!

@yahorsi
Copy link

yahorsi commented Apr 9, 2019

I might be dumb but:

  1. Why do we need to close brunches at all
  2. Why do we need to approve what is planned for what service release. If there is the fix and we want it in the service release - just merge?
  3. If we still need p 1 and p 2 what are we waiting for and why we're not releasing 2.2.4
  4. Why we need to wait month before we get the fix if it is already available/implemented.

As for me it all does not make sence. We've got a problem. There is the fix. And we cannot get it.
IMHO there is an issue in planning here, tooo formal and not agile at all that is causing seriouse delays when it comes to the service releases.

@wtgodbe wtgodbe merged commit b4e920d into dotnet:release/2.2 Apr 10, 2019
@jorive jorive deleted the dev/issue-23562 branch April 10, 2019 20:10
@mjsabby
Copy link

mjsabby commented May 3, 2019

When is the next 2.2 release slated? This is impacting a ton of serivces. We really should prioritize getting this out.

@wtgodbe
Copy link
Member

wtgodbe commented May 3, 2019

The May release is currently scheduled for 5/14. We have a monthly release cadence since our full stack has a lot of moving pieces, and it takes time to test & stabilize all of our bits before doing a full servicing release - shipping individual parts of the stack piecemeal can be very costly.

@karelz
Copy link
Member

karelz commented May 4, 2019

@yahorsi

Why do we need to close branches at all

Because we need to stabilize and prep. How else would you suggest to do that?

Why do we need to approve what is planned for what service release. If there is the fix and we want it in the service release - just merge?

Most customers (esp. enterprise) appreciate stability and low risk of breaking changes in servicing releases. How would you achieve that if we would merge "random" things without thinking if it is important, how important, what are the risks, etc.?

If we still need p 1 and p 2 what are we waiting for and why we're not releasing 2.2.4

Because 2.2.4 release was already locked down by that time. It would destabilize it. Create unnecessary risk. Given that it was security release, it would create lots of problems (we often need to synchronize multiple products to ship similar security bug fixes in ...)

Why we need to wait month before we get the fix if it is already available/implemented.

Answers above should answer that.

IMHO there is an issue in planning here, tooo formal and not agile at all that is causing seriouse delays when it comes to the service releases.

Believe me, this is far away from too formal in a serious platform like .NET. It is actually pretty agile and flexible.
If you think it is not, I encourage you to try to build a system that builds all the repos involved - tools, VS plugins, compiler, frameworks, etc. ... do the level of builds, testing, etc. on a scale we do in .NET Core (which includes also non-Microsoft partner company like Red Hat). Maybe you may find a project or two out there with this scale which are more agile, but most likely only a couple if any at all.
If someone, truly needs a fix ASAP, they can build their own binary / product on local box and go with it -- yes, it is costly to do that and involved. That is the price for agility - associated cost.
Even if we were able to produce build every week or every day, would it be smart to ship a new build with each fix? Is it really what majority of our customers want? Go ask a few enterprises and large scale services how often they are willing to assess and consume new builds ... I'd be curious if other projects (larger-scale projects like .NET Core) do faster cadence than monthly (or twice a month top).

@mjsabby When is the next 2.2 release slated? This is impacting a ton of serivces. We really should prioritize getting this out.

It would help to clarify what "ton of services" means. How many? What are they? Is there internal email thread about it with details? Are all necessary people on it?
It is hard to act on information out of context without clear evidence.

That said, we are on monthly servicing release cadence. If you think that is not acceptable and you need faster cadence, maybe we should discuss that directly in general, not as part of specific PR.

@pharring
Copy link

How do customers get this fix? In particular, how/when will it be available in Azure App Services?

@cdavid
Copy link

cdavid commented May 20, 2019

I found this page, I believe from a tweet from @davidfowl or @DamianEdwards (I might be wrong though): https://aspnetcoreon.azurewebsites.net/

In addition to that, the more official way seems to be to monitor this: https://github.com/Azure/app-service-announcements/issues - they should create a new issue when they begin rolling out 2.2.5.

I have found that neither of the options are properly advertised anywhere (stumbled onto them via word-of-mouth tweet) and could not find any references here: https://docs.microsoft.com/en-us/aspnet/core/host-and-deploy/azure-apps/?view=aspnetcore-2.2

@wtgodbe
Copy link
Member

wtgodbe commented May 20, 2019

I'm not sure when the fix will be available in Azure App Services, but the latest Microsoft.NETCore.App from release/2.2 should contain it: https://www.nuget.org/packages/Microsoft.NETCore.App/2.2.5

@pharring
Copy link

Found the info on the .NET Core blog

Deployment of these updates Azure App Services has been scheduled and they estimate the deployment will be complete by May 26, 2019.

@atifield
Copy link

Could someone confirm the ETW memory leak issue is fixed when they run their service compiled with 2.2.5? I am still seeing memory usage increase throughout the day after deploying build with 2.2.5. This was not a problem in 2.1. Would love to know other's experience before digging in deeper.

@pharring
Copy link

I verified private binaries from 2.2.5 and the leak is fixed.
However, I also just checked my App Service and 2.2.5 is not present (it's now May 29th 2019)
@leecow any update on when 2.2.5 will get to Azure App Service?

@leecow
Copy link
Member

leecow commented May 29, 2019

Status from the Antares team is that the May rapid update cycle is in progress and scheduled to be complete by 5/31.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Tracing bug Product bug (most likely) Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.