[release/2.2][Port] Fix EventPipe EventHandle Caching for TraceLogging (#18355) #23661
This port fixes the ETW scenario, but not the EventPipe one (I'll look into it more, but it might not become part of this PR).
Can you clarify what you mean when you say it doesn't fix EventPipe?
LGTM. It would be good to do a quick validation on this fix against 2.2 to make sure that the leak is gone.
@brianrob With these changes, the process memory does not keep increasing due to redundant allocation of the same event object. It fixes the ETW scenario, but not the EventPipe scenario. When I enable EventPipe (by dropping the .eventpipeconfig), you can see that allocations do not stop.
Oh, interesting. I'm surprised that this doesn't fix EventPipe as well. That's worth profiling to understand.
@brianrob Sorry, I misspoke before. I just looked at EventPipe, and it seems to behave as expected. Its memory usage seems bound to the
Ok, that's good news. That's just because events are being stored in-memory.
Tested private fix on my local machine. Looks good.
Approved for 2.2.5. Port this to 3.0 too.
@danmosemsft This bug was introduced in 2.2. I could not repro it in 2.1.
@vivmishra These changes are already in
@vivmishra When will the branch open? When will users receive the fix?
We open after 4/9 and 2.2.5 is planned for May.
Our services are currently impacted by this as all our logging goes over ETW (we are running inside WebApp). Since this is planned for May, with global rollout to WebApp in 4-6 weeks, I think we are looking at June. Are there any workarounds available until then? At this point, the only option seems to be to roll back to 2.1... Thanks!
I might be dumb, but:
To me, none of this makes sense. We have a problem. There is a fix. And we cannot get it.
When is the next 2.2 release slated? This is impacting a ton of services. We really should prioritize getting this out.
The May release is currently scheduled for 5/14. We have a monthly release cadence since our full stack has a lot of moving pieces, and it takes time to test and stabilize all of our bits before doing a full servicing release. Shipping individual parts of the stack piecemeal can be very costly.
Because we need to stabilize and prep. How else would you suggest to do that?
Most customers (esp. enterprise) appreciate stability and low risk of breaking changes in servicing releases. How would you achieve that if we merged "random" things without considering whether they are important, how important they are, what the risks are, etc.?
Because the 2.2.4 release was already locked down by that time. Taking the fix then would have destabilized it and created unnecessary risk. Given that it was a security release, it would have created lots of problems (we often need to synchronize multiple products to ship similar security bug fixes in ...)
The answers above should cover that.
Believe me, this is far from too formal for a serious platform like .NET. It is actually pretty agile and flexible.
It would help to clarify what "ton of services" means. How many? What are they? Is there an internal email thread about it with details? Are all the necessary people on it? That said, we are on a monthly servicing release cadence. If you think that is not acceptable and you need a faster cadence, maybe we should discuss that directly in general, not as part of a specific PR.
How do customers get this fix? In particular, how and when will it be available in Azure App Services?
I found this page, I believe from a tweet from @davidfowl or @DamianEdwards (I might be wrong though): https://aspnetcoreon.azurewebsites.net/ In addition to that, the more official way seems to be to monitor this: https://github.com/Azure/app-service-announcements/issues - they should create a new issue when they begin rolling out 2.2.5. I have found that neither of these options is properly advertised anywhere (stumbled onto them via word-of-
I'm not sure when the fix will be available in Azure App Services, but the latest Microsoft.NETCore.App from release/2.2 should contain it: https://www.nuget.org/packages/Microsoft.NETCore.App/2.2.5
Found the info on the .NET Core blog
Could someone confirm the ETW memory leak issue is fixed when they run their service compiled with 2.2.5? I am still seeing memory usage increase throughout the day after deploying a build with 2.2.5. This was not a problem in 2.1. I would love to hear about others' experience before digging in deeper.
I verified private binaries from 2.2.5 and the leak is fixed.
Status from the Antares team is that the May rapid update cycle is in progress and scheduled to be complete by 5/31. |
Description
Enabling ETW or EventPipe tracing results in unbounded memory usage by the runtime.
Customer Impact
It impacts Azure customers using ASP.NET Core on App Service with Application Insights Profiler when ETW/EventPipe listeners are enabled.
Regression?
Yes, this is a regression. The ETW scenario now shares some code with EventPipe, and this shared code introduced a bug where events were not properly cached. As a result, the same event is redundantly allocated again for as long as a session is recording.
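The caching bug can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the CoreCLR code: `TracingSession`, `EventHandle`, and `get_event_handle` are hypothetical names, and the model assumes every handle allocated while the session runs stays alive until the session ends. With caching, repeated writes of the same event reuse one handle; without it (modeling the regression), every write allocates a new handle, so retained memory grows with the number of events written.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventHandle:
    """Hypothetical stand-in for per-event metadata allocated by the runtime."""
    name: str
    metadata: str


@dataclass
class TracingSession:
    """Hypothetical model of a provider writing events during a session.

    cache_handles=False models the regression: no lookup of an existing handle.
    """
    cache_handles: bool = True
    _cache: dict = field(default_factory=dict)
    # Every handle allocated while the session runs is kept alive here,
    # so the length of this list stands in for retained memory.
    retained: list = field(default_factory=list)

    def get_event_handle(self, name: str, metadata: str) -> EventHandle:
        key = (name, metadata)
        if self.cache_handles and key in self._cache:
            return self._cache[key]  # reuse the existing handle: no allocation
        handle = EventHandle(name, metadata)  # allocate a new handle
        self.retained.append(handle)
        if self.cache_handles:
            self._cache[key] = handle
        return handle


def simulate(cache_handles: bool, writes: int) -> int:
    """Write the same event `writes` times and return how many handles were retained."""
    session = TracingSession(cache_handles=cache_handles)
    for _ in range(writes):
        session.get_event_handle("RequestCompleted", "int32 StatusCode")
    return len(session.retained)


if __name__ == "__main__":
    print(simulate(cache_handles=True, writes=10_000))   # 1
    print(simulate(cache_handles=False, writes=10_000))  # 10000
```

Keying the cache on the event's identity (here, its name and metadata) is the design point: memory then scales with the number of distinct events rather than the number of writes.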
Risk
The risk is low. We checked this in a year ago and there have not been any issues reported with it. In addition, the code is only active when tracing is enabled.
I have manually tested the reported ETW and EventPipe scenarios and can confirm that the bug is fixed (it does not reproduce on the master branch either), and I have stepped through the debugger to verify that we do not keep allocating memory for the same event.
Issue
Fixes https://github.com/dotnet/coreclr/issues/23562
Originally at microsoft/ApplicationInsights-dotnet#1102, and dotnet/aspnetcore#8648