Lock-free updates for Histogram #2951

Merged

utpilla
Contributor

@utpilla utpilla commented Feb 25, 2022

Changes

  • Implement lock-free updates for Histogram
  • This increases the size of the MetricPoint struct, because we add a new int that is used for synchronization

If this approach seems good, I can implement it for HistogramSumCount as well.
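
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of flag-guarded update described above. It is not the actual MetricPoint code: `SketchHistogram`, `isInUse`, and `SketchDemo` are hypothetical names, and the linear bucket lookup is an assumption made for brevity. The int flag is claimed with `Interlocked.Exchange`, and a thread that fails to claim it calls `SpinWait.SpinOnce()` and retries.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the histogram state; not the real MetricPoint.
public sealed class SketchHistogram
{
    private readonly double[] bounds;
    private readonly long[] bucketCounts;
    private long count;
    private double sum;
    private int isInUse; // 0 = free, 1 = some thread is updating

    public SketchHistogram(double[] bounds)
    {
        this.bounds = bounds;
        this.bucketCounts = new long[bounds.Length + 1];
    }

    public long Count => Volatile.Read(ref this.count);

    public double Sum => Volatile.Read(ref this.sum);

    public void Record(double value)
    {
        // Find the bucket before claiming the flag to keep the guarded region short.
        int i = 0;
        while (i < this.bounds.Length && value > this.bounds[i])
        {
            i++;
        }

        var spinner = default(SpinWait);
        while (true)
        {
            // Exchange returns the previous value: 0 means this thread claimed the flag.
            if (Interlocked.Exchange(ref this.isInUse, 1) == 0)
            {
                this.count++;
                this.sum += value;
                this.bucketCounts[i]++;

                // Publish the writes and hand the flag back.
                Volatile.Write(ref this.isInUse, 0);
                return;
            }

            // Another thread holds the flag: spin or yield, then retry.
            spinner.SpinOnce();
        }
    }
}

public static class SketchDemo
{
    // Records from several threads at once and returns the total count seen.
    public static long RunParallel(int threads, int perThread)
    {
        var histogram = new SketchHistogram(new double[] { 10, 100 });
        Parallel.For(0, threads, _ =>
        {
            for (int n = 0; n < perThread; n++)
            {
                histogram.Record(n);
            }
        });
        return histogram.Count;
    }
}
```

Calling `SketchDemo.RunParallel(4, 10000)` should report 40000 recorded values if no update is lost. The one extra int per point is the size cost mentioned in the second bullet.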

Stress Test Results

To better understand the perf improvement, we need to look at two scenarios:

  1. There is little contention to update the same MetricPoint. This was tested by running the following in the Run method of the stress test:
TestHistogram.Record(
           random.Next(MaxHistogramMeasurement),
           new("DimName1", DimensionValues[random.Next(0, ArraySize)]),
           new("DimName2", DimensionValues[random.Next(0, ArraySize)]),
           new("DimName3", DimensionValues[random.Next(0, ArraySize)]));
  2. There is high contention to update the same MetricPoint. This was tested by running the following in the Run method of the stress test:
TestHistogram.Record(
           random.Next(MaxHistogramMeasurement),
           new("DimName1", "DimVal1"),
           new("DimName2", "DimVal2"),
           new("DimName3", "DimVal3"));

While there is a perf improvement in both cases, the improvement is substantial in the second case, where there is high contention to update the same MetricPoint.

For the first scenario, loops/second goes up from ~20M to ~23M (~15% increase).
For the second scenario, loops/second goes up from ~5.8M to ~7.6M (~31% increase).
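
As a quick check of those percentages, using only the approximate figures quoted above (so the results are themselves approximate):

```latex
\[
\frac{23\,\mathrm{M} - 20\,\mathrm{M}}{20\,\mathrm{M}} = 0.15 = 15\%,
\qquad
\frac{7.6\,\mathrm{M} - 5.8\,\mathrm{M}}{5.8\,\mathrm{M}} \approx 0.31 = 31\%.
\]
```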

Here are the numbers for the 1st and 2nd scenario respectively:

main branch

image

With this PR

image

@utpilla utpilla requested a review from a team February 25, 2022 22:29
@utpilla utpilla changed the title Lock-free updates for Histogram [Proposal] Lock-free updates for Histogram Feb 25, 2022
@codecov

codecov bot commented Feb 25, 2022

Codecov Report

Merging #2951 (5e9a98d) into main (6981795) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2951      +/-   ##
==========================================
+ Coverage   83.98%   84.02%   +0.04%     
==========================================
  Files         254      254              
  Lines        8942     8946       +4     
==========================================
+ Hits         7510     7517       +7     
+ Misses       1432     1429       -3     

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/OpenTelemetry/Metrics/HistogramBuckets.cs | 100.00% <ø> (ø) | |
| ...ementation/HttpHandlerMetricsDiagnosticListener.cs | 94.11% <100.00%> (ø) | |
| src/OpenTelemetry/Metrics/MetricPoint.cs | 86.02% <100.00%> (+0.42%) | ⬆️ |
| src/OpenTelemetry/BatchExportProcessor.cs | 87.36% <0.00%> (+3.15%) | ⬆️ |

@reyang
Member

reyang commented Feb 25, 2022

> While there is a perf improvement in both the cases, there is a substantial improvement in the second case where there is high contention to update the same MetricPoint.

Great analysis! 👍

this.histogramBuckets.RunningBucketCounts[i]++;
if (Interlocked.Exchange(ref this.histogramBuckets.UsingHistogram, 1) == 0)
{
this.runningValue.AsLong++;
Member

I haven't looked into the underlying code, but I wonder whether this ++ might throw (e.g., on integer overflow). If it can, we need to make sure we release the spinlock.

Contributor Author

We don't acquire any lock in the first place, so there is nothing to release. SpinWait is used only to decide when the waiting thread should spin or context switch.

Member

On Ln335 we set this.histogramBuckets.UsingHistogram = 0. If we throw before that, I guess all other threads would keep spinning?

Member

@reyang reyang Feb 25, 2022

I think unchecked would solve the problem here.

Member

We could explore whether we need an exit plan like the one in CircularBuffer: https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/Internal/CircularBuffer.cs#L139-L143
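
To illustrate the concern in this thread, here is a small sketch of two ways to keep the flag from being stuck at 1 when the guarded update throws. It is not the merged code, and `GuardedCounter`, `isInUse`, and the `simulateFailure` parameter are hypothetical names. `unchecked` makes the increment wrap instead of throwing even when the project is compiled with overflow checking, and `try`/`finally` resets the flag on any exception. The `finally` block is an assumption on my part, not something proposed in the thread.

```csharp
using System;
using System.Threading;

// Hypothetical illustration; not the real MetricPoint or HistogramBuckets types.
public sealed class GuardedCounter
{
    private long value;
    private int isInUse; // 0 = free, 1 = some thread is updating

    public GuardedCounter(long initialValue)
    {
        this.value = initialValue;
    }

    public long Value => Volatile.Read(ref this.value);

    public bool IsFree => Volatile.Read(ref this.isInUse) == 0;

    public void Increment(bool simulateFailure = false)
    {
        // Spin until this thread flips the flag from 0 to 1.
        var spinner = default(SpinWait);
        while (Interlocked.Exchange(ref this.isInUse, 1) != 0)
        {
            spinner.SpinOnce();
        }

        try
        {
            if (simulateFailure)
            {
                throw new InvalidOperationException("simulated failure inside the guarded region");
            }

            // unchecked: wraps around on overflow instead of throwing OverflowException.
            unchecked
            {
                this.value++;
            }
        }
        finally
        {
            // Always hand the flag back, so other threads cannot spin forever.
            Volatile.Write(ref this.isInUse, 0);
        }
    }
}
```

With this shape, a counter starting at `long.MaxValue` wraps to `long.MinValue` on `Increment()`, and after a failing `Increment(true)` the flag is free again. Whether the extra `try`/`finally` is worth its cost on a hot path is a separate question that would need measuring.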

Member

@reyang reyang left a comment

@CodeBlanch
Member

Approach LGTM 🚀

Member

@cijothomas cijothomas left a comment

Please add to changelog and we are good to go.

@cijothomas cijothomas changed the title [Proposal] Lock-free updates for Histogram Lock-free updates for Histogram Mar 1, 2022
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
@cijothomas cijothomas merged commit e7b0257 into open-telemetry:main Mar 2, 2022
5 participants