Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize unnecessarily allocation when initializing MetricProvider #1685

Merged
merged 11 commits into from
Jan 29, 2021

Conversation

victlu
Copy link
Contributor

@victlu victlu commented Jan 4, 2021

Changes

Optimize unnecessarily allocation when initializing MetricProvider.

  1. We allocate new object unnecessarily. We use the pattern collection.GetOrAdd(name, new Object()). The new Object() is not used if item already exists in the collection. Using a different form of GetOrAdd() we convert the new Object(0 into a Func(). Thus, only called if needed.

    // Old Pattern
    return this.longCounters.GetOrAdd(name, new Int64CounterMetricSdk(name));

    // New Pattern
    return this.longCounters.GetOrAdd(name, (k) => new Int64CounterMetricSdk(k));

  2. Optimize SortAndDedup() method to remove double allocation of List object. Using a SortedDictionary<> to help with sorting and deduping.

@victlu victlu requested a review from a team January 4, 2021 23:43
@victlu victlu changed the title Optimize get or add Optimize unnecessarily allocation when initializing MetricProvider Jan 4, 2021
@codecov
Copy link

codecov bot commented Jan 4, 2021

Codecov Report

Merging #1685 (0505585) into main (6fe93f8) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1685      +/-   ##
==========================================
+ Coverage   83.26%   83.28%   +0.02%     
==========================================
  Files         250      250              
  Lines        6870     6874       +4     
==========================================
+ Hits         5720     5725       +5     
+ Misses       1150     1149       -1     
Impacted Files Coverage Δ
src/OpenTelemetry/Metrics/CounterMetricSdkBase.cs 100.00% <100.00%> (ø)
...c/OpenTelemetry/Metrics/DoubleObserverMetricSdk.cs 100.00% <100.00%> (ø)
...rc/OpenTelemetry/Metrics/Int64ObserverMetricSdk.cs 100.00% <100.00%> (ø)
src/OpenTelemetry/Metrics/LabelSetSdk.cs 100.00% <100.00%> (ø)
src/OpenTelemetry/Metrics/MeasureMetricSdk.cs 87.50% <100.00%> (+1.78%) ⬆️
src/OpenTelemetry/Metrics/MeterSdk.cs 95.58% <100.00%> (+0.27%) ⬆️
...ZPages/Implementation/ZPagesExporterEventSource.cs 62.50% <0.00%> (+6.25%) ⬆️

@@ -59,7 +59,7 @@ internal BoundCounterMetric<T> Bind(LabelSet labelset, bool isShortLived)
lock (this.bindUnbindLock)
{
var recStatus = isShortLived ? RecordStatus.UpdatePending : RecordStatus.Bound;
boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, this.CreateMetric(recStatus));
boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, (_) => this.CreateMetric(recStatus));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this still allocates a delegate for the lambda. The ones below aren't using a closure so we could just pass a static ref to avoid that. For the one here we could try calling TryGetValue first and then TryAdd if needed. Not sure which perf hit is worse, double-call or allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your comments. However, the initial concern was more on the logical side as it relates to calling this.CreateMetric() (and new *MetricSdk()) unnecessarily, rather than the allocation for the delegate and/or closure. This may become a concern as these "constructors" become more complex and requires additional resources to complete.

As to your concerns regarding lambdas/delegates, I can try doing some profiling of delegates vs methods. (i.e. TryGetValue/TryAdd/etc...) In my initial google search, it seems the debate is leaning toward use of delegates.

Copy link
Contributor Author

@victlu victlu Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing some research and running some perf tests on my own, I've come to the following conclusions…

  1. Current PR does not 100% solve the issue of over-allocating a TValue when inserting into a ConcurrentDictionary. The PR will reduce the probability of this “leakage”.

  2. Use of Lambda expressions with GetOrAdd is more concise (IMHO) vs using TryGetValue/TryAdd/etc… However, there may be a small performance penalty if lambda expression is “declared”/allocated constantly (as in the case of closures, with older .NET).

  3. I don’t have enough context (of this project) to know what priorities and tradeoffs are valued, and thus, I look forward to feedback from owners and members here.

In more details… in the context of using ConcurrentDictionary<int,int> and my test program inserting and reading 40 Million entries…

  1. GetOrAdd(TKey,TValue) seems to be equivalent of CompareAndExchange operator. Given current code base, we need to pre-allocate a TValue before calling GetOrAdd and can result in over allocation when race condition inserts TKey before us. This is also true if we use the delegate/Func/Lambda versions of GetOrAdd, as per remarks in the official documentation.

  2. Performance (CPU) is equivalent when comparing Func<> vs TryGetValue/TryAdd/GetOrAdd. But, only if we pay special attention to make sure Func<> declarations are minimized (i.e. static Func<>, although, this really should be relegated to the Compiler/Optimizer to handle. Unfortunately, we need to explicitly help it).

  3. For Func<> requiring closures, static Func<> declaration is possible in newer .NET when using the new GetOrAdd(TKey,TArg,TValue) form.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but related to ConcurrentDictionary, we've seen several cases where folks hit OOM due to the 2GB limit of ConcurrentDictionary on 64bit machines.

https://docs.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2?view=net-5.0#remarks

There are other caveats - e.g. dotnet/runtime#41840 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@victlu

I don’t have enough context (of this project) to know what priorities and tradeoffs are valued, and thus, I look forward to feedback from owners and members here.

My perspective on this is that anyone using OpenTelemetry (ipso facto) is sensitive to performance so the library should be as low-impact to the hosting process as possible.

Check out this benchmark:

    [MemoryDiagnoser]
    public class ConcurrentDictionaryBenchmarks
    {
        private ConcurrentDictionary<LabelSet, object> instruments;

        [IterationSetup]
        public void IterationSetup()
        {
            this.instruments = new ConcurrentDictionary<LabelSet, object>();
        }

        [Benchmark]
        public void GetOrAdd()
        {
            int closure = 0;

            this.instruments.GetOrAdd(LabelSet.BlankLabelSet, (labelSet) => new object[closure]);

            this.instruments.GetOrAdd(LabelSet.BlankLabelSet, (labelSet) => new object[closure]);
        }

        [Benchmark]
        public void TryAdd()
        {
            int closure = 0;

            if (!this.instruments.TryGetValue(LabelSet.BlankLabelSet, out object value))
            {
                this.instruments.TryAdd(LabelSet.BlankLabelSet, new object[closure]);
            }

            if (!this.instruments.TryGetValue(LabelSet.BlankLabelSet, out value))
            {
                this.instruments.TryAdd(LabelSet.BlankLabelSet, new object[closure]);
            }
        }
    }
Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
GetOrAdd 2.732 us 0.2642 us 0.7537 us 2.600 us - - - 224 B
TryAdd 2.408 us 0.2731 us 0.7658 us 2.100 us - - - 72 B

According to that, doing TryGetValue+TryAdd is faster and uses less memory. So for me, it's a no-brainer 😄

I do agree with you the lambda syntax is much nicer, the automatic allocation of a delegate in C# is a tragedy!

To @reyang's point, the code is in a lock...

            lock (this.bindUnbindLock)
            {
                var recStatus = isShortLived ? RecordStatus.UpdatePending : RecordStatus.Bound;
                boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, this.CreateMetric(recStatus));
            }

...so why the heck are we even using a ConcurrentDictionary? 🤣 I haven't spent much time in the metrics code.

/cc @alanwest @eddynaka

Copy link
Contributor Author

@victlu victlu Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeBlanch Thanks for the tip on BenchmarkDotNet! That is an awesome library.

Your benchmark confirmed my findings as well. So, if you do this...

    [Benchmark]
    public void GetOrAdd()
    {
        int closure = 0;

        Func<string, int, object> func = (labelSet, closure) => new object[closure];

        this.instruments.GetOrAdd(LabelSet.BlankLabelSet, func, closure);

        this.instruments.GetOrAdd(LabelSet.BlankLabelSet, func, closure);
    }

You get this...

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
GetOrAdd 2.323 us 0.2878 us 0.8258 us 2.000 us - - - 72 B
TryAdd 2.577 us 0.3651 us 1.0476 us 2.000 us - - - 72 B

Unfortunately, the GetOrAdd(TKey,TArg,TValue) overload exists only on newer .NET. Without this form, we would still require allocation for the closure; in which case; TryAdd version is faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I did not necessarily have to pull out func<> separately. The Compiler is suppose to optimize that portion. So...

    [Benchmark]
    public void GetOrAdd()
    {
        int closure = 0;

        //Func<string, int, object> func = (labelSet, closure) => new object[closure];

        this.instruments.GetOrAdd(LabelSet.BlankLabelSet, (l, c) => new object[c], closure);

        this.instruments.GetOrAdd(LabelSet.BlankLabelSet, (l, c) => new object[c], closure);
    }

results in...

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
GetOrAdd 2.597 us 0.3755 us 1.101 us 2.100 us - - - 72 B
TryAdd 2.591 us 0.3495 us 1.003 us 2.000 us - - - 72 B

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we need to stay focus on the original goal. Which is to not over-allocate the object we're inserting into the ConcurrentDictionary. Of which, unfortunately, we only partially addressed. I have strategies for this issue if we remain focus on this.

Also, I understand the whole Metric spec is WIP, and thus, maybe my effort here is premature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given concern for use of Func<> and support-ability in older .NET, I refactor to using TryGetValue/GetOrAdd instead. This should optimize (although it does not 100% solve) our over-allocation of dictionary objects.

boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, this.CreateMetric(recStatus));
if (!this.counterBoundInstruments.TryGetValue(labelset, out boundInstrument))
{
boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, this.CreateMetric(recStatus));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to be:

	            if (!this.counterBoundInstruments.TryGetValue(labelset, out boundInstrument))
                {
                    boundInstrument = this.CreateMetric(recStatus);
                    this.counterBoundInstruments.TryAdd(labelset, boundInstrument);
                }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the TryAdd fail, we will return the wrong boundInstrument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetOrAdd doesn't guarantee anything different, does it? From the doc:

Since a key/value can be inserted by another thread while valueFactory is generating a value, you cannot trust that just because valueFactory executed, its produced value will be inserted into the dictionary and returned. If you call GetOrAdd simultaneously on different threads, valueFactory may be called multiple times, but only one key/value pair will be added to the dictionary.

If we need a guarantee why not use a Dictionary in a lock?

DoubleObserverMetricHandleSdk boundInstrument;
if (!this.observerHandles.TryGetValue(labelset, out boundInstrument))
{
boundInstrument = this.observerHandles.GetOrAdd(labelset, new DoubleObserverMetricHandleSdk());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These other ones that don't need a closure you could use a static delegate:

        private static readonly Func<LabelSet, DoubleObserverMetricHandleSdk> CreateMetricFunc = (labelSet) => new DoubleObserverMetricHandleSdk();

        public override void Observe(double value, LabelSet labelset)
        {
            // TODO cleanup of handle/aggregator.   Issue #530
            var boundInstrument =
                this.observerHandles.GetOrAdd(labelset, CreateMetricFunc);

            boundInstrument.Observe(value);
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is necessary as we already called TryGetValue prior. Thus, the expectation is that we will need to alloc the new item. Thus, converting to a static delegate is just adding 1 extra layer.

If your concern is about atomicity, then, both these are equivalent. In the case of a race condition, the GetOrAdd(Func<>) form does not guarantee atomicity either. Thus, the functionally is no different than current code. In both cases, we will wind up "leaking" the new object regardless.

IMHO, In this code base, I would prefer using the GetOrAdd(Func<>) syntax rather than break into TryGetValue/TryGetOrAdd pattern. Simply because the code is more readable and understandable. Plus, the compiler has the ability to optimize to "static" delegates as it sees fit. Also, going forward in newer .NET, we have the GetOrAdd syntax with passing in TArg, which addresses the performance with closure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is necessary as we already called TryGetValue prior. Thus, the expectation is that we will need to alloc the new item. Thus, converting to a static delegate is just adding 1 extra layer.

Don't call TryGetValue for the case where a static delegate will work, just call TryGetValue passing the delegate ref (most places). Only call TryGetValue/TryAdd when you need the closure (the one place).

IMHO, In this code base, I would prefer using the GetOrAdd(Func<>) syntax rather than break into TryGetValue/TryGetOrAdd pattern.

Let's go with the allocation-free patterns.

@victlu
Copy link
Contributor Author

victlu commented Jan 21, 2021

Ping

Copy link
Member

@CodeBlanch CodeBlanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eddynaka eddynaka changed the base branch from master to main January 27, 2021 22:14
@cijothomas cijothomas merged commit ccc191f into open-telemetry:main Jan 29, 2021
@victlu victlu deleted the Optimize-GetOrAdd branch January 29, 2021 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants