
Batch<T> public API additions to unblock users #2542

Merged
5 commits merged into open-telemetry:main on Oct 29, 2021

Conversation

CodeBlanch
Member

@CodeBlanch CodeBlanch commented Oct 28, 2021

Issue

@Yun-Ting is blocked by the Batch<T> API. My understanding is that the ask is to be able to filter log records before export. We don't want to do this in a custom processor because that would add time while the logs are being written; we would like to do it on the export thread so it doesn't block normal processing.

Changes

Batch<T> already has a ctor that accepts an array. I'm proposing we make it public and add a public int Count { get; } property to enable this type of logic:

public override ExportResult Export(in Batch<LogRecord> batch)
{
    LogRecord[] storage = new LogRecord[batch.Count];
    int storageCount = 0;
    foreach (var logRecord in batch)
    {
        if (shouldInclude(logRecord))
        {
            storage[storageCount++] = logRecord;
        }
    }

    return this.innerExporter.Export(new Batch<LogRecord>(storage, storageCount));
}

TODOs

  • CHANGELOG.md updated for non-trivial changes
  • Changes in public API reviewed

@CodeBlanch CodeBlanch requested a review from a team October 28, 2021 20:22

this.Current = null;
return false;
return this.moveNextFunc(ref this);
Member Author


I'm hoping this delegate method is faster, but not sure. Need to benchmark it.
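For readers following along, a minimal sketch of the delegate-based MoveNext pattern being discussed might look like this. ArrayBatchEnumerator and its member names are illustrative, not the actual SDK types; the real enumerator also covers single-item and circular-buffer storage with their own delegates:

```csharp
using System;

public struct ArrayBatchEnumerator<T>
    where T : class
{
    private delegate bool MoveNextFunc(ref ArrayBatchEnumerator<T> enumerator);

    // One delegate per storage kind, selected once at construction, so
    // MoveNext() avoids re-testing the storage kind on every call.
    private static readonly MoveNextFunc ArrayMoveNext = MoveNextArray;

    private readonly T[] items;
    private readonly int count;
    private readonly MoveNextFunc moveNextFunc;
    private int itemIndex;

    public ArrayBatchEnumerator(T[] items, int count)
    {
        this.items = items;
        this.count = count;
        this.itemIndex = 0;
        this.Current = null;
        this.moveNextFunc = ArrayMoveNext;
    }

    public T Current { get; private set; }

    // Dispatch through the delegate chosen at construction time.
    public bool MoveNext() => this.moveNextFunc(ref this);

    private static bool MoveNextArray(ref ArrayBatchEnumerator<T> enumerator)
    {
        if (enumerator.itemIndex < enumerator.count)
        {
            enumerator.Current = enumerator.items[enumerator.itemIndex++];
            return true;
        }

        enumerator.Current = null;
        return false;
    }
}
```

The trade-off is a delegate invocation on every MoveNext in exchange for removing per-call branching on the storage kind, which is exactly what the benchmarks below measure.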

Member


@Yun-Ting could you help on the benchmark part?

Contributor


Of course, I'm working on it.

Contributor


The benchmark summaries are below.
With if statements: https://github.com/Yun-Ting/opentelemetry-dotnet/blob/Yun-Ting/oldMoveNext/test/OpenTelemetry.Tests/Trace/BatchTestMoveNextBenchmarks.cs

// * Summary *
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1288 (21H1/May2021Update)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.402
[Host] : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT [AttachedDebugger]
DefaultJob : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT

| Method                 |      Mean |    Error |    StdDev |  Gen 0 | Allocated |
|----------------------- |----------:|---------:|----------:|-------:|----------:|
| SingleElement          |  26.18 ns | 0.514 ns |  0.900 ns |      - |         - |
| MoveNextCircularBuffer | 220.11 ns | 4.389 ns | 11.562 ns | 0.0138 |     112 B |
| MoveNextArray          |  47.21 ns | 0.984 ns |  2.097 ns | 0.0081 |      64 B |

And with delegate methods: https://github.com/Yun-Ting/opentelemetry-dotnet/blob/Yun-Ting/newMoveNext/test/OpenTelemetry.Tests/Trace/BatchTestMoveNextBenchmarks.cs
// * Summary *

| Method                 |      Mean |    Error |   StdDev |  Gen 0 | Allocated |
|----------------------- |----------:|---------:|---------:|-------:|----------:|
| SingleElement          |  26.59 ns | 0.527 ns | 1.002 ns |      - |         - |
| MoveNextCircularBuffer | 219.28 ns | 4.341 ns | 8.769 ns | 0.0141 |     112 B |
| MoveNextArray          |  54.77 ns | 1.114 ns | 2.146 ns | 0.0081 |      64 B |

With the delegate method, MoveNextArray is slightly slower than with the if statements.
I think it is caused by:

this.Current = metrics[this.metricIndex];
this.metricIndex++;

vs

enumerator.Current = items[enumerator.itemIndex++];

But I'm quite surprised to learn that the one-line assignment plus increment is slower than assigning and incrementing on two separate lines.
For SingleElement and MoveNextCircularBuffer the results are almost identical.
I'm voting for the delegate methods because of code readability.

@codecov

codecov bot commented Oct 28, 2021

Codecov Report

Merging #2542 (a33a152) into main (ce46d00) will increase coverage by 0.04%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main    #2542      +/-   ##
==========================================
+ Coverage   80.65%   80.70%   +0.04%     
==========================================
  Files         254      254              
  Lines        8494     8516      +22     
==========================================
+ Hits         6851     6873      +22     
  Misses       1643     1643              
Impacted Files Coverage Δ
src/OpenTelemetry/Batch.cs 98.91% <100.00%> (+0.34%) ⬆️
...nTelemetry/Internal/OpenTelemetrySdkEventSource.cs 71.42% <0.00%> (-2.86%) ⬇️
...ter.ZPages/Implementation/ZPagesActivityTracker.cs 100.00% <0.00%> (+2.85%) ⬆️
...emetry.Api/Internal/OpenTelemetryApiEventSource.cs 79.41% <0.00%> (+5.88%) ⬆️

@reyang
Member

reyang commented Oct 28, 2021

public override ExportResult Export(in Batch<LogRecord> batch)
{
    LogRecord[] storage = new LogRecord[batch.Count];
    int storageCount = 0;
    foreach (var logRecord in batch)
    {
        if (shouldInclude(logRecord))
        {
            storage[storageCount++] = logRecord;
        }
    }

    return this.innerExporter.Export(new Batch<LogRecord>(storage, storageCount));
}

Is it possible to make storage a Span<T>?

@CodeBlanch
Member Author

@reyang

Is it possible to make storage a Span<T>?

Good question! We could make it a Memory<T>/ReadOnlyMemory<T> easily. Span<T> is a much bigger challenge.

  • We can't store a Span<T> in Batch<T> unless we convert struct Batch<T> into ref struct Batch<T> which would be a breaking change for anyone trying to use Batch<T> with an async method. You can't use ref structs with anything that will escape the stack.

  • We can't implement interfaces with a ref struct and we're currently utilizing IDisposable on Batch<T> to make sure it is cleaned up.

  • We can't use a ref struct as a type parameter so things like this break:

    public static Func<Batch<Metric>> GetMetricCollect(this BaseProvider baseProvider) { ... }
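The limitations in those bullets can be sketched concretely. This is a hypothetical illustration (RefBatch<T> stands in for a ref-struct Batch<T>; it is not SDK code), showing the kinds of compile errors a ref struct would introduce:

```csharp
using System;

// If Batch<T> became a ref struct, it could only ever live on the stack.
public ref struct RefBatch<T>
    where T : class
{
    private readonly T[] items;

    public RefBatch(T[] items)
    {
        this.items = items;
    }
}

public static class RefStructLimitations
{
    public static void Demo()
    {
        // Does not compile (CS0306, approximately): a ref struct may not be
        // used as a type argument, so delegates over it break:
        // Func<RefBatch<string>> factory = () => new RefBatch<string>(Array.Empty<string>());

        // Also does not compile: a ref struct cannot implement interfaces,
        // so 'ref struct RefBatch<T> : IDisposable' is rejected, and it
        // cannot be captured by a lambda or used across an await, because
        // all of those would let it escape the stack.
    }
}
```

That is why the generic helpers like the Func<Batch<Metric>> example above would break if Batch<T> were made a ref struct.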

I think I would vote for leaving it an array or ReadOnlyMemory<T>. If users are worried about the allocation, they can do this:

public override ExportResult Export(in Batch<LogRecord> batch)
{
    LogRecord[] storage = ArrayPool<LogRecord>.Shared.Rent(batch.Count);
    try
    {
        int storageCount = 0;
        foreach (var logRecord in batch)
        {
            if (shouldInclude(logRecord))
            {
                storage[storageCount++] = logRecord;
            }
        }

        return this.innerExporter.Export(new Batch<LogRecord>(storage, storageCount));
    }
    finally
    {
        ArrayPool<LogRecord>.Shared.Return(storage);
    }
}

@reyang
Member

reyang commented Oct 28, 2021

  • We can't store a Span<T> in Batch<T> unless we convert struct Batch<T> into ref struct Batch<T> which would be a breaking change for anyone trying to use Batch<T> with an async method. You can't use ref structs with anything that will escape the stack.
  • We can't implement interfaces with a ref struct and we're currently utilizing IDisposable on Batch<T> to make sure it is cleaned up.

Yeah, I figured the same after I posted the question 😄

I think I would vote for leaving it an array or ReadOnlyMemory<T>. If users are worried about the allocation...

Makes sense 👍

@reyang reyang left a comment


@utpilla
Contributor

utpilla commented Oct 28, 2021

I think I would vote for leaving it an array or ReadOnlyMemory<T>. If users are worried about the allocation, they can do this:

public override ExportResult Export(in Batch<LogRecord> batch)
{
    LogRecord[] storage = ArrayPool<LogRecord>.Shared.Rent(batch.Count);
    try
    {
        int storageCount = 0;
        foreach (var logRecord in batch)
        {
            if (shouldInclude(logRecord))
            {
                storage[storageCount++] = logRecord;
            }
        }

        return this.innerExporter.Export(new Batch<LogRecord>(storage, storageCount));
    }
    finally
    {
        ArrayPool<LogRecord>.Shared.Return(storage);
    }
}

Is it safe to return the buffer back to the pool in the finally block? When should Batch<T> dispose the T[] items?

@CodeBlanch
Member Author

@utpilla

Is it safe to return the buffer back to the pool in the finally block? When should Batch<T> dispose the T[] items?

Good question. Batch (at least today) doesn't really dispose anything. For array mode or single-item mode, the dispose is a no-op. For circular-buffer mode, it makes sure everything was drained out of the batch to free up the slots in the circular buffer. Batch is an ephemeral thing that doesn't live beyond the export call, so in the above, it is safe to do.

There is IMemoryOwner<T>, which we could use to transfer ownership to the batch and let it handle the return. But that isn't available on net461/netstandard2.0, and it would force people to use pooling where they might not care, so I'm not sure the juice is worth the squeeze.
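A hedged sketch of what that ownership transfer could look like. OwnedBatch<T> is an illustrative name, not an SDK type; it assumes System.Buffers' IMemoryOwner<T>/MemoryPool<T> are available:

```csharp
using System;
using System.Buffers;

// The batch takes ownership of the pooled buffer and returns it on Dispose,
// so the exporter no longer needs a try/finally around the inner Export call.
public sealed class OwnedBatch<T> : IDisposable
    where T : class
{
    private IMemoryOwner<T> owner;

    public OwnedBatch(IMemoryOwner<T> owner, int count)
    {
        this.owner = owner ?? throw new ArgumentNullException(nameof(owner));
        this.Count = count;
    }

    public int Count { get; }

    // Only the filled prefix of the rented buffer is exposed.
    public ReadOnlySpan<T> Items => this.owner.Memory.Span.Slice(0, this.Count);

    public void Dispose()
    {
        // Disposing the owner returns the memory to its pool, after the
        // consumer is done enumerating, instead of in the caller's finally.
        this.owner?.Dispose();
        this.owner = null;
    }
}
```

A caller would rent via MemoryPool<LogRecord>.Shared.Rent(batch.Count), fill the buffer, and hand the owner off; the cost is that every producer is now routed through pooling whether they want it or not, which is the trade-off noted above.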

@CodeBlanch CodeBlanch merged commit d9b2ea4 into open-telemetry:main Oct 29, 2021
@CodeBlanch CodeBlanch deleted the batch-public-ctor branch October 29, 2021 23:45