Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More EventListener overhead reductions #52092

Merged
merged 7 commits into from
May 7, 2021

Conversation

MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Apr 30, 2021

This PR builds on top of #51822, focusing entirely on DecodeObject speed.

I special-cased AllParametersAreString and AllParametersAreInt32 (as that seems to be by far most common).
Make use of Type.GetTypeCode in DecodeObject to speed up all the other cases.

Opening this PR to get some CI coverage as I added debug asserts around the user-supplied EventData.Size.

Benchmarks.cs

Method Toolchain Mean Error Ratio Allocated
WriteEventNoParams base 72.91 ns 0.036 ns 1.00 184 B
WriteEventNoParams PR51822 56.61 ns 0.028 ns 0.78 136 B
WriteEventNoParams new 55.98 ns 0.018 ns 0.77 136 B
3 int32 args
WriteEventIntParams base 182.35 ns 0.067 ns 1.00 280 B
WriteEventIntParams PR51822 128.17 ns 0.039 ns 0.70 280 B
WriteEventIntParams new 95.47 ns 0.098 ns 0.52 280 B
3 strings
WriteEventStringParams base 488.94 ns 0.243 ns 1.00 312 B
WriteEventStringParams PR51822 401.55 ns 0.210 ns 0.82 312 B
WriteEventStringParams new 114.08 ns 0.055 ns 0.23 312 B
3 enums, 1 being int64
WriteEventEnumParams base 571.13 ns 0.905 ns 1.00 280 B
WriteEventEnumParams PR51822 478.64 ns 0.309 ns 0.84 280 B
WriteEventEnumParams new 123.86 ns 0.018 ns 0.22 280 B
3 enums, all <= int32
WriteEventSmallEnumParams base 555.12 ns 0.223 ns 1.00 280 B
WriteEventSmallEnumParams PR51822 467.77 ns 0.324 ns 0.84 280 B
WriteEventSmallEnumParams new 87.88 ns 0.218 ns 0.16 280 B
A single HttpTelemetry RequestStart event
WriteHttpRequestStart base 834.12 ns 0.420 ns 1.00 448 B
WriteHttpRequestStart PR51822 666.90 ns 0.640 ns 0.80 448 B
WriteHttpRequestStart new 237.61 ns 0.213 ns 0.28 448 B
19 events - avg overhead
FullYarpRequest base 5,748.42 ns 2.780 ns 1.00 5048 B
FullYarpRequest PR51822 4,737.74 ns 2.637 ns 0.82 4760 B
FullYarpRequest new 2,007.78 ns 1.156 ns 0.35 4761 B

I'd assume this is about as far as we can get with the current API shape (args+payload+boxing+strings).

@ghost
Copy link

ghost commented Apr 30, 2021

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR builds on top of #51822, focusing entirely on DecodeObject speed.

I special-cased AllParametersAreString and AllParametersAreInt32 (as that seems to be by far most common).
Since we're in CoreLib, we can make use of GetCorElementType in DecodeObject to speed up all the other cases.

Opening this PR to get some CI coverage as I added debug asserts around the user-supplied EventData.Size.

Benchmarks.cs

Method Toolchain Mean Error Ratio Allocated
WriteEventNoParams base 77.36 ns 0.599 ns 1.00 184 B
WriteEventNoParams PR51822 63.90 ns 1.219 ns 0.83 136 B
WriteEventNoParams new 58.89 ns 0.434 ns 0.76 136 B
3 int32 args
WriteEventIntParams base 259.60 ns 4.737 ns 1.00 280 B
WriteEventIntParams PR51822 127.87 ns 1.163 ns 0.49 280 B
WriteEventIntParams new 102.47 ns 0.853 ns 0.39 280 B
3 strings
WriteEventStringParams base 520.92 ns 7.796 ns 1.00 312 B
WriteEventStringParams PR51822 449.73 ns 8.842 ns 0.86 312 B
WriteEventStringParams new 128.39 ns 2.499 ns 0.25 312 B
3 enums, 1 being int64
WriteEventEnumParams base 639.98 ns 5.292 ns 1.00 280 B
WriteEventEnumParams PR51822 490.84 ns 2.366 ns 0.77 280 B
WriteEventEnumParams new 195.63 ns 1.246 ns 0.31 280 B
3 enums, all <= int32
WriteEventSmallEnumParams base 589.55 ns 6.061 ns 1.00 280 B
WriteEventSmallEnumParams PR51822 484.31 ns 1.814 ns 0.82 280 B
WriteEventSmallEnumParams new 93.74 ns 1.568 ns 0.16 280 B
A single HttpTelemetry RequestStart event
WriteHttpRequestStart base 868.10 ns 4.009 ns 1.00 448 B
WriteHttpRequestStart PR51822 700.26 ns 6.704 ns 0.81 448 B
WriteHttpRequestStart new 287.17 ns 5.302 ns 0.33 448 B
19 events - avg overhead
FullYarpRequest base 6.198 us 0.076 us 1.00 4.93 KB
FullYarpRequest PR51822 4.900 us 0.056 us 0.79 4.65 KB
FullYarpRequest new 2.163 us 0.012 us 0.35 4.65 KB

I'd assume this is about as far as we can get with the current API shape (args+payload+boxing+strings).

Author: MihaZupan
Assignees: -
Labels:

area-System.Diagnostics.Tracing

Milestone: 6.0.0

TypeCode typeCode = Type.GetTypeCode(dataType);
int size = data->Size;

if (size == 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be a simple switch over typeCode instead of multiple levels of nested ifs faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't always match up here.
For enums, EventSource treats them as int32, but Type.GetTypeCode(dataType) will resolve the underlying type.
That is why the [SByte, Int32] range is being checked under size == 4 (and read as an int*), but then also checked outside the size == 4 case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you'd like to optimize for non-int enums it still seems to me that the switch would be better option

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case you would have duplication like

case TypeCode.Byte:
        if (dataType.IsEnum)
        {
            Debug.Assert(size == 4);
            return *(int*)dataPointer;
        }
        else
        {
            Debug.Assert(size == 1);
            return *(byte*)dataPointer;
        }

case TypeCode.SByte:
        if (dataType.IsEnum)
        {
            Debug.Assert(size == 4);
            return *(int*)dataPointer;
        }
        else
        {
            Debug.Assert(size == 1);
            return *(sbyte*)dataPointer;
        }
// ...

For byte, sbyte, short, ushort, int.

I can measure, but I doubt changing to a switch would be faster here.

@MihaZupan
Copy link
Member Author

MihaZupan commented May 3, 2021

Updated the benchmarks above with the latest changes

@tommcdon
Copy link
Member

tommcdon commented May 3, 2021

@sywhang

// null in the string.
#if DEBUG
Debug.Assert(data->Size % 2 == 0, "String size should be even");
for (int i = 0; i < data->Size / 2 - 1; i++) Debug.Assert(*((char*)dataPointer + i) != 0, "String may not contain null chars");
Copy link
Contributor

@sywhang sywhang May 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @josalem: this change can be related to #52025 for EventPipe/ETW

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. There's currently nothing stopping users from putting a string with embedded nulls into EventSource.

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM though it would be good if at least one of @sywhang, @davmason or @josalem can confirm as well. Thanks for the great perf wins!

else
{
for (int i = 0; i < args.Length; i++, data++)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a little more perf to gain if the for loop is inside the method rather than outside of it. I'm guessing DecodeObject() isn't being inlined right now and making the loop body opaque inside the method doesn't give the JIT as much visbility to optimize. There is probably also a tiny overhead in repeated execution of the DecodeObject prolog/epilog.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed DecodeObject to DecodeObjects. Control flow is a bit messier (yay for goto), but it does give us a few extra %.

Roughly ~10% improvement for WriteHttpRequestStart.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, ~10% is even more than I would have guessed : )

Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MihaZupan!

@MihaZupan
Copy link
Member Author

Method Toolchain Mean Error Ratio
WriteEventEnumParams DecodeObject 124.07 ns 0.063 ns 1.00
WriteEventEnumParams DecodeObjects 119.49 ns 0.409 ns 0.96
WriteHttpRequestStart DecodeObject 236.24 ns 1.064 ns 1.00
WriteHttpRequestStart DecodeObjects 207.29 ns 0.646 ns 0.88
FullYarpRequest DecodeObject 2,033.16 ns 8.476 ns 1.00
FullYarpRequest DecodeObjects 1,995.76 ns 5.244 ns 0.98

@MihaZupan
Copy link
Member Author

Failure unrelated: #51819

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants