Improve AvaloniaObject.GetValue performance #15342

Merged: 4 commits, Apr 23, 2024

@MrJul MrJul commented Apr 12, 2024

What does the pull request do?

This PR improves the performance of AvaloniaObject.GetValue(), which is used for every Avalonia property access.
This method is such a hot path that even tiny improvements can have a measurable effect.

Numbers

Let's start with the Remeasure benchmark, which does a re-layout pass for ≈6500 nested StackPanel/Buttons:

Before

| Method | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|
| Remeasure | 4.638 ms | 0.0268 ms | 0.0250 ms | 76.86 KB |

After

| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| Remeasure | 2.843 ms | 0.0231 ms | 0.0216 ms | 3.9063 | 76.85 KB |

(Numbers are for Ryzen 9 5900X, Windows 11, x64)

Now for the micro-benchmarks:

Before

| Method | Mean | Error | StdDev |
|---|---|---|---|
| GetDefaultValues | 10,260.6 ns | 55.07 ns | 51.51 ns |
| Get_Local_Values | 4,219.3 ns | 33.76 ns | 31.58 ns |
| Get_Local_Values_With_Style_Values | 4,080.2 ns | 22.00 ns | 19.50 ns |

| Method | PropertyCount | Mean | Error | StdDev |
|---|---|---|---|---|
| LookupProperties | 2 | 3.219 ns | 0.0590 ns | 0.0551 ns |
| LookupProperties | 6 | 14.549 ns | 0.1533 ns | 0.1359 ns |
| LookupProperties | 10 | 30.965 ns | 0.5022 ns | 0.4698 ns |
| LookupProperties | 20 | 77.665 ns | 0.3909 ns | 0.3656 ns |
| LookupProperties | 30 | 131.474 ns | 0.7703 ns | 0.7205 ns |

After

| Method | Mean | Error | StdDev |
|---|---|---|---|
| GetDefaultValues | 1,178.9 ns | 4.35 ns | 4.07 ns |
| Get_Local_Values | 2,685.4 ns | 11.23 ns | 8.77 ns |
| Get_Local_Values_With_Style_Values | 2,710.9 ns | 19.54 ns | 18.27 ns |

| Method | PropertyCount | Mean | Error | StdDev |
|---|---|---|---|---|
| LookupProperties | 2 | 2.183 ns | 0.0150 ns | 0.0133 ns |
| LookupProperties | 6 | 10.633 ns | 0.0432 ns | 0.0383 ns |
| LookupProperties | 10 | 21.902 ns | 0.1005 ns | 0.0940 ns |
| LookupProperties | 20 | 56.363 ns | 0.3171 ns | 0.2967 ns |
| LookupProperties | 30 | 95.471 ns | 0.4494 ns | 0.4204 ns |

How was the solution implemented (if it's not obvious)?

There are two main gains here:

Default values

First, improving getting the default value was the main concern since that's the most common case. (Look at that almost 9x speedup!)

This is done by avoiding metadata lookup as much as possible. If a property has only a single default value, even when it has multiple metadata instances (from AddOwner() calls), that value is cached in a field and returned as soon as possible.

If there are still several metadata instances, the lookup takes a fast path where possible. Type.IsInstanceOfType(obj) is about 25% faster for classes than Type.IsAssignableFrom(obj.GetType()) (.NET 8). A dictionary is used only as a last resort.

A ReferenceEqualityComparer is used for this dictionary, resulting in a ≈20% speedup per lookup.
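As a rough illustration of the idea (not the code from this PR), the sketch below caches a single default value and only falls back to a dictionary keyed by reference equality once an override exists. `DefaultValueLookup<T>`, `OverrideDefault` and `GetDefault` are hypothetical names invented for this example, and the `IsInstanceOfType` fast path is left out for brevity.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for per-owner default values; not Avalonia's actual types.
public sealed class DefaultValueLookup<T>
{
    private readonly T _baseDefault;
    private bool _hasOverrides; // false while every owner shares one default

    // Last resort: Type keys compared by reference instead of Type.Equals/GetHashCode.
    private readonly Dictionary<Type, T> _overrides = new(ReferenceEqualityComparer.Instance);

    public DefaultValueLookup(T baseDefault) => _baseDefault = baseDefault;

    public void OverrideDefault(Type ownerType, T value)
    {
        _overrides[ownerType] = value;
        _hasOverrides = true;
    }

    // Hot path: a single field check when there is only one default value.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public T GetDefault(object owner)
        => _hasOverrides ? GetDefaultSlow(owner.GetType()) : _baseDefault;

    // Slow path: walk the type hierarchy until an override is found.
    private T GetDefaultSlow(Type type)
    {
        for (Type? t = type; t is not null; t = t.BaseType)
        {
            if (_overrides.TryGetValue(t, out var value))
                return value;
        }
        return _baseDefault;
    }
}
```

The point of the split is that the common case never touches the dictionary at all; whether the inlining attribute helps in a given spot would still need to be measured.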

[MethodImpl(MethodImplOptions.AggressiveInlining)] has been used where it made sense to do so (always measured first).

Local values

Next was improving lookup for local values.

This is where micro-optimizations matter. AvaloniaPropertyDictionary.TryGetValue() is everywhere: every nanosecond counts here. Even if there's no local value for a property, we must always get through this path first. It has to be fast.

The binary search algorithm used here went through several iterations. I tried several versions of loop unrolling, unsafe code and SIMD, verifying performance numbers and checking the resulting x64 assembly code every time, ensuring there's no unneeded instruction. (The results may vary for other platforms, but I still expect a gain.)

It turns out that a very simple loop is still the fastest (see the LookupProperties benchmark above), with some extras such as a forced bounds check removal (that couldn't be elided naturally by the JIT). It's fast enough that the linear search part has been removed.

The binary search in FindEntry has been copied manually into TryGetValue, since even aggressive inlining wasn't enough to squeeze that last drop of performance out of the method.

The property's Id is stored directly inside the Entry, avoiding an indirection and probably allowing everything to fit inside a CPU cache line (the change was measurable in the LookupProperties benchmark).
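To make the shape of this lookup concrete, here is a minimal sketch of a sorted entry array with the id stored inline and a plain binary search loop. It is an assumption-laden simplification: `SortedIdStore<TValue>` and its members are hypothetical names, and the forced bounds check removal and the manual copy of the search into `TryGetValue` described above are not reproduced.

```csharp
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;

// Hypothetical simplified store; AvaloniaPropertyDictionary differs in detail.
public sealed class SortedIdStore<TValue>
{
    private readonly struct Entry
    {
        public readonly int Id;       // key kept inline: comparing needs no extra indirection
        public readonly TValue Value;
        public Entry(int id, TValue value) { Id = id; Value = value; }
    }

    private Entry[] _entries = Array.Empty<Entry>(); // always sorted by Id

    public void Set(int id, TValue value)
    {
        int index = Find(id);
        if (index >= 0)
        {
            _entries[index] = new Entry(id, value);
            return;
        }

        // Miss: insert at the complement of the returned index to keep the array sorted.
        int insertAt = ~index;
        var next = new Entry[_entries.Length + 1];
        Array.Copy(_entries, 0, next, 0, insertAt);
        next[insertAt] = new Entry(id, value);
        Array.Copy(_entries, insertAt, next, insertAt + 1, _entries.Length - insertAt);
        _entries = next;
    }

    public bool TryGetValue(int id, [MaybeNullWhen(false)] out TValue value)
    {
        int index = Find(id);
        if (index >= 0)
        {
            value = _entries[index].Value;
            return true;
        }
        value = default!;
        return false;
    }

    // Simple binary search loop; returns ~insertionPoint when the id is absent.
    private int Find(int id)
    {
        var entries = _entries;
        int lo = 0, hi = entries.Length - 1;
        while (lo <= hi)
        {
            int mid = (int)(((uint)lo + (uint)hi) >> 1);
            int midId = entries[mid].Id;
            if (midId == id) return mid;
            if (midId < id) lo = mid + 1; else hi = mid - 1;
        }
        return ~lo;
    }
}
```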

Misc

Used ArgumentNullException.ThrowIfNull where possible, ensuring the throw part is never inlined, allowing the JIT to inline the caller if it chooses to do so.
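For readers unfamiliar with the pattern, the sketch below shows the built-in guard next to a hand-written equivalent that keeps the throw in a separate, non-inlined method. `GuardDemo`, `LengthOf`, `LengthOfManual` and `ThrowNull` are hypothetical names used only for illustration.

```csharp
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;
using System.Runtime.CompilerServices;

public static class GuardDemo
{
    // Built-in guard (.NET 6+): the caller contains only the null check.
    public static int LengthOf(string text)
    {
        ArgumentNullException.ThrowIfNull(text);
        return text.Length;
    }

    // Hand-written version of the same throw-helper pattern.
    public static int LengthOfManual(string? text)
    {
        if (text is null)
            ThrowNull(nameof(text));
        return text.Length;
    }

    // Keeping the throw out of line leaves the caller small enough to be an inlining candidate.
    [DoesNotReturn]
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowNull(string paramName)
        => throw new ArgumentNullException(paramName);
}
```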

Made Optional<T>.GetValueOrDefault() an unconditional single field access, since _value will be default if there's no value (Nullable<T> has a similar implementation).
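A minimal sketch of why the branch is unnecessary, using a hypothetical `Maybe<T>` struct rather than Avalonia's actual `Optional<T>`: a struct created without a value leaves its field at `default(T)`, so returning the field directly already gives the right answer.

```csharp
#nullable enable
using System;

// Hypothetical optional type for illustration only.
public readonly struct Maybe<T>
{
    private readonly T _value; // stays default(T) when no value was supplied

    public bool HasValue { get; }

    public Maybe(T value)
    {
        _value = value;
        HasValue = true;
    }

    // No HasValue check needed: an empty Maybe<T> already holds default(T).
    public T GetValueOrDefault() => _value;
}
```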

@avaloniaui-bot

You can test this PR using the following package version. 11.2.999-cibuild0047238-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

Comment on lines 24 to 28
public bool Equals(Type? x, Type? y)
=> x == y;

public int GetHashCode(Type obj)
=> RuntimeHelpers.GetHashCode(obj);

I wonder if using RuntimeTypeHandle instead would make any reasonable difference without going way too deep into micro-optimizations.

@MrJul MrJul Apr 13, 2024

This is very interesting.

This is the dictionary benchmark, where we can see that TypeComparer is faster than the default comparer (.NET 8, x64, Windows).
(typeof(Control) is inside the dictionary, typeof(Button) isn't).

| Method | Type | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|
| DefaultComparer | Button | 5.976 ns | 0.0498 ns | 0.0466 ns | 1.00 |
| TypeComparer | Button | 4.930 ns | 0.0339 ns | 0.0317 ns | 0.82 |
| RuntimeTypeHandleComparer | Button | 5.737 ns | 0.0381 ns | 0.0338 ns | 0.96 |
| ReferenceComparer | Button | 4.940 ns | 0.0435 ns | 0.0407 ns | 0.83 |
| DefaultComparer | Control | 6.735 ns | 0.0490 ns | 0.0458 ns | 1.00 |
| TypeComparer | Control | 6.132 ns | 0.0444 ns | 0.0416 ns | 0.91 |
| RuntimeTypeHandleComparer | Control | 6.445 ns | 0.0508 ns | 0.0475 ns | 0.96 |
| ReferenceComparer | Control | 5.310 ns | 0.0300 ns | 0.0281 ns | 0.79 |

Benchmark code:
public class BenchsDictionaryType
{
    private readonly Dictionary<Type, object> _dic1 = CreateDictionary(null);
    private readonly Dictionary<Type, object> _dic2 = CreateDictionary(TypeEqualityComparer.Instance);
    private readonly Dictionary<Type, object> _dic3 = CreateDictionary(RuntimeTypeHandleEqualityComparer.Instance);
    private readonly Dictionary<Type, object> _dic4 = CreateDictionary(ReferenceEqualityComparer.Instance);

    private static Dictionary<Type, object> CreateDictionary(IEqualityComparer<Type>? comparer)
        => new(comparer)
        {
            [typeof(int)] = "abc",
            [typeof(string)] = "def",
            [typeof(Control)] = "ghi"
        };

    [Params(typeof(Button), typeof(Control))]
    public Type Type { get; set; }

    [Benchmark(Baseline = true)]
    public bool DefaultComparer()
        => _dic1.TryGetValue(Type, out _);

    [Benchmark]
    public bool TypeComparer()
        => _dic2.TryGetValue(Type, out _);

    [Benchmark]
    public bool RuntimeTypeHandleComparer()
        => _dic3.TryGetValue(Type, out _);

    [Benchmark]
    public bool ReferenceComparer()
        => _dic4.TryGetValue(Type, out _);
}

public sealed class TypeEqualityComparer : IEqualityComparer<Type>
{
    public static TypeEqualityComparer Instance { get; } = new();

    public bool Equals(Type? x, Type? y)
        => x == y;

    public int GetHashCode(Type obj)
        => RuntimeHelpers.GetHashCode(obj);
}


public sealed class RuntimeTypeHandleEqualityComparer : IEqualityComparer<Type>
{
    public static RuntimeTypeHandleEqualityComparer Instance { get; } = new();

    public bool Equals(Type? x, Type? y)
        => x.TypeHandle.Equals(y.TypeHandle);

    public int GetHashCode(Type obj)
        => obj.TypeHandle.GetHashCode();
}

Since TypeComparer was faster in the dictionary benchmark, I actually read the results of the Equals benchmark backwards!
You can see that == is in fact slower than Equals, despite the dictionary lookup being faster.

Equals:

| Method | Type1 | Type2 | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|---|
| Equals | Button | Button | 0.5504 ns | 0.0102 ns | 0.0095 ns | 1.00 |
| Operator | Button | Button | 1.0966 ns | 0.0160 ns | 0.0149 ns | 1.99 |
| TypeHandleEquals | Button | Button | 0.4741 ns | 0.0026 ns | 0.0020 ns | 0.86 |
| ReferenceEquals | Button | Button | 0.1434 ns | 0.0023 ns | 0.0022 ns | 0.26 |
| Equals | Button | StyledElement | 0.5405 ns | 0.0108 ns | 0.0101 ns | 1.00 |
| Operator | Button | StyledElement | 1.1051 ns | 0.0170 ns | 0.0159 ns | 2.05 |
| TypeHandleEquals | Button | StyledElement | 0.4873 ns | 0.0033 ns | 0.0026 ns | 0.91 |
| ReferenceEquals | Button | StyledElement | 0.1469 ns | 0.0062 ns | 0.0051 ns | 0.27 |
| Equals | StyledElement | Button | 0.5308 ns | 0.0122 ns | 0.0114 ns | 1.00 |
| Operator | StyledElement | Button | 1.1185 ns | 0.0201 ns | 0.0188 ns | 2.11 |
| TypeHandleEquals | StyledElement | Button | 0.4847 ns | 0.0101 ns | 0.0095 ns | 0.91 |
| ReferenceEquals | StyledElement | Button | 0.1475 ns | 0.0037 ns | 0.0033 ns | 0.28 |
| Equals | StyledElement | StyledElement | 0.5278 ns | 0.0093 ns | 0.0087 ns | 1.00 |
| Operator | StyledElement | StyledElement | 1.1040 ns | 0.0175 ns | 0.0164 ns | 2.09 |
| TypeHandleEquals | StyledElement | StyledElement | 0.5013 ns | 0.0118 ns | 0.0110 ns | 0.95 |
| ReferenceEquals | StyledElement | StyledElement | 0.1546 ns | 0.0100 ns | 0.0094 ns | 0.29 |

The dictionary difference is probably due to the different GetHashCode implementation:

| Method | Type | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|
| Normal | Button | 0.6236 ns | 0.0108 ns | 0.0101 ns | 1.00 |
| RuntimeHelper | Button | 0.4295 ns | 0.0085 ns | 0.0079 ns | 0.69 |

Benchmark code:
public class BenchsTypeEquals
{
    [Params(typeof(StyledElement), typeof(Button))]
    public Type Type1 { get; set; }

    [Params(typeof(StyledElement), typeof(Button))]
    public Type Type2 { get; set; }

    [Benchmark(Baseline = true)]
    public bool Equals()
        => Type1.Equals(Type2);

    [Benchmark]
    public bool Operator()
        => Type1 == Type2;

    [Benchmark]
    public bool TypeHandleEquals()
        => Type1.TypeHandle.Equals(Type2.TypeHandle);

    [Benchmark]
    public bool ReferenceEquals()
        => ReferenceEquals(Type1, Type2);
}

public class BenchsTypeGetHashCode
{
    [Params(typeof(Button))]
    public Type Type { get; set; }

    [Benchmark(Baseline = true)]
    public int Normal()
        => Type.GetHashCode();

    [Benchmark]
    public int RuntimeHelper()
        => RuntimeHelpers.GetHashCode(Type);
}

In every case, you can see that while RuntimeTypeHandle is a bit faster than the default one, ReferenceEqualityComparer always wins, as we can't get much simpler than a simple comparison.

In the current TypeEqualityComparer implementation, I realize there's no point in not using ReferenceEquals() when GetHashCode() already uses the reference hashcode through RuntimeHelpers. Doing so would make it a ReferenceEqualityComparer. This doesn't handle non-RuntimeType types, but we don't really support that scenario.

I'll delete TypeEqualityComparer and use ReferenceEqualityComparer.
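As a small illustration of that equivalence (a sketch, not code from the PR), the built-in `ReferenceEqualityComparer` can be passed straight to a `Dictionary<Type, ...>`, and its hash code is the same reference hash that `RuntimeHelpers.GetHashCode` returns. `ReferenceComparerDemo` and its members are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class ReferenceComparerDemo
{
    // ReferenceEqualityComparer implements IEqualityComparer<object?>, which is
    // usable for Type keys thanks to the interface's contravariance.
    public static Dictionary<Type, string> CreateLookup()
        => new(ReferenceEqualityComparer.Instance)
        {
            [typeof(int)] = "abc",
            [typeof(string)] = "def",
        };

    // The comparer hashes by reference, like the deleted TypeEqualityComparer did.
    public static bool HashesMatch(Type type)
        => ReferenceEqualityComparer.Instance.GetHashCode(type)
           == RuntimeHelpers.GetHashCode(type);
}
```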

Member Author

There's a slight speedup in Remeasure with the ReferenceEqualityComparer; I've updated the OP with the numbers.

@maxkatz6 maxkatz6 left a comment

Overall, looks great to me.
But we need @grokys's opinion on possible behavioral changes.

@MrJul MrJul changed the title from "Improve AvaloniaProperty.GetValue performance" to "Improve AvaloniaObject.GetValue performance" Apr 13, 2024
@avaloniaui-bot

You can test this PR using the following package version. 11.2.999-cibuild0047262-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

@maxkatz6 maxkatz6 requested a review from grokys April 13, 2024 23:19
@MrJul MrJul force-pushed the feature/getvalue-perf branch from 62c21a1 to db191d2 Compare April 17, 2024 13:49
@avaloniaui-bot

You can test this PR using the following package version. 11.2.999-cibuild0047372-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

@jmacato jmacato enabled auto-merge April 23, 2024 12:30
@jmacato jmacato added this pull request to the merge queue Apr 23, 2024
@avaloniaui-bot

You can test this PR using the following package version. 11.2.999-cibuild0047589-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

Merged via the queue into AvaloniaUI:master with commit edceb96 Apr 23, 2024
8 of 10 checks passed
@MrJul MrJul deleted the feature/getvalue-perf branch April 23, 2024 18:02
@heku heku mentioned this pull request Jul 7, 2024