Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid closure allocations when applying hit object results #26694

Merged
merged 9 commits into from
Feb 6, 2024

Conversation

chandler14362
Copy link
Contributor

I used DPA to profile memory allocations in a release build on my windows machine. The full process was loading up the game, searching for "justadice", and then playing https://osu.ppy.sh/beatmapsets/983942#osu/2058789. Results showed DrawableHitCircle.CheckForResult could perform a closure allocation up to 7.7MB in size.
rider64_2024-01-24_11-10-53

It would take further analysis to figure out why this closure allocation is so large, however, it can be easily mitigated implementing the following lambda calling pattern:

The concept of the pattern is to use the stack to pass necessary state to the callback instead of performing a closure allocation on the heap. I modified DrawableHitObject.ApplyResult(Action<JudgementResult> application) to be ApplyResult<TState>(Action<JudgementResult, TState> application, TState state). I also changed all instances of ApplyResult to use a static lambda. This enforces a closure allocation will never occur with any future changes (as long as the lambda is kept static). If something is referenced out of scope of the lambda, it will result in a compile error. I also made sure to include a non generic ApplyResult(Action<JudgementResult> application) for calling code that doesn't require use of state within the callback. This non-generic calls the generic with a null object.

After making these changes the closure allocation is gone (as it should be):
rider64_2024-01-24_11-36-37

These allocations could really add up considering they are per hit circle. My motivation for this change was I was still experiencing, small, yet infrequent, gameplay hitches on b272d34.

@bdach
Copy link
Collaborator

bdach commented Jan 24, 2024

There is likely some merit to this change, albeit to call this "using the stack to pass state" in the title is nigh obfuscatory.

@bdach bdach changed the title Use stack to pass state when applying hit object results Avoid closure allocations when applying hit object results Jan 24, 2024
@peppy
Copy link
Sponsor Member

peppy commented Jan 25, 2024

Has been on my radar but this fix is subpar because it will be a gotcha for any future calls to the method and cannot be enforced.

@chandler14362
Copy link
Contributor Author

chandler14362 commented Jan 25, 2024

Not sure how'd you enforce this within the code base. Maybe you can write some custom code analysis using roslyn but I seriously think that is over the top for this. I personally think code review combined with the documentation provided by bdach is enough. Not sure what your plans are for the existing rulesets but it was under my assumption these are mostly complete and not subject to much more change. I'd want to make sure the users creating rulesets use the method properly more than anything.

The other solution is to get rid of the lambda entirely and rework the code, that way there is no closure to worry about. I view this code as an integral part of osu so I feel like that's not within my territory to do.

@peppy
Copy link
Sponsor Member

peppy commented Jan 25, 2024

Yep exactly, the API is bad from the start. I'm fine with this as a stop-gap fix for the issue, but it definitely needs a rethink.

peppy
peppy previously requested changes Jan 25, 2024
Copy link
Sponsor Member

@peppy peppy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as proposed.

Copy link
Sponsor Member

@peppy peppy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed some further changes, will wait on more feedback.

@smoogipoo
Copy link
Contributor

By the way, the issue here is that the closure object is allocated for every call to this method by Update(), regardless of whether or not the ApplyResult() part is reached.

i.e. this is another reoccurrence of dotnet/runtime#40009

At this point I'm not particularly happy with the complexity added by the solutions above.

@chandler14362
Copy link
Contributor Author

chandler14362 commented Feb 3, 2024

Wow! Just checked that github issue you linked @smoogipoo and decided to run some basic code through sharplab.io:

using System;
public class C {
    public void Test(Action<string> x) {
        x.Invoke("test");
    }
    
    public void M(bool a) {
        var x = 0;
        if (a) {
            Test((arg) => {
                Console.WriteLine($"{arg}, {x + 1}");
            });
        }
        else {
            Console.WriteLine("no closure");
        }
    }
}

This code results in IL that shows a closure allocation before the if statement:

    .method public hidebysig 
        instance void M (
            bool a
        ) cil managed 
    {
        // Method begins at RVA 0x2060
        // Code size 56 (0x38)
        .maxstack 3
        .locals init (
            [0] class C/'<>c__DisplayClass1_0' 'CS$<>8__locals0',
            [1] bool
        )

        // sequence point: hidden
        IL_0000: newobj instance void C/'<>c__DisplayClass1_0'::.ctor()
        IL_0005: stloc.0

However, when moving x within the if statement that would create the closure:

        if (a) {
            var x = 0;
            Test((arg) => {
                Console.WriteLine($"{arg}, {x + 1}");
            });
        }

The IL generated is:

   .method public hidebysig 
        instance void M (
            bool a
        ) cil managed 
    {
        // Method begins at RVA 0x2060
        // Code size 56 (0x38)
        .maxstack 3
        .locals init (
            [0] bool,
            [1] class C/'<>c__DisplayClass1_0' 'CS$<>8__locals0'
        )

        IL_0000: nop
        IL_0001: ldarg.1
        IL_0002: stloc.0
        // sequence point: hidden
        IL_0003: ldloc.0
        IL_0004: brfalse.s IL_002a

        // sequence point: hidden
        IL_0006: newobj instance void C/'<>c__DisplayClass1_0'::.ctor()

Based off these results, it looks like the closure is only allocated at the start of the method if it is accessing local variables outside of the scope of its branch. Very interesting stuff.

As someone who likes to grind osu, I really hope the team would be willing to add slight complexity to the code in favor of performance. The dotnet issue has been open for over 3 years and still nothing has been done. When I look at at the changes @peppy has proposed in his diff, I think they are improvement on the existing base (regardless if the TState API is included or not).

@smoogipoo
Copy link
Contributor

I think I'm probably okay with @peppy 's diff above. It looks pretty over-engineered but I don't see a better way to do it (other than hacking around it).

@bdach bdach requested a review from smoogipoo February 5, 2024 12:43
@bdach bdach dismissed peppy’s stale review February 5, 2024 12:43

i'd say resolved

@smoogipoo smoogipoo merged commit c18cd65 into ppy:master Feb 6, 2024
13 of 17 checks passed
andy840119 added a commit to andy840119/karaoke that referenced this pull request Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants