Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix offset + globalization issues in StringSegment #45022

Merged
merged 7 commits into from
Jan 23, 2021

Conversation

GrabYourPitchforks
Copy link
Member

Fixes #39140.

I also experimented with adding nullable annotations to this type to get some additional error checking, but I ended up reverting that commit because it was out of scope of this work.

Marked draft because I haven't run any tests whatsoever over this code. It's just a demonstration of how I'm thinking of solving the problem.

@ghost
Copy link

ghost commented Nov 20, 2020

Tagging subscribers to this area: @eerhardt, @maryamariyan
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #39140.

I also experimented with adding nullable annotations to this type to get some additional error checking, but I ended up reverting that commit because it was out of scope of this work.

Marked draft because I haven't run any tests whatsoever over this code. It's just a demonstration of how I'm thinking of solving the problem.

Author: GrabYourPitchforks
Assignees: -
Labels:

area-Extensions-Primitives

Milestone: -

@davidfowl
Copy link
Member

Performance tests!! 😄

@GrabYourPitchforks
Copy link
Member Author

@davidfowl There are lots of perf wins we could get from this type if we were willing to change some of its behavior. I leave that to a different issue.

@davidfowl
Copy link
Member

Yea, I just don't want regressions, this is in a hot path

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Nov 20, 2020

What methods specifically do you need optimized? The only one that is likely to regress is Equals(StringSegment), because it's now bouncing through an indirection. But I can hand-tune that if needed.

Reiterating my earlier comment: if this is really in a hot path, seriously consider allowing some behavioral changes here. For instance, the fact that there's special-casing all over the place to distinguish between default(StringSegment) and StringSegment.Empty leads to unnecessary complexity + slower perf. Allowing the two to be treated as equivalent would optimize things.

Allowing this change in behavior would make existing methods dumb wrappers around the already highly-optimized span versions. For example:

// Pretty much the whole call stack gets inlined into the caller at this point.
public bool Equals(StringSegment other) => this.AsSpan().SequenceEquals(other.AsSpan());

@davidfowl
Copy link
Member

I haven't looked deeply enough to tell if that change will have an impact m, I'm just saying these types are only used in ASP.NET Core so make sure you do some performance testing

@GrabYourPitchforks GrabYourPitchforks marked this pull request as ready for review January 21, 2021 01:58
@GrabYourPitchforks
Copy link
Member Author

@maryamariyan @eerhardt this is now ready for review. I've added a bunch of unit tests to check various edge cases, especially with respect to globalization. I've also added extra argument checks around the Equals and Contains fast-paths to mirror the checks that string.Equals performs in those same scenarios. Finally, all unsafe or unsafe-equivalent code has been removed from this type.


fixed (char* p = Buffer)
int i;
for (i = span.Length - 1; (uint)i < (uint)span.Length; i--)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(uint)i < (uint)span.Length

Is this micro optimization worth the mental exercise someone reading this code has to go through? Honestly, this might be the first time I've ever seen a loop written this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to elide the bounds check, but we could write it using standard i >= 0 syntax if desired. Last I checked, the JIT still emitted the bounds check, which would make the "simple" way a perf regression compared to the existing code.

Copy link
Member

@eerhardt eerhardt Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely not an expert here, so double-check my work to make sure I didn't do anything wrong. But the current JIT'd code looks worse to me.

UPDATE: I found my first mistake - it defaults to Debug. Updated the JIT'd code to Release and they look the same to me.

https://sharplab.io/#v2:C4LghgzgtgNAJiA1AHwAICYCMBYAUKgZgAJVMA2E9IgYSIG88imTjSKAjAe04BsiBZTAApSABiJgAlEUbMGuZoqIA3MACciEAA5gAdkQC8EgHQBBCAGUduoZIDcspUQCWu4C4cKnRAGacNQs6GmtbGADIAproA5sAAFkQAtESYdkRCAK6uwJJBADzpWW6S2nrhUbFxac6JiZKOSvLeSs4+6QDGcerGAJIQAOpxzsARVmDtEUKlugDazgC6kvVezUxNq06oAOxEwGoZEZ4bTAC+DYpnK0rbvmA8EIcNl80NhCTkRFy8AugimOJSGRXdZOVQaabBMBmSzWWxHbzZDznZh+AJBIzTcoxeJJFLVIgAPiMomqtWWzRBq1aHS6al6AyGIzGEym1jmi3Jx0pxxIOz2B3hG2e3mFmx2PjuD0FREuJyAA

    public static bool M1(string a) 
    {
        var span = a.AsSpan();
        int i;
        for (i = span.Length - 1; (uint)i < (uint)span.Length; i--)
        {
            if (char.IsWhiteSpace(span[i]))
            {
                return true;
            }
        }
        return false;
    }
        
    public static bool M2(string a) 
    {
        var span = a.AsSpan();
        int i;
        for (i = span.Length - 1; i >= 0; i--)
        {
            if (char.IsWhiteSpace(span[i]))
            {
                return true;
            }
        }
        return false;
    }

The left side is M1 and the right is M2:
image

Copy link
Member Author

@GrabYourPitchforks GrabYourPitchforks Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's x86 debug output. I'm trying to get x64 release output but it'll take a few minutes.

Edit: Looks like x64 release is nearly identical between the two cases. I'll do the simple thing.

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment on code that I found surprising.

LGTM. Nice work on filling out the unit tests for this type.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Jan 22, 2021

@bbartels @eerhardt thanks both for your feedback! I've incorporated both of your suggestions and will merge after CI completes.

@GrabYourPitchforks
Copy link
Member Author

I had forgotten that netfx (and netcore before 5.0) have some bugs w.r.t. the globalization IsPrefix and IsSuffix methods. I've suppressed on netfx the test cases which exercise those edge conditions.

@GrabYourPitchforks
Copy link
Member Author

Opened #47374 to track CI failures. Continuing merge since CI failures are unrelated.

@GrabYourPitchforks GrabYourPitchforks merged commit 19df292 into dotnet:master Jan 23, 2021
@GrabYourPitchforks GrabYourPitchforks deleted the stringsegment branch January 23, 2021 19:40
@davidfowl
Copy link
Member

davidfowl commented Jan 24, 2021

@GrabYourPitchforks did you do any performance testing?

@sebastienros Lets keep an eye out here for regressions since ASP.NET Core is the only consumer of this.

@GrabYourPitchforks
Copy link
Member Author

@davidfowl No formal benchmarking, but I looked at the codegen for hot methods like Equals and verified that things appear as expected. E2E testing will be more useful than microbenchmarking here.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StringSegment.EndsWith inconsistent (buggy?) behavior
4 participants