Fix and optimize EscapeUnescapeIri #32025

Merged
merged 6 commits into from
Feb 15, 2020

Conversation

MihaZupan
Member

@MihaZupan MihaZupan commented Feb 10, 2020

Allocate the 4-byte buffer on the stack rather than on the heap.

Perf for "scheme:" + { '\ud83f', '\udffe' } * 1000 (same input as in #31860)

| Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| NewUri | clean\CoreRun.exe | 351.7 us | 1.06 | 69.3359 | 13.6719 | - | 285.44 KB |
| NewUri | new\CoreRun.exe | 330.8 us | 1.00 | 54.1992 | - | - | 222.94 KB |
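
As a rough illustration of the stack-versus-heap change described above, here is a minimal sketch. `StackBufferSketch` and `PercentEncode` are hypothetical names that do not come from the PR, and the input is assumed to be a single Unicode scalar (one char, or one surrogate pair), so that 4 bytes are always enough.

```csharp
using System;
using System.Text;

public static class StackBufferSketch
{
    // Hypothetical helper: percent-encodes one scalar given as 1 or 2 UTF-16 chars.
    public static string PercentEncode(ReadOnlySpan<char> chars)
    {
        // One scalar needs at most 4 UTF-8 bytes, so a fixed stack buffer
        // can stand in for a per-character `new byte[4]` on the heap.
        Span<byte> utf8 = stackalloc byte[4];
        int written = Encoding.UTF8.GetBytes(chars, utf8);

        var sb = new StringBuilder(written * 3);
        for (int i = 0; i < written; i++)
        {
            sb.Append('%').Append(utf8[i].ToString("X2"));
        }
        return sb.ToString();
    }
}
```

The `StringBuilder` here still allocates; it is only present to make the result easy to inspect, whereas the real code writes into an existing destination buffer.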

@MihaZupan MihaZupan requested review from stephentoub and a team February 10, 2020 14:35
@MihaZupan MihaZupan added this to the 5.0 milestone Feb 10, 2020
@MihaZupan MihaZupan changed the title from "Uri cleanup bytearray alloc" to "Remove byte[] allocation per encoded character in Uri" Feb 10, 2020
}
for (int count = 0; count < encodedBytesCount; ++count)
{
UriHelper.EscapeAsciiChar((char)*(pEncodedBytes + count), ref dest);
Member

Can we use this as an opportunity to get rid of some unsafe code and just use spans? e.g.

Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];
// The input is UTF-16, so the source span is over chars, not bytes.
int encodedBytesCount = Encoding.UTF8.GetBytes(new ReadOnlySpan<char>(pInput + next, surrogatePair ? 2 : 1), encodedBytes);
for (int count = 0; count < encodedBytesCount; count++)
{
    UriHelper.EscapeAsciiChar((char)encodedBytes[count], ref dest);
}

Member Author

I see a ~2% perf hit on the benchmark by doing so.
If we're okay with that I can make the change.

Contributor

2% might be measurement noise. I ran into the same issue last week, where perf deviated by ±2% between runs without any code changes.

Member

What is the microbenchmark?

Contributor

Try running the perf test multiple times and measuring the deviation. Additionally, you can set CPU affinity for the benchmark process; that can stabilize the results a bit.
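
To make the "run it several times and look at the deviation" advice concrete, here is a minimal sketch. `NoiseCheckSketch`, `Measure`, and `Summarize` are hypothetical names, and this hand-rolled loop is only an illustration, not a replacement for BenchmarkDotNet. On platforms that support it, `Process.GetCurrentProcess().ProcessorAffinity` could be set before measuring to pin the process to a core.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class NoiseCheckSketch
{
    // Times `action` over `runs` independent runs of `iterations` calls each.
    // Returns the average microseconds per call for every run.
    public static double[] Measure(Action action, int runs, int iterations)
    {
        var results = new double[runs];
        for (int r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                action();
            }
            sw.Stop();
            results[r] = sw.Elapsed.TotalMilliseconds * 1000.0 / iterations;
        }
        return results;
    }

    // Mean and sample standard deviation (n - 1 denominator) of the run timings.
    public static (double Mean, double StdDev) Summarize(double[] samples)
    {
        double mean = samples.Average();
        double variance = samples.Sum(s => (s - mean) * (s - mean)) / (samples.Length - 1);
        return (mean, Math.Sqrt(variance));
    }
}
```

If the standard deviation across runs of the unchanged code is already around 2% of the mean, a 2% difference between two variants cannot be distinguished from noise.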

Member Author

I extracted just the needed files to https://github.com/MihaZupan/BenchmarkPR32025

Member

Oh, you're not actually measuring System.Private.Uri, but rather copying the source out into the benchmark? That's not going to be equivalent. For example, we explicitly clear the localsinit flag for all framework assemblies, but that won't happen for your code compiled into your benchmark, which means things like stackalloc are going to be more expensive.

Member

Encoding.UTF8.GetBytes is a bit heavyweight for this. I'd instead recommend a slight variation of what @scalablecory recommended:

Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];

Rune rune = (surrogatePair) ? new Rune(pInput[next], pInput[next + 1]) : new Rune(pInput[next]);
int encodedBytesCount = rune.EncodeToUtf8(encodedBytes);
encodedBytes = encodedBytes.Slice(0, encodedBytesCount);

foreach (byte b in encodedBytes)
{
    UriHelper.EscapeAsciiChar((char)b, ref dest);
}

Member Author

Removing localsinit reduces the gap substantially.
@GrabYourPitchforks's approach beats all of the above 👍

| Method | Mean | Error | StdDev |
|---|---|---|---|
| Unsafe | 33.68 us | 0.401 us | 0.356 us |
| Span | 35.34 us | 0.675 us | 0.853 us |
| SpanSlice | 35.23 us | 1.043 us | 0.871 us |
| Rune | 28.13 us | 0.547 us | 0.512 us |

I'll make the change.

Member Author

To satisfy my curiosity, I added a benchmark with Rune.EncodeToUtf8 encoding to a byte*, avoiding spans. It performs ~8% better than the Rune benchmark above.
(I am not saying that I prefer it over the Rune & Span based one)

@GrabYourPitchforks
Member

The CheckIriUnicodeRange method also performs an unnecessary allocation. That could be addressed (and the logic simplified greatly) via something akin to the following untested code:

// This method implements the ABNF checks per https://tools.ietf.org/html/rfc3987#section-2.2
internal static bool CheckIriUnicodeRange(char highSurr, char lowSurr, ref bool surrogatePair, bool isQuery)
{
    bool inRange = false;
    surrogatePair = false;

    Debug.Assert(char.IsHighSurrogate(highSurr));

    if (Rune.TryCreate(highSurr, lowSurr, out Rune rune))
    {
        surrogatePair = true;

        // U+xxFFFE..U+xxFFFF are noncharacters in every plane, so we exclude them.
        // U+E0000..U+E0FFF is disallowed per the 'ucschar' definition in the ABNF.
        // U+F0000 and above are only allowed for 'iprivate' per the ABNF (isQuery = true).

        inRange = ((ushort)rune.Value < 0xFFFE)
            && ((uint)(rune.Value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))
            && (isQuery || rune.Value < 0xF0000);
    }

    return inRange;
}
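
The range checks in the snippet above rely on two small tricks that are easy to misread. The following sketch spells them out; `RangeCheckSketch` and its method names are hypothetical and exist only for illustration.

```csharp
using System;

public static class RangeCheckSketch
{
    // Straightforward form: true when value lies outside [lo, hi).
    public static bool OutsideTwoCompares(int value, int lo, int hi)
        => value < lo || value >= hi;

    // Single-compare form: when value < lo the subtraction wraps around to a
    // large uint, so one unsigned comparison covers both bounds.
    public static bool OutsideOneCompare(int value, int lo, int hi)
        => unchecked((uint)(value - lo) >= (uint)(hi - lo));

    // The ushort cast keeps only the low 16 bits, so this matches
    // U+xxFFFE and U+xxFFFF in every plane.
    public static bool EndsInFffeOrFfff(int codePoint)
        => unchecked((ushort)codePoint) >= 0xFFFE;
}
```

For example, U+1FFFE (the scalar used in the benchmark input) has low 16 bits 0xFFFE, so the first condition of the proposed check rejects it.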

@MihaZupan
Member Author

@GrabYourPitchforks Can you comment on #31860 regarding CheckIriUnicodeRange? Are you saying that the majority of those range checks are not needed?

@GrabYourPitchforks
Member

Sorry, didn't see the other issue. Will copy the comment there. And yes, the majority of the checks are unnecessary.

@MihaZupan MihaZupan changed the title from "Remove byte[] allocation per encoded character in Uri" to "Fix and optimize EscapeUnescapeIri" Feb 13, 2020
@MihaZupan
Member Author

MihaZupan commented Feb 13, 2020

Turns out EscapeUnescapeIri was not incrementing the index when escaping a surrogate pair. As a result, the low surrogate was escaped a second time on the next iteration, producing wrong results and hitting the allocating fallback path in UTF8Encoding.

Correcting the bug and using Rune now shows much nicer numbers

| Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| NewUri | clean\CoreRun.exe | 315.9 us | 2.47 | 69.3359 | 13.6719 | - | 285.44 KB |
| NewUri | new\CoreRun.exe | 127.8 us | 1.00 | 19.0430 | 2.6855 | - | 78.41 KB |

This also makes the improvement in #31860 more noticeable. With both changes combined, the numbers are:

| Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| NewUri | clean\CoreRun.exe | 315.94 us | 4.19 | 69.3359 | 13.6719 | - | 285.44 KB |
| NewUri | new\CoreRun.exe | 75.56 us | 1.00 | 11.4746 | 1.5869 | - | 47.16 KB |

The time will likely improve a bit more when applying the change to range checks in #31860.

@MihaZupan
Member Author

@dotnet/ncl @stephentoub Please re-review

Contributor

@scalablecory scalablecory left a comment

The code looks good.

@davidsh
Contributor

davidsh commented Feb 13, 2020

> Turns out EscapeUnescapeIri was not incrementing the index when escaping a surrogate pair. That led to the low surrogate being escaped again, producing wrong results

Did the CI tests not detect this regression? If there weren't any tests for this condition, will you add new tests to verify the correct behavior to avoid future regressions?

@MihaZupan
Member Author

MihaZupan commented Feb 13, 2020

@davidsh The tests I added will catch this as well.
For example, this test will return %F0%9F%BF%BE%EF%BF%BD instead of %F0%9F%BF%BE (note the extra %EF%BF%BD at the end - percent encoded replacement char).

It appears there were no tests with a surrogate pair that wasn't in the IRI range before.
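
The difference between the two outputs can be reproduced with a small standalone sketch. This is not the actual `EscapeUnescapeIri` code: `SurrogateSkipSketch` and `EscapeAll` are hypothetical names, the sketch escapes every character regardless of IRI range, and the `skipLowSurrogate` flag exists only to show the old behavior next to the fixed one.

```csharp
using System;
using System.Text;

public static class SurrogateSkipSketch
{
    // Percent-encodes every scalar in `input` as UTF-8.
    // Passing skipLowSurrogate: false mimics the missing index increment.
    public static string EscapeAll(string input, bool skipLowSurrogate)
    {
        Span<byte> utf8 = stackalloc byte[4]; // allocated once, outside the loop
        var sb = new StringBuilder();

        for (int i = 0; i < input.Length; i++)
        {
            bool isPair = i + 1 < input.Length
                && char.IsSurrogatePair(input[i], input[i + 1]);

            Rune rune;
            if (isPair)
            {
                rune = new Rune(input[i], input[i + 1]);
            }
            else if (!Rune.TryCreate(input[i], out rune))
            {
                rune = Rune.ReplacementChar; // a lone surrogate becomes U+FFFD
            }

            int written = rune.EncodeToUtf8(utf8);
            for (int b = 0; b < written; b++)
            {
                sb.Append('%').Append(utf8[b].ToString("X2"));
            }

            if (isPair && skipLowSurrogate)
            {
                i++; // consume the low surrogate that was just encoded
            }
        }
        return sb.ToString();
    }
}
```

With the input `"\ud83f\udffe"`, the sketch yields `%F0%9F%BF%BE` when the low surrogate is skipped and `%F0%9F%BF%BE%EF%BF%BD` when it is not, matching the two results quoted above.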

@MihaZupan
Member Author

Test failures are unrelated

Contributor

@scalablecory scalablecory left a comment

Looks good.

Also, wow -- check out the surrogate version of CheckIriUnicodeRange. It is bonkers!

@MihaZupan MihaZupan merged commit dda29ff into dotnet:master Feb 15, 2020
@jkotas
Member

jkotas commented Feb 15, 2020

> Test failures are unrelated

Can you please add links to the existing issues next time? See https://github.com/dotnet/runtime/blob/master/docs/pr-guide.md#unrelated-failure .

This PR changes Uri, the failing tests on OSX are Uri tests that had not failed for a long time, and the Uri test failure is now hitting all PRs. I am going to revert this PR to see whether that fixes the CI.

jkotas added a commit that referenced this pull request Feb 15, 2020
@jkotas
Member

jkotas commented Feb 15, 2020

I have looked at the delta. I see an obvious bug with calling stackalloc in a loop that was caught by the failing tests.
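
For readers unfamiliar with why `stackalloc` in a loop is a problem: stack memory obtained this way is released only when the method returns, so each iteration adds to the frame, and a long input can presumably exhaust the stack. A minimal sketch of the usual remedy, hoisting the buffer out of the loop, follows; `StackallocLoopSketch` and `SumEncodedLengths` are hypothetical names, not code from this PR.

```csharp
using System;
using System.Text;

public static class StackallocLoopSketch
{
    // Returns the total number of UTF-8 bytes needed for `input`.
    public static int SumEncodedLengths(ReadOnlySpan<char> input)
    {
        // Hoisted: one 4-byte buffer is reused by every iteration.
        // Placing this stackalloc inside the loop body would instead
        // reserve 4 more bytes of stack on each pass.
        Span<byte> buffer = stackalloc byte[4];

        int total = 0;
        foreach (Rune rune in input.EnumerateRunes())
        {
            total += rune.EncodeToUtf8(buffer);
        }
        return total;
    }
}
```

Because the buffer is overwritten on every iteration, this pattern only works when each iteration finishes using the bytes before the next one starts.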

@EgorBo
Member

EgorBo commented Feb 15, 2020

> I have looked at the delta. I see an obvious bug with calling stackalloc in a loop that was caught by the failing tests.

Can static analyzers catch such cases?

@jkotas
Member

jkotas commented Feb 15, 2020

> Can static analyzers catch such cases?

Good idea. Added a note to #30740

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020