Fix and optimize EscapeUnescapeIri #32025

MihaZupan · 2020-02-10T14:35:37Z

Allocate the 4-byte buffer used to UTF-8 encode each escaped character on the stack instead of allocating a new byte[] on the heap per character.

Perf for the input "scheme:" followed by 1000 repetitions of the surrogate pair { '\ud83f', '\udffe' } (same input as in #31860):

Method	Toolchain	Mean	Ratio	Gen 0	Gen 1	Gen 2	Allocated
NewUri	clean\CoreRun.exe	351.7 us	1.06	69.3359	13.6719	-	285.44 KB
NewUri	new\CoreRun.exe	330.8 us	1.00	54.1992	-	-	222.94 KB

stephentoub · 2020-02-10T16:32:45Z

src/libraries/System.Private.Uri/src/System/IriHelper.cs

-                        }
+                    for (int count = 0; count < encodedBytesCount; ++count)
+                    {
+                        UriHelper.EscapeAsciiChar((char)*(pEncodedBytes + count), ref dest);


Can we use this as an opportunity to get rid of some unsafe code and just use spans? e.g.

Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];
int encodedBytesCount = Encoding.UTF8.GetBytes(new ReadOnlySpan<char>(pInput + next, surrogatePair ? 2 : 1), encodedBytes);
for (int count = 0; count < encodedBytesCount; count++)
{
    UriHelper.EscapeAsciiChar((char)encodedBytes[count], ref dest);
}

I see a ~2% perf hit on the benchmark by doing so.
If we're okay with that, I can make the change.

2% might be measurement noise. I ran into the same issue just last week: the perf deviated by ±2% between runs without any code changes.

What is the microbenchmark?

Try running the perf test multiple times and measure the deviation. Additionally, you can set CPU affinity for the benchmark process; that can stabilize the results a bit.

I extracted just the needed files to https://github.com/MihaZupan/BenchmarkPR32025

Oh, you're not actually measuring System.Private.Uri, but rather copying the source out into the benchmark? That's not going to be equivalent. For example, we explicitly clear the localsinit flag for all framework assemblies, but that won't happen for your code compiled into your benchmark, which means things like stackalloc are going to be more expensive (with localsinit set, the stackalloc'd memory is zeroed every time the stackalloc runs).

Encoding.UTF8.GetBytes is a bit heavyweight for this. I'd instead recommend a slight variation of what @scalablecory recommended:

Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];
Rune rune = (surrogatePair) ? new Rune(pInput[next], pInput[next + 1]) : new Rune(pInput[next]);
int encodedBytesCount = rune.EncodeToUtf8(encodedBytes);
encodedBytes = encodedBytes.Slice(0, encodedBytesCount);
foreach (byte b in encodedBytes)
{
    UriHelper.EscapeAsciiChar((char)b, ref dest);
}
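A minimal standalone sketch of the Rune-based suggestion above, assuming plain C# with a StringBuilder standing in for the internal `ref dest` buffer. `EscapeOne` and `AppendPercentEncoded` are hypothetical names: `AppendPercentEncoded` only approximates what `UriHelper.EscapeAsciiChar` does (append `%XX`), and the surrogate-pair check is done inline here instead of coming from `CheckIriUnicodeRange`.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the internal UriHelper.EscapeAsciiChar: appends "%XX" (uppercase hex).
static void AppendPercentEncoded(byte b, StringBuilder dest)
{
    dest.Append('%').Append(b.ToString("X2"));
}

// Hypothetical helper: percent-encodes the scalar value at the start of `input` as UTF-8
// and returns how many UTF-16 chars it consumed (2 for a surrogate pair, otherwise 1).
// Assumes `input` is non-empty and does not start with an unpaired surrogate,
// because new Rune(char) throws for a lone surrogate.
static int EscapeOne(ReadOnlySpan<char> input, StringBuilder dest)
{
    bool surrogatePair = input.Length > 1 && char.IsSurrogatePair(input[0], input[1]);

    // A scalar value needs at most 4 UTF-8 bytes. This stackalloc is not inside a loop.
    Span<byte> encodedBytes = stackalloc byte[4];
    Rune rune = surrogatePair ? new Rune(input[0], input[1]) : new Rune(input[0]);
    int encodedBytesCount = rune.EncodeToUtf8(encodedBytes);

    foreach (byte b in encodedBytes.Slice(0, encodedBytesCount))
    {
        AppendPercentEncoded(b, dest);
    }

    return surrogatePair ? 2 : 1;
}

var sb = new StringBuilder();
int consumed = EscapeOne("\ud83f\udffe", sb);
Console.WriteLine($"{sb} ({consumed} chars consumed)");
```

With the benchmark's surrogate pair this prints `%F0%9F%BF%BE (2 chars consumed)`. Slicing the span before the `foreach` keeps the loop free of an explicit index, which is the variation over the `Encoding.UTF8.GetBytes` version.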

Removing localsinit reduces the gap substantially.
Using @GrabYourPitchforks's approach beats all of the above 👍

Method	Mean	Error	StdDev
Unsafe	33.68 us	0.401 us	0.356 us
Span	35.34 us	0.675 us	0.853 us
SpanSlice	35.23 us	1.043 us	0.871 us
Rune	28.13 us	0.547 us	0.512 us

I'll make the change.

To satisfy my curiosity, I added a benchmark with Rune.EncodeToUtf8 encoding to a byte*, avoiding spans. It performs ~8% better than the Rune benchmark above.
(I'm not saying that I prefer it over the Rune- and Span-based one.)

GrabYourPitchforks · 2020-02-10T18:51:33Z

The CheckIriUnicodeRange method also performs an unnecessary allocation. That could be addressed (and the logic simplified greatly) via something akin to the following untested code:

// This method implements the ABNF checks per https://tools.ietf.org/html/rfc3987#section-2.2
internal static bool CheckIriUnicodeRange(char highSurr, char lowSurr, ref bool surrogatePair, bool isQuery)
{
    bool inRange = false;
    surrogatePair = false;

    Debug.Assert(char.IsHighSurrogate(highSurr));

    if (Rune.TryCreate(highSurr, lowSurr, out Rune rune))
    {
        surrogatePair = true;

        // U+xxFFFE..U+xxFFFF are noncharacters in every plane, so we exclude them.
        // U+E0000..U+E0FFF is disallowed per the 'ucschar' definition in the ABNF.
        // U+F0000 and above are only allowed for 'iprivate' per the ABNF (isQuery = true).

        inRange = ((ushort)rune.Value < 0xFFFE)
            && ((uint)(rune.Value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))
            && (isQuery || rune.Value < 0xF0000);
    }

    return inRange;
}
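To sanity-check the suggested CheckIriUnicodeRange, here is a standalone copy run against a few boundary code points. The test values are chosen here for illustration and are not from the PR; `ToSurrogates` is a hypothetical helper built on `char.ConvertFromUtf32`.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Standalone copy of the suggested CheckIriUnicodeRange, made a local function so it runs on its own.
static bool CheckIriUnicodeRange(char highSurr, char lowSurr, ref bool surrogatePair, bool isQuery)
{
    bool inRange = false;
    surrogatePair = false;

    Debug.Assert(char.IsHighSurrogate(highSurr));

    if (Rune.TryCreate(highSurr, lowSurr, out Rune rune))
    {
        surrogatePair = true;

        inRange = ((ushort)rune.Value < 0xFFFE)                                // not U+xxFFFE / U+xxFFFF
            && ((uint)(rune.Value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))  // not U+E0000..U+E0FFF
            && (isQuery || rune.Value < 0xF0000);                           // U+F0000 and up: iprivate only
    }

    return inRange;
}

// Hypothetical helper: splits a supplementary code point into its surrogate pair.
static (char High, char Low) ToSurrogates(int codePoint)
{
    string s = char.ConvertFromUtf32(codePoint);
    return (s[0], s[1]);
}

// Boundary values chosen for illustration.
foreach (int cp in new[] { 0x10000, 0x1FFFE, 0xE0001, 0xF0000 })
{
    var (high, low) = ToSurrogates(cp);
    bool pair = false;
    bool inPath = CheckIriUnicodeRange(high, low, ref pair, isQuery: false);
    bool inQuery = CheckIriUnicodeRange(high, low, ref pair, isQuery: true);
    Console.WriteLine($"U+{cp:X}: path={inPath}, query={inQuery}, pair={pair}");
}
```

U+10000 is accepted in both positions, U+1FFFE (the benchmark's pair) is rejected by the xxFFFE check, U+E0001 falls in the excluded E0000..E0FFF block, and U+F0000 is accepted only when `isQuery` is true.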

MihaZupan · 2020-02-10T18:54:29Z

@GrabYourPitchforks Can you comment on #31860 regarding CheckIriUnicodeRange? Are you saying that the majority of those range checks are not needed?

GrabYourPitchforks · 2020-02-10T18:57:18Z

Sorry, didn't see the other issue. Will copy the comment there. And yes, the majority of the checks are unnecessary.

MihaZupan · 2020-02-13T12:48:42Z

Turns out EscapeUnescapeIri was not incrementing the index when escaping a surrogate pair. That led to the low surrogate being escaped again, producing wrong results and hitting the fallback path in UTF8Encoding (which allocates).
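The effect of the missing index increment can be reproduced with a small sketch. It is not the Uri code: `ScalarsSeen` is a hypothetical helper, and it uses `Rune.DecodeFromUtf16` (which substitutes U+FFFD for an unpaired surrogate) rather than the UTF8Encoding fallback the real code hit. Advancing by one char after a surrogate pair makes the loop decode the low surrogate a second time on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: lists the scalar values an escaping loop would visit.
// advanceFullPair = false reproduces the bug: the index moves by one char even after a
// surrogate pair, so the low surrogate is decoded again on its own.
static string ScalarsSeen(string input, bool advanceFullPair)
{
    var seen = new List<string>();
    int i = 0;
    while (i < input.Length)
    {
        // An unpaired surrogate is reported as invalid and decoded as U+FFFD (1 char consumed).
        Rune.DecodeFromUtf16(input.AsSpan(i), out Rune rune, out int charsConsumed);
        seen.Add($"U+{rune.Value:X4}");
        i += advanceFullPair ? charsConsumed : 1;
    }
    return string.Join(" ", seen);
}

Console.WriteLine(ScalarsSeen("\ud83f\udffe", advanceFullPair: false)); // buggy loop
Console.WriteLine(ScalarsSeen("\ud83f\udffe", advanceFullPair: true));  // fixed loop
```

The buggy loop prints `U+1FFFE U+FFFD` and the fixed loop prints `U+1FFFE`; the extra U+FFFD corresponds to the trailing %EF%BF%BD in the test output discussed later in this thread.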

Correcting the bug and using Rune now shows much nicer numbers:

Method	Toolchain	Mean	Ratio	Gen 0	Gen 1	Gen 2	Allocated
NewUri	clean\CoreRun.exe	315.9 us	2.47	69.3359	13.6719	-	285.44 KB
NewUri	new\CoreRun.exe	127.8 us	1.00	19.0430	2.6855	-	78.41 KB

This also makes the improvement in #31860 more noticeable. With both changes combined, the numbers are:

Method	Toolchain	Mean	Ratio	Gen 0	Gen 1	Gen 2	Allocated
NewUri	clean\CoreRun.exe	315.94 us	4.19	69.3359	13.6719	-	285.44 KB
NewUri	new\CoreRun.exe	75.56 us	1.00	11.4746	1.5869	-	47.16 KB

The time will likely improve a bit more when applying the change to range checks in #31860.

MihaZupan · 2020-02-13T13:16:33Z

@dotnet/ncl @stephentoub Please re-review

scalablecory

The code looks good.

src/libraries/System.Private.Uri/src/System/IriHelper.cs

davidsh · 2020-02-13T15:43:47Z

Turns out EscapeUnescapeIri was not incrementing the index when escaping a surrogate pair. That led to the low surrogate being escaped again, producing wrong results

Did the CI tests not detect this regression? If there weren't any tests for this condition, will you add new tests to verify the correct behavior to avoid future regressions?

MihaZupan · 2020-02-13T15:47:20Z

@davidsh The tests I added will catch this as well.
For example, without the fix this test returns %F0%9F%BF%BE%EF%BF%BD instead of %F0%9F%BF%BE (note the extra %EF%BF%BD at the end: the percent-encoded replacement character U+FFFD).

It appears there were previously no tests with a surrogate pair outside the IRI range.
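The expected strings in that test can be derived by hand. The benchmark pair decodes to U+1FFFE; since its low 16 bits are FFFE it is outside the IRI range and gets percent-encoded as four UTF-8 bytes, while the low surrogate escaped a second time becomes U+FFFD (all values hex unless subscripted with 2; payload bits in bold):

```latex
\begin{aligned}
(\mathtt{D83F}, \mathtt{DFFE}) &\mapsto \mathtt{10000} + (\mathtt{D83F} - \mathtt{D800}) \cdot \mathtt{400} + (\mathtt{DFFE} - \mathtt{DC00}) \\
  &= \mathtt{10000} + \mathtt{FC00} + \mathtt{3FE} = \mathtt{1FFFE} \\
\mathtt{1FFFE} &= 000\,011111\,111111\,111110_2 \\
  &\mapsto 11110\mathbf{000}\;10\mathbf{011111}\;10\mathbf{111111}\;10\mathbf{111110}_2 = \mathtt{F0\ 9F\ BF\ BE} \\
\mathtt{FFFD} &= 1111\,111111\,111101_2 \\
  &\mapsto 1110\mathbf{1111}\;10\mathbf{111111}\;10\mathbf{111101}_2 = \mathtt{EF\ BF\ BD}
\end{aligned}
```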

MihaZupan · 2020-02-13T17:14:26Z

Test failures are unrelated

scalablecory

Looks good.

Also, wow -- check out the surrogate version of CheckIriUnicodeRange. It is bonkers!

jkotas · 2020-02-15T15:16:48Z

Test failures are unrelated

Can you please add links to the existing issues next time? See https://github.com/dotnet/runtime/blob/master/docs/pr-guide.md#unrelated-failure .

This PR changes Uri, the failing tests on OSX are Uri tests that had not failed in a long time, and the Uri test failure is now hitting all PRs. I am going to revert this PR to see whether it fixes the CI.

This reverts commit dda29ff.

jkotas · 2020-02-15T15:21:33Z

I have looked at the delta. I see an obvious bug with calling stackalloc in a loop that was caught by the failing tests.
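A sketch of the safe shape, assuming the same job as the escaping loop (percent-encode every scalar value as UTF-8). `PercentEncodeUtf8` is a hypothetical name, and unlike the Uri code it escapes everything rather than only characters outside the IRI range. Memory from `stackalloc` is not released until the method returns, so a `stackalloc` inside the loop body grows the stack on every iteration; with a long input that can end in a stack overflow, consistent with the StackOverflow reported in #32367. Allocating the 4-byte buffer once, before the loop, keeps stack usage constant.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: percent-encodes every scalar value in `input` as UTF-8.
// The 4-byte scratch buffer is stackalloc'd once, before the loop. Moving that line into the
// loop body would reserve 4 more bytes of stack per iteration until the method returns.
static string PercentEncodeUtf8(string input)
{
    Span<byte> utf8 = stackalloc byte[4];
    var dest = new StringBuilder();

    int i = 0;
    while (i < input.Length)
    {
        // Unpaired surrogates decode to U+FFFD here.
        Rune.DecodeFromUtf16(input.AsSpan(i), out Rune rune, out int charsConsumed);
        int byteCount = rune.EncodeToUtf8(utf8);
        for (int j = 0; j < byteCount; j++)
        {
            dest.Append('%').Append(utf8[j].ToString("X2"));
        }
        i += charsConsumed; // advance past both halves of a surrogate pair
    }

    return dest.ToString();
}

// The surrogate-pair part of the benchmark input: 1000 copies of U+1FFFE.
string payload = string.Concat(Enumerable.Repeat("\ud83f\udffe", 1000));
Console.WriteLine(PercentEncodeUtf8(payload).Length);
```

This prints 12000: each pair becomes 4 UTF-8 bytes, and each byte becomes 3 output chars.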

EgorBo · 2020-02-15T15:24:11Z

I have looked at the delta. I see an obvious bug with calling stackalloc in a loop that was caught by the failing tests.

Can static analyzers catch such cases?

jkotas · 2020-02-15T15:30:42Z

Can static analyzers catch such cases?

Good idea. Added a note to #30740

MihaZupan added 2 commits February 10, 2020 15:28

Remove byte[] allocation per encoded character

dc4073c

Remove dead code from EscapeUnescapeIri

7a16bd4

MihaZupan requested review from stephentoub and a team February 10, 2020 14:35

MihaZupan added the area-System.Net label Feb 10, 2020

MihaZupan added this to the 5.0 milestone Feb 10, 2020

MihaZupan changed the title ~~Uri cleanup bytearray alloc~~ Remove byte[] allocation per encoded character in Uri Feb 10, 2020

EgorBo reviewed Feb 10, 2020

src/libraries/System.Private.Uri/src/System/IriHelper.cs (outdated)

Use int instead of IntPtr for stack buffer

421c586

stephentoub reviewed Feb 10, 2020

src/libraries/System.Private.Uri/src/System/IriHelper.cs (outdated)

Use sizeof(int) instead of 4 as const

29b9f96

stephentoub reviewed Feb 10, 2020

Fix EscapeUnescapeIri for escaped surrogate pairs

814ebcc

MihaZupan changed the title ~~Remove byte[] allocation per encoded character in Uri~~ Fix and optimize EscapeUnescapeIri Feb 13, 2020

Merge master

f5097dc

MihaZupan requested review from alnikola, scalablecory and stephentoub February 13, 2020 13:17

scalablecory reviewed Feb 13, 2020

src/libraries/System.Private.Uri/src/System/IriHelper.cs

scalablecory approved these changes Feb 13, 2020

MihaZupan merged commit dda29ff into dotnet:master Feb 15, 2020

jkotas added a commit that referenced this pull request Feb 15, 2020

Revert "Fix and optimize EscapeUnescapeIri (#32025)"

e3dd082

This reverts commit dda29ff.

jkotas mentioned this pull request Feb 15, 2020

Revert "Fix and optimize EscapeUnescapeIri" #32374

Closed

jkotas mentioned this pull request Feb 15, 2020

System.Private.Uri.Functional.Tests failed with StackOverflow on OSX #32367

Closed

stephentoub mentioned this pull request Feb 15, 2020

Static analysis for .NET 5 #30740

Closed

49 tasks

terrajobst mentioned this pull request Mar 19, 2020

Do not use stackalloc inside of a loop #33782

Closed

MihaZupan mentioned this pull request Apr 13, 2020

[Uri] A lightweight alternative for System.Uri #34873

Closed

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
