Fix and optimize EscapeUnescapeIri #32025
}
for (int count = 0; count < encodedBytesCount; ++count)
{
    UriHelper.EscapeAsciiChar((char)*(pEncodedBytes + count), ref dest);
Can we use this as an opportunity to get rid of some unsafe code and just use spans? e.g.
Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];
int encodedBytesCount = Encoding.UTF8.GetBytes(new ReadOnlySpan<char>(pInput + next, surrogatePair ? 2 : 1), encodedBytes);
for (int count = 0; count < encodedBytesCount; count++)
{
    UriHelper.EscapeAsciiChar((char)encodedBytes[count], ref dest);
}
I see a ~2% perf hit on the benchmark by doing so.
If we're okay with that I can make the change.
2% might be measurement noise. I ran into the same issue last week, where the perf deviated by ±2% between runs without any code changes.
What is the microbenchmark?
Try running the perf test multiple times and measuring the deviation. Additionally, you can set CPU affinity for the benchmark process; that can stabilize the results a bit.
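As a rough illustration of that advice, here is a minimal sketch that repeats a measurement and reports the mean and sample standard deviation, with optional pinning to one core. The names `RepeatedTiming`, `Measure`, and `TryPinToFirstCore` are hypothetical and not part of the PR or its benchmark project; a harness such as BenchmarkDotNet already does this more rigorously.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;

static class RepeatedTiming
{
    // Pins the current process to the first logical core.
    // Process.ProcessorAffinity can only be set on Windows and Linux, hence the guard.
    public static bool TryPinToFirstCore()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows) &&
            !RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
        {
            return false;
        }

        using Process current = Process.GetCurrentProcess();
        current.ProcessorAffinity = (IntPtr)1; // bit 0 = first logical core
        return true;
    }

    // Runs the action 'runs' times (runs >= 2) and returns the mean and
    // sample standard deviation of the elapsed time in milliseconds.
    public static (double Mean, double StdDev) Measure(Action action, int runs)
    {
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        double mean = samples.Average();
        double variance = samples.Sum(s => (s - mean) * (s - mean)) / (runs - 1);
        return (mean, Math.Sqrt(variance));
    }
}
```

If the standard deviation across runs is of the same order as the difference between two variants, that difference is probably not meaningful.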
I extracted just the needed files to https://github.com/MihaZupan/BenchmarkPR32025
Oh, you're not actually measuring System.Private.Uri, but rather copying the source out into the benchmark? That's not going to be equivalent. For example, we explicitly clear the localsinit flag for all framework assemblies, but that won't happen for your code compiled into your benchmark, which means things like stackalloc are going to be more expensive.
Encoding.UTF8.GetBytes is a bit heavyweight for this. I'd instead recommend a slight variation of what @scalablecory recommended:
Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];
Rune rune = (surrogatePair) ? new Rune(pInput[next], pInput[next + 1]) : new Rune(pInput[next]);
int encodedBytesCount = rune.EncodeToUtf8(encodedBytes);
encodedBytes = encodedBytes.Slice(0, encodedBytesCount);
foreach (byte b in encodedBytes)
{
UriHelper.EscapeAsciiChar((char)b, ref dest);
}
Removing localsinit reduces the gap substantially.
Using @GrabYourPitchforks's approach beats all of the above 👍
Method | Mean | Error | StdDev |
---|---|---|---|
Unsafe | 33.68 us | 0.401 us | 0.356 us |
Span | 35.34 us | 0.675 us | 0.853 us |
SpanSlice | 35.23 us | 1.043 us | 0.871 us |
Rune | 28.13 us | 0.547 us | 0.512 us |
I'll make the change.
To satisfy my curiosity, I added a benchmark with Rune.EncodeToUtf8 encoding to a byte*, avoiding spans. It performs ~8% better than the Rune benchmark above.
(I am not saying that I prefer it over the Rune & Span based one)
The // This method implements the ABNF checks per https://tools.ietf.org/html/rfc3987#section-2.2
internal static bool CheckIriUnicodeRange(char highSurr, char lowSurr, ref bool surrogatePair, bool isQuery)
{
bool inRange = false;
surrogatePair = false;
Debug.Assert(char.IsHighSurrogate(highSurr));
if (Rune.TryCreate(highSurr, lowSurr, out Rune rune))
{
surrogatePair = true;
// U+xxFFFE..U+xxFFFF is always private use for all planes, so we exclude it.
// U+E0000..U+E0FFF is disallowed per the 'ucschar' definition in the ABNF.
// U+F0000 and above are only allowed for 'iprivate' per the ABNF (isQuery = true).
inRange = ((ushort)rune.Value < 0xFFFE)
&& ((uint)(rune.Value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))
&& (isQuery || rune.Value < 0xF0000);
}
return inRange;
}
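To make the three range checks concrete, the sketch below applies the same expression to a few already-decoded code points. `IriRangeDemo` and `IsAllowedSupplementary` are hypothetical names used only for this illustration; they are not part of System.Private.Uri.

```csharp
using System;
using System.Text;

static class IriRangeDemo
{
    // Same three checks as in the snippet above, applied to a supplementary-plane scalar value.
    public static bool IsAllowedSupplementary(int value, bool isQuery)
    {
        return ((ushort)value < 0xFFFE)                               // excludes U+xxFFFE and U+xxFFFF
            && ((uint)(value - 0xE0000) >= (uint)(0xE1000 - 0xE0000)) // excludes U+E0000..U+E0FFF
            && (isQuery || value < 0xF0000);                          // U+F0000 and above only in queries
    }

    public static void Main()
    {
        // The surrogate pair used as benchmark input decodes to U+1FFFE.
        Rune.TryCreate('\ud83f', '\udffe', out Rune benchmarkRune);
        Console.WriteLine(benchmarkRune.Value == 0x1FFFE);                           // True
        Console.WriteLine(IsAllowedSupplementary(benchmarkRune.Value, false));       // False

        Console.WriteLine(IsAllowedSupplementary(0x1F600, false)); // True: ordinary supplementary code point
        Console.WriteLine(IsAllowedSupplementary(0xE0001, false)); // False: inside U+E0000..U+E0FFF
        Console.WriteLine(IsAllowedSupplementary(0xF0000, false)); // False: private use outside a query
        Console.WriteLine(IsAllowedSupplementary(0xF0000, true));  // True: private use inside a query
    }
}
```

The unsigned subtraction folds the two-sided test `value < 0xE0000 || value >= 0xE1000` into a single comparison: values below 0xE0000 wrap around to a large unsigned number and pass.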
@GrabYourPitchforks Can you comment on #31860 regarding
Sorry, didn't see the other issue. Will copy the comment there. And yes, the majority of the checks are unnecessary.
Turns out Correcting the bug and using Rune now shows much nicer numbers
This also makes the improvement in #31860 more noticeable. Combining the changes, the numbers are:
The time will likely improve a bit more when applying the change to range checks in #31860.
@dotnet/ncl @stephentoub Please re-review
The code looks good.
Did the CI tests not detect this regression? If there weren't any tests for this condition, will you add new tests to verify the correct behavior to avoid future regressions?
Test failures are unrelated
Looks good.
Also, wow -- check out the surrogate version of CheckIriUnicodeRange. It is bonkers!
Can you please add links to the existing issues next time? See https://github.com/dotnet/runtime/blob/master/docs/pr-guide.md#unrelated-failure . This is changing Uri, the failing tests on OSX are Uri tests that did not fail for a long time, and the Uri test failure is hitting all PRs now. I am going to revert this PR to see whether it fixes the CI.
I have looked at the delta. I see an obvious bug with calling stackalloc in a loop that was caught by the failing tests.
Can static analyzers catch such cases?
Good idea. Added a note to #30740
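For context, memory obtained with stackalloc is released only when the method returns, so a stackalloc inside a loop consumes more stack on every iteration and can overflow it on long inputs. The sketch below shows the usual remedy of hoisting the buffer out of the loop. `StackallocLoopDemo` and `CountUtf8Bytes` are hypothetical names for illustration, and this is not the actual fix applied to the Uri code.

```csharp
using System;
using System.Text;

static class StackallocLoopDemo
{
    // Returns the number of UTF-8 bytes needed to encode 'input'.
    public static int CountUtf8Bytes(ReadOnlySpan<char> input)
    {
        // Allocated once, before the loop, and reused for every rune.
        // Moving this line inside the loop would grow the stack by 4 bytes
        // (plus alignment) per iteration until the method returns.
        Span<byte> buffer = stackalloc byte[4];

        int total = 0;
        foreach (Rune rune in input.EnumerateRunes())
        {
            total += rune.EncodeToUtf8(buffer);
        }
        return total;
    }
}
```

For example, `StackallocLoopDemo.CountUtf8Bytes("a😀")` returns 5: one byte for `a` and four for the emoji.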
Allocate the 4-byte buffer on the stack rather than on the heap.
Perf for "scheme:" + { '\ud83f', '\udffe' } * 1000 (same input as in #31860)