Fix and optimize EscapeUnescapeIri #32025

Merged: 6 commits, merged on Feb 15, 2020. Changes shown from 2 commits.
`src/libraries/System.Private.Uri/src/System/IriHelper.cs`: 21 changes (8 additions, 13 deletions)

```diff
@@ -109,9 +109,6 @@ internal static unsafe string EscapeUnescapeIri(char* pInput, int start, int end
     ValueStringBuilder dest = new ValueStringBuilder(size);
     byte[]? bytes = null;
 
-    const int percentEncodingLen = 3; // Escaped UTF-8 will take 3 chars: %AB.
-    int bufferRemaining = 0;
-
     int next = start;
     char ch;
     bool escape = false;
```
```diff
@@ -263,18 +260,16 @@ internal static unsafe string EscapeUnescapeIri(char* pInput, int start, int end
     {
         const int MaxNumberOfBytesEncoded = 4;
 
-        byte[] encodedBytes = new byte[MaxNumberOfBytesEncoded];
-        fixed (byte* pEncodedBytes = &encodedBytes[0])
-        {
-            int encodedBytesCount = Encoding.UTF8.GetBytes(pInput + next, surrogatePair ? 2 : 1, pEncodedBytes, MaxNumberOfBytesEncoded);
-            Debug.Assert(encodedBytesCount <= MaxNumberOfBytesEncoded, "UTF8 encoder should not exceed specified byteCount");
+        Debug.Assert(sizeof(IntPtr) >= MaxNumberOfBytesEncoded);
+        IntPtr encodedBytesBuffer;
+        byte* pEncodedBytes = (byte*)&encodedBytesBuffer;
 
-            bufferRemaining -= encodedBytesCount * percentEncodingLen;
+        int encodedBytesCount = Encoding.UTF8.GetBytes(pInput + next, surrogatePair ? 2 : 1, pEncodedBytes, MaxNumberOfBytesEncoded);
+        Debug.Assert(encodedBytesCount <= MaxNumberOfBytesEncoded, "UTF8 encoder should not exceed specified byteCount");
 
-            for (int count = 0; count < encodedBytesCount; ++count)
-            {
-                UriHelper.EscapeAsciiChar((char)encodedBytes[count], ref dest);
-            }
+        for (int count = 0; count < encodedBytesCount; ++count)
+        {
+            UriHelper.EscapeAsciiChar((char)*(pEncodedBytes + count), ref dest);
```

MihaZupan marked the review conversation on the `IntPtr encodedBytesBuffer;` line as resolved.
Member

Can we use this as an opportunity to get rid of some unsafe code and just use spans? For example:

```csharp
Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];
// pInput is a char*, so the source span is ReadOnlySpan<char>.
int encodedBytesCount = Encoding.UTF8.GetBytes(new ReadOnlySpan<char>(pInput + next, surrogatePair ? 2 : 1), encodedBytes);
for (int count = 0; count < encodedBytesCount; count++)
{
    UriHelper.EscapeAsciiChar((char)encodedBytes[count], ref dest);
}
```
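A standalone sketch of the span-based suggestion, assuming safe code over a `string` rather than `char* pInput`. `UriHelper.EscapeAsciiChar` and `ValueStringBuilder` are internal to the framework, so a `StringBuilder` and the hypothetical helper `AppendPercentEncoded` stand in for them; the class and method names are illustrative only.

```csharp
using System;
using System.Text;

static class SpanEscapeSketch
{
    // Hypothetical stand-in for the internal UriHelper.EscapeAsciiChar:
    // appends one byte as "%XX" with uppercase hex digits.
    static void AppendPercentEncoded(StringBuilder dest, byte b)
    {
        const string Hex = "0123456789ABCDEF";
        dest.Append('%').Append(Hex[b >> 4]).Append(Hex[b & 0xF]);
    }

    // Percent-encodes the UTF-8 bytes of input[index], or of the surrogate pair
    // starting there, using a stack buffer instead of a heap array or pointers.
    public static void EscapeScalar(string input, int index, StringBuilder dest)
    {
        const int MaxNumberOfBytesEncoded = 4;
        bool surrogatePair = char.IsSurrogatePair(input, index);
        ReadOnlySpan<char> source = input.AsSpan(index, surrogatePair ? 2 : 1);

        Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];
        // Encoding.UTF8 replaces a lone surrogate with U+FFFD (EF BF BD).
        int encodedBytesCount = Encoding.UTF8.GetBytes(source, encodedBytes);

        for (int count = 0; count < encodedBytesCount; count++)
        {
            AppendPercentEncoded(dest, encodedBytes[count]);
        }
    }
}
```

For example, `"\u00E9"` (é) becomes `%C3%A9`, and the surrogate pair for U+1F600 becomes `%F0%9F%98%80`.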

Member Author

I see a ~2% perf hit on the benchmark by doing so.
If we're okay with that, I can make the change.

Contributor

2% might be measurement noise. I ran into the same issue last week, where performance deviated by ±2% between runs without any code changes.

Member

What is the microbenchmark?

Contributor

Try running the perf test multiple times and measuring the deviation. You can also set CPU affinity for the benchmark process; that can stabilize results a bit.
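A minimal sketch of that advice, assuming a plain `Stopwatch` loop rather than a full benchmarking harness (harnesses such as BenchmarkDotNet already report Mean, Error, and StdDev per benchmark). The class and method names are hypothetical. `Process.ProcessorAffinity` is a real property, but setting it is not supported on every platform.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class RunToRunNoise
{
    // Mean and sample standard deviation of a set of per-run timings.
    public static (double Mean, double StdDev) Summarize(IReadOnlyList<double> samples)
    {
        double mean = samples.Average();
        double sumSq = samples.Sum(s => (s - mean) * (s - mean));
        double stdDev = samples.Count > 1 ? Math.Sqrt(sumSq / (samples.Count - 1)) : 0;
        return (mean, stdDev);
    }

    // Runs `action` `iterations` times per run, for `runs` runs, and returns
    // the average time per call (in microseconds) for each run.
    public static List<double> Measure(Action action, int runs, int iterations)
    {
        var results = new List<double>(runs);
        for (int r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++) action();
            sw.Stop();
            results.Add(sw.Elapsed.TotalMilliseconds * 1000.0 / iterations);
        }
        return results;
    }

    // Pins the current process to the first logical CPU to reduce scheduling noise.
    // Throws PlatformNotSupportedException on platforms without affinity support.
    public static void PinToFirstCore()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            current.ProcessorAffinity = (IntPtr)1;
        }
    }
}
```

A difference between two variants is only meaningful when the gap between their means is large relative to each variant's run-to-run standard deviation.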

Member Author

I extracted just the needed files to https://github.com/MihaZupan/BenchmarkPR32025

Member

Oh, you're not actually measuring System.Private.Uri, but rather copying the source out into the benchmark? That's not going to be equivalent. For example, we explicitly clear the localsinit flag for all framework assemblies, but that won't happen for your code compiled into your benchmark, which means things like stackalloc are going to be more expensive.

Member

Encoding.UTF8.GetBytes is a bit heavyweight for this. I'd instead recommend a slight variation of what @scalablecory recommended:

```csharp
Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];

Rune rune = (surrogatePair) ? new Rune(pInput[next], pInput[next + 1]) : new Rune(pInput[next]);
int encodedBytesCount = rune.EncodeToUtf8(encodedBytes);
encodedBytes = encodedBytes.Slice(0, encodedBytesCount);

foreach (byte b in encodedBytes)
{
    UriHelper.EscapeAsciiChar((char)b, ref dest);
}
```
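The same idea as a standalone sketch, assuming a `string` input and a `StringBuilder` in place of the framework-internal `char*`, `ValueStringBuilder`, and `UriHelper.EscapeAsciiChar`; `RuneEscapeSketch` and `AppendPercentEncoded` are hypothetical names. One behavioral difference to keep in mind: `Rune`'s constructors throw `ArgumentOutOfRangeException` for a lone surrogate, whereas `Encoding.UTF8` substitutes U+FFFD, so this sketch assumes the caller has already validated the input.

```csharp
using System;
using System.Text;

static class RuneEscapeSketch
{
    // Hypothetical stand-in for the internal UriHelper.EscapeAsciiChar.
    static void AppendPercentEncoded(StringBuilder dest, byte b)
    {
        const string Hex = "0123456789ABCDEF";
        dest.Append('%').Append(Hex[b >> 4]).Append(Hex[b & 0xF]);
    }

    // Encodes one scalar value (one char or a surrogate pair) with Rune.EncodeToUtf8
    // into a 4-byte stack buffer, then percent-encodes each byte.
    public static void EscapeScalar(string input, int index, StringBuilder dest)
    {
        const int MaxNumberOfBytesEncoded = 4;
        Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];

        bool surrogatePair = char.IsSurrogatePair(input, index);
        // Throws ArgumentOutOfRangeException if input[index] is a lone surrogate.
        Rune rune = surrogatePair ? new Rune(input[index], input[index + 1]) : new Rune(input[index]);
        int encodedBytesCount = rune.EncodeToUtf8(encodedBytes);

        foreach (byte b in encodedBytes.Slice(0, encodedBytesCount))
        {
            AppendPercentEncoded(dest, b);
        }
    }
}
```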

Member Author

Removing localsinit reduces the gap substantially.
Using @GrabYourPitchforks's approach beats all of the above 👍

| Method    | Mean     | Error    | StdDev   |
|-----------|---------:|---------:|---------:|
| Unsafe    | 33.68 μs | 0.401 μs | 0.356 μs |
| Span      | 35.34 μs | 0.675 μs | 0.853 μs |
| SpanSlice | 35.23 μs | 1.043 μs | 0.871 μs |
| Rune      | 28.13 μs | 0.547 μs | 0.512 μs |

I'll make the change.
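A quick worked comparison of the Mean column against the Unsafe baseline, computed here from the table's numbers rather than reported by the benchmark:

$$
\begin{aligned}
\text{Span vs. Unsafe:}      &\quad \frac{35.34 - 33.68}{33.68} \approx +4.9\% \\
\text{SpanSlice vs. Unsafe:} &\quad \frac{35.23 - 33.68}{33.68} \approx +4.6\% \\
\text{Rune vs. Unsafe:}      &\quad \frac{28.13 - 33.68}{33.68} \approx -16.5\%
\end{aligned}
$$

In BenchmarkDotNet-style output, Error is half of the 99.9% confidence interval. The Span and SpanSlice means differ by only 0.11 μs, well inside either row's Error, so those two are effectively tied, while the Rune row is clearly separated from the others.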

Member Author

To satisfy my curiosity, I added a benchmark with `Rune.EncodeToUtf8` encoding to a `byte*`, avoiding spans. It performs ~8% better than the Rune benchmark above.
(I'm not saying that I prefer it over the Rune & Span-based one.)

```diff
         }
     }
 }
```