Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Improve Uri.UnescapeDataString codepath #16456

Closed
wants to merge 4 commits into from

Conversation

safern
Copy link
Member

@safern safern commented Feb 25, 2017

Work for #16040

This improvement is an initial proposal to avoid char[] allocations done in Uri.UnescapeDataString codepath specifically under UriHelper.UnescapeString. This as well reflects to be faster in a local mini-test I wrote source code here -- and also saw some improvements in inclusive and exclusive time in an asp.net test scenario which you can find here.

The main improvements is that now I'm not seeing any char[] allocations in the GC Heap in any of the scenarios coming from the Uri.UnescapeDataString codepath and a little improvement in execution time.

Here is the mini-test I wrote PerfView traces result:

CPU Stacks

Before My Changes

image

After My Changes

image

In this test scenario we can see that there is a bigger difference inclusive and exclusive time spent in Uri.UnescapeDataString

GC Heap

Before My Changes

image

  • Callers to Char[]

image

After My Changes

image

  • Callers to Char[]

image

As we can see in this scenarios Uri.UnescapeDataString is allocating 0 Char[] and it reduced the allocations by 50% from 34,000 to 17,000.

Here are the results for the asp.net test scenario:

CPU Stacks

Before My Changes

image

After My Changes

image

As we can see after the changes Uri.UnescapeDataString went down on both exclusive (0.4%) and inclusive (0.1%) time.

GC Heap

Before My Changes

image

  • Callers to Char[]

image

After My Changes

image

  • Callers to Char[]

image

As we can see Char[] allocations went down 0.4% and in this particular scenario is around 7,500 less allocations when using ArrayPool. Also we can see on the callers that are creating this objects that if we use ArrayPool there is 0 Char[] allocations in Uri.UnescapeDataString

Now I'm keeping the old UriHelper.UnescapeString code and its dependencies around because in order to delete it there is a lot of work required because it is used a lot from internal APIs.

/cc @stephentoub @jkotas @danmosemsft

safern added 2 commits February 24, 2017 15:30
@safern safern added * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) area-System.Net tenet-performance Performance related issue labels Feb 25, 2017
@safern safern added this to the Future milestone Feb 25, 2017
@safern safern self-assigned this Feb 25, 2017
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false);
return new string(dest, 0, position);
PooledCharArray pooledArray = new PooledCharArray(stringToUnescape.Length);
UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, pooledArray, ref position,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks broken. PooledCharArray is a struct, so it'll be copied by value into UnescapeString. If UnescapeString then calls GrowAndCopy, it'll be replacing the array in the copy that was passed in, but not this original. Which means a) the resulting string that gets returned will be incorrect, and b) we'll be releasing the same array back into the ArrayPool twice.

This suggests to me we have a test hole in the tests that exercise this stuff. I suggest before pursuing this further, we ensure those holes are filled, at which point I expect this code will cause failures.

In any event, using a struct like this requires a lot of care, and typically the way it's handled is passing the struct around by ref rather than by value, though even then it requires a lot of scrutiny to ensure it's never being copied inappropriately.

Copy link
Member Author

@safern safern Feb 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct. I didn't think of the fact that GrowAndCopy would be replacing the copy and not the original. Thanks for the heads up. Will update to do it as by ref.

Opened an issue for the test hole: #16513

@stephentoub
Copy link
Member

Now I'm keeping the old UriHelper.UnescapeString code and its dependencies around because in order to delete it there is a lot of work required because it is used a lot from internal APIs

This makes it very hard to see in the diff what's changed in your implementation. It also results in a lot of duplication. Is it possible to consolidate them, so that, for example, the char[] based one is implemented on top of the PooledCharArray one as a thin wrapper?

@safern
Copy link
Member Author

safern commented Mar 1, 2017

Updated this to implement the char[] UnescapeString on top of the PooledArray one as a wrapper

@@ -499,29 +499,46 @@ internal unsafe bool InternalIsWellFormedOriginalString()
public static string UnescapeDataString(string stringToUnescape)
{
if ((object)stringToUnescape == null)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 12, length = 1)

fine either way to add braces but just FYI in the specific case of a single line block that is not nested in another block we allow omitting braces

https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/coding-style.md

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my slight preference is to omit them here as it's easier to read but either is fine


In reply to: 103753759 [](ancestors = 103753759)

Copy link
Member Author

@safern safern Mar 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was aware of single line blocks with no braces was allowed in our guidelines, I just added them because in other PRs I've done I have had the experience where the owners asked for braces on those blocks :( so I didn't know which one was the preferred style, but will get rid of them in this case :)

-- Thanks for the heads up by the way.

return new string(dest, 0, position);
PooledCharArray pooledArray = new PooledCharArray(stringToUnescape.Length);
UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, ref pooledArray, ref position,
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could add overloads that don't take reserved chars so you don't have to pass in horrible dummy chars. But it was like that already so up to you

UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, ref pooledArray, ref position,
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false);

if(pooledArray.IsSameString(pStr, 0, position))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( [](start = 22, length = 1)

nit:spacing

}
}

public unsafe bool IsSameString(char* ptrStr, int start, int end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsSameString [](start = 27, length = 12)

why not make this private, and instead expose something like

public string GetStringAndRelease(string candidate)

it can get the length off the string you pass in, so no need for a pointer/start/end/length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, you can eliminate your Release() method.

it's better to force them to release at the same time they "get what they want" as that way it's hard to forget to Release()


In reply to: 103758643 [](ancestors = 103758643)

return true;
}

public void GrowAndCopy(int extraSpace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GrowAndCopy [](start = 17, length = 11)

We're trusting callers here to not hold on to the backing buffer through this call.
Might be better to call it something like "ReplaceArray" ? as a hint

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about GrowAndReplaceArray? Is it too long? Just to let them know that it will create a new array the length of the old one + the extraSpace they need.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.


In reply to: 103766045 [](ancestors = 103766045)

dest = UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, dest, ref position,
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false);
return new string(dest, 0, position);
PooledCharArray pooledArray = new PooledCharArray(stringToUnescape.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stringToUnescape.Length [](start = 70, length = 23)

don't we know almost for certain this is too small? since you found at least one %. Should it be a little bigger (2? 5? 10?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No actually it is in most of the cases getting smaller because when we are unescaping strings we are transforming from escaped characters to regular characters and escaped characters are usually a 2 char or 3 char length string. i.e %20 -> ' '. The case where it has to grow is when the escaped string has reserved characters which is the rare case.

return new string(dest, 0, position);
PooledCharArray pooledArray = new PooledCharArray(stringToUnescape.Length);
UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, ref pooledArray, ref position,
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

false); [](start = 78, length = 8)

it doesn't seem consistent in corefx code but in my previous teams we would always annotate bools in the caller since they're so opaque eg

null, /* isQuery= */ false)

up to you

sometimes we did it for null also

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you can use named arguments instead of comments, isQuery: false

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right -- seems to be the pattern around here. Iff the parameters are at the end of the signature.


In reply to: 103774696 [](ancestors = 103774696)

{
pooledArray.Release();
return stringToUnescape;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as noted in PooledCharArray, I think we can cut this block and just make GetSTringAndRelease do al the work.

internal static unsafe char[] UnescapeString(char* pStr, int start, int end, char[] dest, ref int destPosition,
char rsvd1, char rsvd2, char rsvd3, UnescapeMode unescapeMode, UriParser syntax, bool isQuery)
{
PooledCharArray pooledArray = new PooledCharArray(dest);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new PooledCharArray(dest); [](start = 41, length = 27)

this is a new allocation. but you don't need one, you have dest to write to. are you only doing this because UnescapeString expects a pooled array? If so we should maybe think about how to avoid that.

Copy link
Member Author

@safern safern Mar 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well Stephen said we would have duplicate code so it would be better to make this method a wrapper around the UnescapeString that needs the pooled array. Anyway PooledCharArray is a struct so it will not be allocated in the heap, and therefore we are avoiding char[] in this codepath as we are using the PooledCharArray and just copying it over to the dest char[] as a lot of APIs that fall in this codepath expect that behavior.


if (destPosition > dest.Length)
{
dest = new char[destPosition];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new char[destPosition]; [](start = 23, length = 23)

why isn't this getting it from the pool? any use of "new char[]" should be suspect in these codepaths now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I debugged the tests we currently have and there wasn't a case where it hitted this specific line of code. Eventhough this is just to make sure when copying it over to dest the new string fits in there, but still it wouldn't affect the passed dest because it is copied by value, but therefore we would return the new expanded array. What would you suggest?

return ret;
}

public void CopyAndRelease(char[] dest, int offset, int count)
Copy link
Member

@danmoseley danmoseley Mar 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offset [](start = 46, length = 6)

you always pass offset=zero so it might just make sense to have a single parameter (length)

@danmoseley
Copy link
Member

    private void MatchUTF8SequenceOverrunTest(byte[] inbytes, int numBytes, bool isQuery, bool iriParsing)

you verified your test changes here cause a failure with your original iteration?


Refers to: src/System.Private.Uri/tests/UnitTests/IriEscapeUnescapeTest.cs:327 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False)

return true;
}

public void GrowAndCopy(int extraSpace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GrowAndCopy [](start = 17, length = 11)

Another approach might be to have a new constructor for PooledCharArray that took a PooledCharArray and extraSpace (or "size" to be consistent with the other ctor)

It would do all the same work.

@danmoseley
Copy link
Member

    internal static unsafe char[] EscapeString(string input, int start, int end, char[] dest, ref int destPos,

you didn't wave your magic wand over this one, and it does allocate char[]. Is that becuase it's not on the hot path? (which is a fair reason)


Refers to: src/System.Private.Uri/src/System/UriHelper.cs:128 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False)


if (destPosition > dest.Length)
{
dest = new char[destPosition];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

est = new char[destPosition]; [](start = 17, length = 29)

another allocation.

@danmoseley
Copy link
Member

namespace System.Net.Test.Uri.IriTest

Block coverage for UriHelper, IriHelper, is ~90% and Uri is 75% which isn't bad but branch coverage is more like 50%

Ideally you'd look at the branch coverage of the code around your changes by getting local code covearge data, then look for the little branch icons in the report showing where branches are missed

https://ci.dot.net/job/dotnet_corefx/job/master/job/code_coverage_windows/Code_Coverage_Report/

That would probably be expensive though.

In general the size/complexity/exposure of this change is a little worrying.


Refers to: src/System.Private.Uri/tests/UnitTests/IriEscapeUnescapeTest.cs:10 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False)

@danmoseley
Copy link
Member

namespace System.Net.Test.Uri.IriTest

It might need scoping down a little. Interested in Stephen's tak.e


In reply to: 283458589 [](ancestors = 283458589)


Refers to: src/System.Private.Uri/tests/UnitTests/IriEscapeUnescapeTest.cs:10 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False)

@danmoseley
Copy link
Member

namespace System.Net.Test.Uri.IriTest

As a general rule if you improved test coverage, you want to do it WITHOUT your implementation changes. Make sure they all pass. Then put your implementation in. That helps detect regressions


In reply to: 283458589 [](ancestors = 283458589)


Refers to: src/System.Private.Uri/tests/UnitTests/IriEscapeUnescapeTest.cs:10 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False)

@danmoseley
Copy link
Member

@safern @stephentoub I suggest abandoning this PR as it's a fair way from here to mergeable (given complexity) and 2.0/UWP work needs to happen.

We can keep the issue open and reuse the learnings from this PR if and when we take another look, or someone else does. Thanks for the work on it @safern .

@danmoseley danmoseley closed this Mar 2, 2017
@safern
Copy link
Member Author

safern commented Mar 2, 2017

Sounds great! Thanks for all the advises and feedback from this PR it was a very good learning experience! @stephentoub @danmosemsft

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants