-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Improve Uri.UnescapeDataString codepath #16456
Conversation
Initial commit
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false); | ||
return new string(dest, 0, position); | ||
PooledCharArray pooledArray = new PooledCharArray(stringToUnescape.Length); | ||
UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, pooledArray, ref position, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks broken. PooledCharArray is a struct, so it'll be copied by value into UnescapeString. If UnescapeString then calls GrowAndCopy, it'll be replacing the array in the copy that was passed in, but not this original. Which means a) the resulting string that gets returned will be incorrect, and b) we'll be releasing the same array back into the ArrayPool twice.
This suggests to me we have a test hole in the tests that exercise this stuff. I suggest before pursuing this further, we ensure those holes are filled, at which point I expect this code will cause failures.
In any event, using a struct like this requires a lot of care, and typically the way it's handled is passing the struct around by ref rather than by value, though even then it requires a lot of scrutiny to ensure it's never being copied inappropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct. I didn't think of the fact that GrowAndCopy would be replacing the copy and not the original. Thanks for the heads up. Will update to do it as by ref.
Opened an issue for the test hole: #16513
This makes it very hard to see in the diff what's changed in your implementation. It also results in a lot of duplication. Is it possible to consolidate them, so that, for example, the char[] based one is implemented on top of the PooledCharArray one as a thin wrapper? |
Updated this to implement the char[] UnescapeString on top of the PooledArray one as a wrapper |
@@ -499,29 +499,46 @@ internal unsafe bool InternalIsWellFormedOriginalString() | |||
public static string UnescapeDataString(string stringToUnescape) | |||
{ | |||
if ((object)stringToUnescape == null) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ [](start = 12, length = 1)
fine either way to add braces but just FYI in the specific case of a single line block that is not nested in another block we allow omitting braces
https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/coding-style.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my slight preference is to omit them here as it's easier to read but either is fine
In reply to: 103753759 [](ancestors = 103753759)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was aware of single line blocks with no braces was allowed in our guidelines, I just added them because in other PRs I've done I have had the experience where the owners asked for braces on those blocks :( so I didn't know which one was the preferred style, but will get rid of them in this case :)
-- Thanks for the heads up by the way.
return new string(dest, 0, position); | ||
PooledCharArray pooledArray = new PooledCharArray(stringToUnescape.Length); | ||
UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, ref pooledArray, ref position, | ||
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could add overloads that don't take reserved chars so you don't have to pass in horrible dummy chars. But it was like that already so up to you
UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, ref pooledArray, ref position, | ||
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false); | ||
|
||
if(pooledArray.IsSameString(pStr, 0, position)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( [](start = 22, length = 1)
nit:spacing
} | ||
} | ||
|
||
public unsafe bool IsSameString(char* ptrStr, int start, int end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsSameString [](start = 27, length = 12)
why not make this private, and instead expose something like
public string GetStringAndRelease(string candidate)
it can get the length off the string you pass in, so no need for a pointer/start/end/length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, you can eliminate your Release() method.
it's better to force them to release at the same time they "get what they want" as that way it's hard to forget to Release()
In reply to: 103758643 [](ancestors = 103758643)
return true; | ||
} | ||
|
||
public void GrowAndCopy(int extraSpace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GrowAndCopy [](start = 17, length = 11)
We're trusting callers here to not hold on to the backing buffer through this call.
Might be better to call it something like "ReplaceArray" ? as a hint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about GrowAndReplaceArray? Is it too long? Just to let them know that it will create a new array the length of the old one + the extraSpace they need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dest = UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, dest, ref position, | ||
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false); | ||
return new string(dest, 0, position); | ||
PooledCharArray pooledArray = new PooledCharArray(stringToUnescape.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stringToUnescape.Length [](start = 70, length = 23)
don't we know almost for certain this is too small? since you found at least one %. Should it be a little bigger (2? 5? 10?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No actually it is in most of the cases getting smaller because when we are unescaping strings we are transforming from escaped characters to regular characters and escaped characters are usually a 2 char or 3 char length string. i.e %20 -> ' '. The case where it has to grow is when the escaped string has reserved characters which is the rare case.
return new string(dest, 0, position); | ||
PooledCharArray pooledArray = new PooledCharArray(stringToUnescape.Length); | ||
UriHelper.UnescapeString(stringToUnescape, 0, stringToUnescape.Length, ref pooledArray, ref position, | ||
c_DummyChar, c_DummyChar, c_DummyChar, unescapeMode, null, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false); [](start = 78, length = 8)
it doesn't seem consistent in corefx code but in my previous teams we would always annotate bools in the caller since they're so opaque eg
null, /* isQuery= */ false)
up to you
sometimes we did it for null also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or you can use named arguments instead of comments, isQuery: false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's right -- seems to be the pattern around here. Iff the parameters are at the end of the signature.
In reply to: 103774696 [](ancestors = 103774696)
{ | ||
pooledArray.Release(); | ||
return stringToUnescape; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as noted in PooledCharArray, I think we can cut this block and just make GetSTringAndRelease do al the work.
internal static unsafe char[] UnescapeString(char* pStr, int start, int end, char[] dest, ref int destPosition, | ||
char rsvd1, char rsvd2, char rsvd3, UnescapeMode unescapeMode, UriParser syntax, bool isQuery) | ||
{ | ||
PooledCharArray pooledArray = new PooledCharArray(dest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new PooledCharArray(dest); [](start = 41, length = 27)
this is a new allocation. but you don't need one, you have dest to write to. are you only doing this because UnescapeString expects a pooled array? If so we should maybe think about how to avoid that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well Stephen said we would have duplicate code so it would be better to make this method a wrapper around the UnescapeString
that needs the pooled array. Anyway PooledCharArray is a struct so it will not be allocated in the heap, and therefore we are avoiding char[]
in this codepath as we are using the PooledCharArray and just copying it over to the dest char[]
as a lot of APIs that fall in this codepath expect that behavior.
|
||
if (destPosition > dest.Length) | ||
{ | ||
dest = new char[destPosition]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new char[destPosition]; [](start = 23, length = 23)
why isn't this getting it from the pool? any use of "new char[]" should be suspect in these codepaths now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I debugged the tests we currently have and there wasn't a case where it hitted this specific line of code. Eventhough this is just to make sure when copying it over to dest the new string fits in there, but still it wouldn't affect the passed dest because it is copied by value, but therefore we would return the new expanded array. What would you suggest?
return ret; | ||
} | ||
|
||
public void CopyAndRelease(char[] dest, int offset, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offset [](start = 46, length = 6)
you always pass offset=zero so it might just make sense to have a single parameter (length)
you verified your test changes here cause a failure with your original iteration? Refers to: src/System.Private.Uri/tests/UnitTests/IriEscapeUnescapeTest.cs:327 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False) |
return true; | ||
} | ||
|
||
public void GrowAndCopy(int extraSpace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GrowAndCopy [](start = 17, length = 11)
Another approach might be to have a new constructor for PooledCharArray that took a PooledCharArray and extraSpace (or "size" to be consistent with the other ctor)
It would do all the same work.
you didn't wave your magic wand over this one, and it does allocate char[]. Is that becuase it's not on the hot path? (which is a fair reason) Refers to: src/System.Private.Uri/src/System/UriHelper.cs:128 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False) |
|
||
if (destPosition > dest.Length) | ||
{ | ||
dest = new char[destPosition]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
est = new char[destPosition]; [](start = 17, length = 29)
another allocation.
Block coverage for UriHelper, IriHelper, is ~90% and Uri is 75% which isn't bad but branch coverage is more like 50% Ideally you'd look at the branch coverage of the code around your changes by getting local code covearge data, then look for the little branch icons in the report showing where branches are missed https://ci.dot.net/job/dotnet_corefx/job/master/job/code_coverage_windows/Code_Coverage_Report/ That would probably be expensive though. In general the size/complexity/exposure of this change is a little worrying. Refers to: src/System.Private.Uri/tests/UnitTests/IriEscapeUnescapeTest.cs:10 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False) |
As a general rule if you improved test coverage, you want to do it WITHOUT your implementation changes. Make sure they all pass. Then put your implementation in. That helps detect regressions In reply to: 283458589 [](ancestors = 283458589) Refers to: src/System.Private.Uri/tests/UnitTests/IriEscapeUnescapeTest.cs:10 in 6fa9a62. [](commit_id = 6fa9a62, deletion_comment = False) |
@safern @stephentoub I suggest abandoning this PR as it's a fair way from here to mergeable (given complexity) and 2.0/UWP work needs to happen. We can keep the issue open and reuse the learnings from this PR if and when we take another look, or someone else does. Thanks for the work on it @safern . |
Sounds great! Thanks for all the advises and feedback from this PR it was a very good learning experience! @stephentoub @danmosemsft |
Work for #16040
This improvement is an initial proposal to avoid char[] allocations done in Uri.UnescapeDataString codepath specifically under UriHelper.UnescapeString. This as well reflects to be faster in a local mini-test I wrote source code here -- and also saw some improvements in inclusive and exclusive time in an asp.net test scenario which you can find here.
The main improvements is that now I'm not seeing any char[] allocations in the GC Heap in any of the scenarios coming from the Uri.UnescapeDataString codepath and a little improvement in execution time.
Here is the mini-test I wrote PerfView traces result:
CPU Stacks
In this test scenario we can see that there is a bigger difference inclusive and exclusive time spent in
Uri.UnescapeDataString
GC Heap
Char[]
Char[]
As we can see in this scenarios
Uri.UnescapeDataString
is allocating 0Char[]
and it reduced the allocations by 50% from 34,000 to 17,000.Here are the results for the asp.net test scenario:
CPU Stacks
As we can see after the changes
Uri.UnescapeDataString
went down on both exclusive (0.4%) and inclusive (0.1%) time.GC Heap
Char[]
Char[]
As we can see
Char[]
allocations went down 0.4% and in this particular scenario is around 7,500 less allocations when usingArrayPool
. Also we can see on the callers that are creating this objects that if we useArrayPool
there is 0Char[]
allocations inUri.UnescapeDataString
Now I'm keeping the old
UriHelper.UnescapeString
code and its dependencies around because in order to delete it there is a lot of work required because it is used a lot from internal APIs./cc @stephentoub @jkotas @danmosemsft