Add span-based overloads of String.GetHashCode #26924
Comments
Briefly, what are the reasons for this? I was working my way towards a proposal on this topic and that was the obvious path. Will the hash codes for strings and equivalent spans be required to be the same? Having read through various discussions about the Marvin string hashing implementations in coreclr and corefx, I seem to remember that the length and zero termination could be involved in the hash. You also mention specialized data structures in ASP.NET; do you have a link to those? When considering this, the main problem I found was that there is no resilience to hash collisions if you use the hash as the key. That's not likely to be a problem for small lookups, but in a larger case it could cause misses. |
Does this mean it would make sense to open a separate API proposal to make the methods non-virtual? This would make the API clearer, consistent with this new method and slightly more efficient. Or would that still be considered a breaking change? |
@rynowak Can you comment on the specific use cases ASP.NET has? You pointed me earlier to this as an experimental sample, but I don't want to put words in your mouth regarding your particular motivations or use cases. @Wraith2 The equality operator on Given @svick Per the comment at https://github.com/dotnet/corefx/issues/28001#issuecomment-405683985, the current plan of record is simply to make newly-added APIs on the |
Sure, our primary use case for this right now is inside routing. We are executing a state machine and need to make some kind of branching decision based on a section of the URL path. We'd really prefer to do this without the need to substring since most of the time we won't use that content as a string anyway. So we're likely to end up owning our own hash table implementation that stores a set of strings as keys, but can use a The hash code part of this is something we don't want to duplicate from CoreCLR because it's pretty general-purpose. |
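A minimal sketch of the kind of lookup described above, assuming the span-based `string.GetHashCode(ReadOnlySpan<char>)` overload is available (it ships in .NET Core 3.0 and later). `SegmentTable` is a hypothetical name used only for illustration and is not an ASP.NET type. Each bucket keeps the original strings, so a hash collision is resolved by a real comparison instead of being trusted blindly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not the ASP.NET routing implementation.
public sealed class SegmentTable
{
    // Buckets keyed by the ordinal hash code of the key's characters.
    private readonly Dictionary<int, List<string>> _buckets = new Dictionary<int, List<string>>();

    public void Add(string key)
    {
        // Hash through the span overload so Add and Contains always agree.
        int hash = string.GetHashCode(key.AsSpan());
        if (!_buckets.TryGetValue(hash, out var bucket))
        {
            bucket = new List<string>();
            _buckets[hash] = bucket;
        }
        bucket.Add(key);
    }

    public bool Contains(ReadOnlySpan<char> segment)
    {
        int hash = string.GetHashCode(segment);
        if (!_buckets.TryGetValue(hash, out var bucket))
        {
            return false;
        }

        // Confirm with an ordinal comparison to guard against collisions.
        foreach (string candidate in bucket)
        {
            if (segment.SequenceEqual(candidate.AsSpan()))
            {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, a path segment can be probed without a substring, for example `table.Contains("/products/42".AsSpan(1, 8))` after `table.Add("products")`.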
Ok, I agree with all of that reasoning. The specific case I was considering was |
Putting the APIs on |
Another helper method that would help MSBuild and also help StringSegment avoid allocations in its HashCode would be:

public class String {
    // new static APIs on existing string type
    public static int GetHashCode(string aString, int offset, int length, StringComparison comparisonType);
}
public class StringComparer {
    public bool Equals(string a, int aOffset, int aLength, string b, int bOffset, int bLength);
    public int GetHashCode(string aString, int offset, int length);
}

MSBuild has such an attempt, though it's hardcoded to OrdinalIgnoreCase: https://github.com/Microsoft/msbuild/blob/master/src/Shared/MSBuildNameIgnoreCaseComparer.cs#L119 |
You can use the proposed span-based overloads instead (and call AsSpan on the string). Edit: As we discussed offline, there might be a perf penalty if the user has to call AsSpan every time the GetHashCode API gets called. We should measure whether the string null check has an impact. |
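As a rough sketch of that suggestion, the offset/length helpers requested above can be layered on the span overloads. This assumes `string.GetHashCode(ReadOnlySpan<char>, StringComparison)` and `MemoryExtensions.Equals(ReadOnlySpan<char>, ReadOnlySpan<char>, StringComparison)` as they exist in .NET Core 3.0 and later. `HashSlice` and `SliceEquals` are hypothetical names, not proposed APIs.

```csharp
using System;

// Hypothetical helpers for illustration; the names are not part of any proposal.
public static class SubstringHelpers
{
    // Hashes s[offset .. offset + length) without allocating a substring.
    public static int HashSlice(string s, int offset, int length, StringComparison comparisonType)
        => string.GetHashCode(s.AsSpan(offset, length), comparisonType);

    // Compares two string slices without allocating substrings.
    public static bool SliceEquals(
        string a, int aOffset, int aLength,
        string b, int bOffset, int bLength,
        StringComparison comparisonType)
        => a.AsSpan(aOffset, aLength).Equals(b.AsSpan(bOffset, bLength), comparisonType);
}
```

For instance, `HashSlice("Name=Value", 0, 4, StringComparison.OrdinalIgnoreCase)` and `HashSlice("xxNAME", 2, 4, StringComparison.OrdinalIgnoreCase)` hash the slices "Name" and "NAME", which compare equal under OrdinalIgnoreCase. Whether the extra `AsSpan` call costs anything measurable is the open question raised in the edit above.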
@ahsonkhan If the JIT can elide the null check (perhaps because the string instance has already been dereferenced within the same frame), there won't be a perf penalty. For example, I've used the following pattern to elide these checks.

int dummy = theString.Length;
ReadOnlySpan<char> theSpan = theString.AsSpan();
// consume theSpan

The JIT is smart enough to know that control can't proceed past the

; assume rax references the string instance
mov r10d, dword ptr [rax + <const_offset_to_length>] ; access Length property, raises exception if null
lea r9d, [rax + <const_offset_to_data>] ; this is the 'ref readonly char' field backing the span
mov r10d, dword ptr [rax + <const_offset_to_length>] ; this is the length field backing the span (redundant, but whatever)

It ends up being fairly efficient in the end. |
FYI this work is already done and sitting in the feature/utf8string branch. It'll get merged in at some point in the future. |
@GrabYourPitchforks, could you create a PR for moving just these APIs over to master? They seem very separable from the Utf8String work, and if they're already implemented, let's move 'em along and close out this issue :) |
@stephentoub Sure, let me ping some people in-person and see what the best path forward would be. The implementation currently depends on some other refactorings to the |
Thanks. |
dotnet/coreclr#20275 is the first PR under the "depends on some other refactorings" umbrella. The helper methods introduced there and in one more subsequent PR will be used by the |
ASP.NET has specialized data structures that use span equivalents as keys in dictionaries. As such, they need an equivalent of String.GetHashCode which takes ReadOnlySpan<char> instead of string. I propose the following APIs to enable this scenario.

The new static APIs on string mimic the behaviors of the instance methods String.GetHashCode() and String.GetHashCode(StringComparison), respectively. Per previous email discussions we do not want to specialize ReadOnlySpan<T>.GetHashCode when T = char, and we cannot add this as an extension method, so the next best logical place would be to put this on the string type directly.

The new instance API on CompareInfo mimics the behavior of the existing instance method CompareInfo.GetHashCode(String, CompareOptions). Even though all the other methods on CompareInfo are virtual, per last week's API review we determined that there's no legal way for anybody to subclass this type, so new APIs on this type needn't be virtual.

/cc @rynowak
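
The API listing itself did not survive in this copy of the issue. As a hedged usage sketch, the following assumes the overloads in the form they later shipped in .NET Core 3.0: `string.GetHashCode(ReadOnlySpan<char>)`, `string.GetHashCode(ReadOnlySpan<char>, StringComparison)`, and `CompareInfo.GetHashCode(ReadOnlySpan<char>, CompareOptions)`. `SpanHashDemo` and the sample URL are made up for illustration.

```csharp
using System;
using System.Globalization;

// Illustration only; the class name and sample data are hypothetical.
public static class SpanHashDemo
{
    public static void Main()
    {
        string url = "/Home/Index";
        ReadOnlySpan<char> segment = url.AsSpan(1, 4); // "Home", no substring allocated

        // Static overloads on string, mirroring GetHashCode() and GetHashCode(StringComparison).
        int ordinal = string.GetHashCode(segment);
        int ignoreCase = string.GetHashCode(segment, StringComparison.OrdinalIgnoreCase);

        // Instance overload on CompareInfo, mirroring GetHashCode(string, CompareOptions).
        CompareInfo invariant = CultureInfo.InvariantCulture.CompareInfo;
        int linguistic = invariant.GetHashCode(segment, CompareOptions.IgnoreCase);

        // Spans that are equal under the chosen comparison hash to the same value.
        int upper = string.GetHashCode("HOME".AsSpan(), StringComparison.OrdinalIgnoreCase);
        Console.WriteLine(ignoreCase == upper);
        Console.WriteLine($"{ordinal} {linguistic}"); // values vary from process to process
    }
}
```

Hash codes are randomized per process, so the values are only meaningful for in-memory lookups and should not be persisted.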