-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API IndexNotOf #28795
Comments
As an alternative name - |
Or |
Are these more commonly needed on Span than on String? Since we've got by without them on String since day 1 (which is not an argument against, but it's a datapoint) |
var firstNonSpace = text.AsSpan().IndexNotOf(' ');
if (firstNonSpace > 0)
{
return text.Substring(firstNonSpace, text.Length - firstNonSpace);
}
return text; 😉 string would also presumably require culture and ignorecase variants? Thought I'd start simple :) |
What about Find which takes a predicate? That should cover this scenario? |
Can't vectorize that for primitive types |
I feel like we should instead try to look into making some of the delegate scenarios vectorizable but I don't feel super strongly on not having this API (just feel like we will start adding up other APIs like: IndexOfValueInRange IndexOfValueNotInRange soon and that can quickly pollute the framework) |
If we add this, we can take advantage of it from our Regex implementation. It now uses IndexOf and could easily use IndexNotOf should it be available. For example, building on @benaadams' example from earlier in the issue, if you had a regex "eat *spaces" that would find "eat" followed by any number of spaces followed "spaces", that would end up being represented in our IR as a multi ("eat") followed by a one loop (" *") followed by another multi ("spaces"), and we'd use the IndexNotOf to implement the one loop. |
For string would could re-implement the following apis with these span versions partial class String
{
string Trim();
string Trim(char trimChar);
string Trim(params char[] trimChars);
string TrimEnd();
string TrimEnd(char trimChar);
string TrimEnd(params char[] trimChars);
string TrimStart();
string TrimStart(char trimChar);
string TrimStart(params char[] trimChars);
} |
We could, but I'm skeptical we'd want to... in my experience the number of spaces being trimmed is usually few to none, which could make the overhead here be very pronounced. Also, the trim methods that don't take characters use IsWhitespace and would need to compare against every Unicode space character. |
Would we want all the variety of overloads we have for IndexOf, etc? That would give us public static int IndexNotOf<T>(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>;
public static int IndexNotOf<T>(this Span<T> span, T value) where T : IEquatable<T>;
public static int IndexNotOf<T>(this Span<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>;
public static int IndexNotOf<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>;
public static int IndexNotOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static int IndexNotOfAny<T>(this Span<T> span, T value0, T value1) where T : IEquatable<T>;
public static int IndexNotOfAny<T>(this Span<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
public static int IndexNotOfAny<T>(this ReadOnlySpan<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
public static int IndexNotOfAny<T>(this ReadOnlySpan<T> span, T value0, T value1) where T : IEquatable<T>;
public static int IndexNotOfAny<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
public static int IndexNotOfAny<T>(this Span<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
public static int LastIndexNotOf<T>(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>;
public static int LastIndexNotOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static int LastIndexNotOf<T>(this Span<T> span, T value) where T : IEquatable<T>;
public static int LastIndexNotOf<T>(this Span<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>;
public static int LastIndexNotOf<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>;
public static int LastIndexNotOfAny<T>(this Span<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
public static int LastIndexNotOfAny<T>(this Span<T> span, T value0, T value1) where T : IEquatable<T>;
public static int LastIndexNotOfAny<T>(this Span<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
public static int LastIndexNotOfAny<T>(this ReadOnlySpan<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
public static int LastIndexNotOfAny<T>(this ReadOnlySpan<T> span, T value0, T value1) where T : IEquatable<T>;
public static int LastIndexNotOfAny<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>; |
I'm implementing JsonPatch using System.Text.Json (the existing ASP.Net Core JsonPatch support is based on Newtonsoft) and this would be incredibly useful to parse array indexes as part of implementing JSON Pointer. The JSON Pointer RFC defines a valid array index as:
So ideally, I'd love to use this as my validation for that: var isValidArrayIndex =
span.SequenceEqual(Zero) ||
(span.IndexOfAny(OneThroughNine) == 0 &&
span.IndexNotOfAny(Digits) == -1); |
Thanks for the info. Perhaps someone wants to share thoughts about whether it should be the full set of overloads or not. Then perhaps this is ready for API review. |
Don't think the
|
Are there good scenarios for In the top proposal ther'es a mixture of |
Don't have any strong feelings for |
|
Another example where it could be used runtime/src/libraries/System.Private.CoreLib/src/System/IO/PathInternal.Windows.cs Lines 406 to 411 in effaf28
return path.IndexNotOf(' ') < 0 |
@benaadams for single char/element scenario perhaps might be better to try to optimize Linq to do that fast (perhaps detect patterns similar to |
It's a not pattern so would be path.All(c => c == ' ') However having Linq autovectorize from a public static bool All<char>(this IEnumerable<char> source, Func<char, bool> predicate) Could change the predicate to be
Some kind of Linq-style autovectorization for Span would be interesting via C# source generators where it converts the expressions to platform intrinsics e.g. public static bool All<TSource>(this ReadOnlySpan<TSource> source, Expression<Func<TSource, bool>> predicate)
where TSource : unmanaged but it's probably also a multi year research project? |
Yeah, I was confused because you originally had ROS for LastIndexOfAny, and not IndexOfAny. 😄 It looks good to me -- @GrabYourPitchforks ? |
Ben's original proposal (plus their Last* counterparts) looks ready to go forward to review. We shouldn't provide Steve already provided reasons for not rewriting |
Added the Last ones back |
@stephentoub when you mentioned taking advantage in Regex, were you hoping they'd be faster than the naive loop it currently does? Unfortunately the number of characters that the regex might scan could be anything from one to megabytes. For regex, a vectorized approach might be a better tradeoff across all patterns/texts. |
FWIW the implementation nit I mentioned above doesn't have to be resolved before API review. We can always create the appropriate implementation as we become aware of various scenarios. The only real requirement from an SDL perspective is that given this: int idx = source.IndexNotOfAny(values); The runtime complexity of this method must be bound by |
There are two benefits for regex:
Just the first is a reasonable place to start. If that can be done, it's worth using in regex. Then if the implementation can be made even faster, yay, regex just gets better implicitly. I do agree with Levi, though; I think it's likely that even in the regex scenarios we'd find in common cases the loops to have very short iteration counts. But it'd be worth looking at what the opportunities are for using the method and then looking at common patterns and inputs to see what we think the win would actually be. |
Implementation details... ;) Sounds like is ready to mark for review? |
Marked. :) |
namespace System
{
public static partial class MemoryExtensions
{
int IndexNotOf(this Span<T> span, T value) where T : IEquatable<T>;
int IndexNotOfAny(this Span<T> span, T value0, T value1) where T : IEquatable<T>;
int IndexNotOfAny(this Span<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
int IndexNotOfAny(this Span<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
int IndexNotOf(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>;
int IndexNotOfAny(this ReadOnlySpan<T> span, T value0, T value1) where T : IEquatable<T>;
int IndexNotOfAny(this ReadOnlySpan<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
int IndexNotOfAny(this ReadOnlySpan<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
int LastIndexNotOf(this Span<T> span, T value) where T : IEquatable<T>;
int LastIndexNotOfAny(this Span<T> span, T value0, T value1) where T : IEquatable<T>;
int LastIndexNotOfAny(this Span<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
int LastIndexNotOfAny(this Span<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
int LastIndexNotOf(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>;
int LastIndexNotOfAny(this ReadOnlySpan<T> span, T value0, T value1) where T : IEquatable<T>;
int LastIndexNotOfAny(this ReadOnlySpan<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
int LastIndexNotOfAny(this ReadOnlySpan<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
}
} |
I think we should reconsider this one. There are a variety of places it's useful, e.g.
|
could be just |
I just had another case, where I wanted to determine whether some memory was all zero'd. We don't have a great helper for that built-in today, but |
In the original proposal, there was 16 methods being proposed. @stephentoub from what you have wrote it seems that a single one would be enough for our current needs. Is that correct? If so, would you like to update the proposal and present it in the API review? |
The methods are:
We have a general policy with MemoryExtensions that we add both the Span and ReadOnlySpan variants because extensions on ReadOnlySpan don't show up for Span implicitly. The cited use cases cover both IndexOf and LastIndexOf; it's just a matter of which way you need to search in the input. So at a minimum, we'd need 4 of the 16. But without the "any" overloads, you don't have any support for more than one value, and those scenarios certainly exist. For example, there are many regex patterns that begin with something like [^ab] to start matching something that begins with anything other than a or b, and for that we'd use an IndexNotOf that took more than one value. So I think we'd actually want at least the 8 of the 16. The other 8 (all of the 2/3 arg overloads).are an optimization we could do without if we were concerned about number of overloads. Leaving them out would however also be inconsistent with IndexOf, which has those overloads, and would have at least a little more overhead... most likely we'd still end up having them in the implementation, just as internal, with the any overloads special-casing those lengths and delegating. |
namespace System;
public static partial class MemoryExtensions
{
int IndexOfAnyExcept(this Span<T> span, T value) where T : IEquatable<T>;
int IndexOfAnyExcept(this Span<T> span, T value0, T value1) where T : IEquatable<T>;
int IndexOfAnyExcept(this Span<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
int IndexOfAnyExcept(this Span<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
int IndexOfAnyExcept(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>;
int IndexOfAnyExcept(this ReadOnlySpan<T> span, T value0, T value1) where T : IEquatable<T>;
int IndexOfAnyExcept(this ReadOnlySpan<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
int IndexOfAnyExcept(this ReadOnlySpan<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
int LastIndexOfAnyExcept(this Span<T> span, T value) where T : IEquatable<T>;
int LastIndexOfAnyExcept(this Span<T> span, T value0, T value1) where T : IEquatable<T>;
int LastIndexOfAnyExcept(this Span<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
int LastIndexOfAnyExcept(this Span<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
int LastIndexOfAnyExcept(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>;
int LastIndexOfAnyExcept(this ReadOnlySpan<T> span, T value0, T value1) where T : IEquatable<T>;
int LastIndexOfAnyExcept(this ReadOnlySpan<T> span, T value0, T value1, T value2) where T : IEquatable<T>;
int LastIndexOfAnyExcept(this ReadOnlySpan<T> span, ReadOnlySpan<T> values) where T : IEquatable<T>;
} |
Example usage:
The text was updated successfully, but these errors were encountered: