Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: {Last}IndexOf{AnyExcept}Range #76106

Closed
stephentoub opened this issue Sep 23, 2022 · 4 comments · Fixed by #76803
Closed

[API Proposal]: {Last}IndexOf{AnyExcept}Range #76106

stephentoub opened this issue Sep 23, 2022 · 4 comments · Fixed by #76803
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Sep 23, 2022

Background and motivation

In looking over various code bases, it's relatively common to want to search for things in or out of a particular range.

For example, check whether the input contains anything outside of the range [0x20, 0x7e]:

for (int i = 0; i < token.Length; ++i)
{
if ((token[i] < 0x20) || (token[i] > 0x7e))
{
return true;
}
}
return false;

Or find the next index that's not an ASCII digit:

for (; i < end; i++)
{
if (!char.IsAsciiDigit(name[i]))
{
return false;
}
}

Or find the index of the first ASCII digit:

// Move to the beginning of the number
for (; (uint)pos < (uint)text.Length; pos++)
{
if (char.IsAsciiDigit(text[pos]))
{
break;
}
}

Or determine whether the input contains anything outside of the range of a byte:

for (int i = 0; i < input.Length; i++)
{
if (input[i] > (char)255)
{
return input; // This couldn't have come from the wire, someone assigned it directly.
}
else if (input[i] > (char)127)
{
possibleUtf8 = true;
break;
}
}

Or find the index of the first upper-case ASCII letter:

int i = 0;
while (i < s.Length)
{
if (char.IsAsciiLetterUpper(pSource[i]))
{
break;
}
i++;
}

Or find the first index of a value that's at least some integer:

for (solarMonth = 1; solarMonth < 12; solarMonth++)
{
if (days[solarMonth] >= solarDay)
{
break;
}
}

Etc. Today we don't provide an IndexOfXx method that makes it easy to search for such ranges. And even if/when we provide a more general and vectorized IndexOfAny that supports more than 5 characters (#68328), it's likely that such a method will be both harder to use (e.g. an initialization method and the actual method) and less efficient (because it's more complicated) than if we just have an IndexOfRange that lets you search for a simple range of values.

API Proposal

namespace System

public static class MemoryExtensions
{
+    public static int IndexOfRange<T>(this ReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
+    public static int IndexOfRange<T>(this Span<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;

+    public static int IndexOfAnyExceptRange<T>(this ReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
+    public static int IndexOfAnyExceptRange<T>(this Span<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;

+    public static int LastIndexOfRange<T>(this ReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
+    public static int LastIndexOfRange<T>(this Span<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;

+    public static int LastIndexOfAnyExceptRange<T>(this ReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
+    public static int LastIndexOfAnyExceptRange<T>(this Span<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
}
  • There are 8 methods/overloads here for: Span vs ReadOnlySpan, inclusive vs exclusive, first vs last.
  • We can vectorize these, but only for built-in types where we know the semantics of the vector comparisons matches the semantics of the INumber<T> implementations. That's the same set of types as what we already vectorize in other operations, but as we're able to open up that set further (e.g. [API Proposal]: IBitwiseEquatable<T> #75642), it likely won't extend to these.
  • I've marked the constraint as being non-nullable; I don't know what it would mean to have a null low/high value. But if we think there's a good reason to allow it, we could make the constraints be ?. My intent is the implementation will validate that the range values are non-null.
  • With char.IsBetween, we decided to consider it undefined behavior if high < low, and we don't validate high >= low. My intent here is to do the same.

API Usage

internal static bool IsOnlyDigits(string str, int offset, int count) =>
    str.AsSpan(offset, count).IndexOfAnyExceptRange('0', '9') < 0;

Alternative Designs

No response

Risks

No response

@stephentoub stephentoub added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory labels Sep 23, 2022
@stephentoub stephentoub added this to the 8.0.0 milestone Sep 23, 2022
@stephentoub stephentoub self-assigned this Sep 23, 2022
@ghost
Copy link

ghost commented Sep 23, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

In looking over various code bases, it's relatively common to want to search for things in or out of a particular range.

For example, check whether the input contains anything outside of the range [0x20, 0x7e]:

for (int i = 0; i < token.Length; ++i)
{
if ((token[i] < 0x20) || (token[i] > 0x7e))
{
return true;
}
}
return false;

Or find the next index that's not an ASCII digit:

for (; i < end; i++)
{
if (!char.IsAsciiDigit(name[i]))
{
return false;
}
}

Or find the index of the first ASCII digit:

// Move to the beginning of the number
for (; (uint)pos < (uint)text.Length; pos++)
{
if (char.IsAsciiDigit(text[pos]))
{
break;
}
}

Or determine whether the input contains anything outside of the range of a byte:

for (int i = 0; i < input.Length; i++)
{
if (input[i] > (char)255)
{
return input; // This couldn't have come from the wire, someone assigned it directly.
}
else if (input[i] > (char)127)
{
possibleUtf8 = true;
break;
}
}

Or find the index of the first upper-case ASCII letter:

int i = 0;
while (i < s.Length)
{
if (char.IsAsciiLetterUpper(pSource[i]))
{
break;
}
i++;
}

Or find the first index of a value that's at least some integer:

for (solarMonth = 1; solarMonth < 12; solarMonth++)
{
if (days[solarMonth] >= solarDay)
{
break;
}
}

Etc. Today we don't provide an IndexOfXx method that makes it easy to search for such ranges. And even if/when we provide a more general and vectorized IndexOfAny that supports more than 5 characters (#68328), it's likely that such a method will be both harder to use (e.g. an initialization method and the actual method) and less efficient (because it's more complicated) than if we just have an IndexOfRange that lets you search for a simple range of values.

API Proposal

namespace System

public static class MemoryExtensions
{
        public static int IndexOfRange<T>(thisReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : INumber<T>;
        public static int IndexOfRange<T>(thisSpan<T> span, T lowInclusive, T highInclusive) where T : INumber<T>;

        public static int IndexOfAnyExceptRange<T>(thisReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : INumber<T>;
        public static int IndexOfAnyExceptRange<T>(thisSpan<T> span, T lowInclusive, T highInclusive) where T : INumber<T>;

        public static int LastIndexOfRange<T>(thisReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : INumber<T>;
        public static int LastIndexOfRange<T>(thisSpan<T> span, T lowInclusive, T highInclusive) where T : INumber<T>;

        public static int LastIndexOfAnyExceptRange<T>(thisReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : INumber<T>;
        public static int LastIndexOfAnyExceptRange<T>(thisSpan<T> span, T lowInclusive, T highInclusive) where T : INumber<T>;
}
  • There are 8 methods/overloads here for: Span vs ReadOnlySpan, inclusive vs exclusive, first vs last.

API Usage

internal static bool IsOnlyDigits(string str, int offset, int count) =>
    str.AsSpan(offset, count).IndexOfAnyExceptRange('0', '9') < 0;

Alternative Designs

No response

Risks

No response

Author: stephentoub
Assignees: stephentoub
Labels:

api-suggestion, area-System.Memory

Milestone: 8.0.0

@danmoseley
Copy link
Member

danmoseley commented Sep 24, 2022

Based on past experience of vectorizing similar methods, there'll potentially be a penalty for the shortest inputs. Is it possible that some of the above examples would be faster remaining in their current form? (Eg the IP address example)

@stephentoub
Copy link
Member Author

stephentoub commented Sep 24, 2022

Is it possible that some of the above examples would be faster remaining in their current form? (Eg the IP address example)

Possible, sure. There may be an extra few nanoseconds if the input is really small, basically the overhead of a method call and a length check.

@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Sep 27, 2022
@terrajobst
Copy link
Member

terrajobst commented Oct 4, 2022

Video

  • We should rename the methods to say InRange and include Any for consistency.
namespace System;

public partial class MemoryExtensions
{
    public static int IndexOfAnyInRange<T>(this ReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
    public static int IndexOfAnyInRange<T>(this Span<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;

    public static int IndexOfAnyExceptInRange<T>(this ReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
    public static int IndexOfAnyExceptInRange<T>(this Span<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;

    public static int LastIndexOfAnyInRange<T>(this ReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
    public static int LastIndexOfAnyInRange<T>(this Span<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;

    public static int LastIndexOfAnyExceptInRange<T>(this ReadOnlySpan<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
    public static int LastIndexOfAnyExceptInRange<T>(this Span<T> span, T lowInclusive, T highInclusive) where T : IComparable<T>;
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 4, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 10, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 26, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants