Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use IndexOfAnyValues in CoreLib #78678

Merged
merged 2 commits into from
Nov 22, 2022

Conversation

MihaZupan
Copy link
Member

Contributes to #78204

Avoids the cost of initializing the ProbabilisticMap on every matched value for ReplaceLineEndings.

Method Toolchain Input Mean Error Ratio
ReplaceLineEndings main \n\n(...)\n\n [1000] 26,379.96 ns 142.148 ns 1.00
ReplaceLineEndings pr \n\n(...)\n\n [1000] 13,946.75 ns 49.276 ns 0.53
ReplaceLineEndings main aaaaa(...)aaaaa [32] 46.70 ns 0.179 ns 1.00
ReplaceLineEndings pr aaaaa(...)aaaaa [32] 33.29 ns 0.356 ns 0.71

In this case we're using "\r\n\f\u0085\u2028\u2029" as the needle.
We also have a bunch of uses of char.IsWhiteSpace where the needle likewise contains both ASCII and non-ASCII chars.

As ASCII values are more common in these cases, we could consider adding IndexOfAnyValues implementations with loops like this in the future:

while (remaining > vector256)
{
    var vector = Vector256.LoadUnsafe(...);
    if (AllCharsAreAscii(vector))
    {
        IndexOfAnyAsciiSearcher.Lookup(vector, _bitmap);
    }
    else
    {
        // Fallback to ProbabilisticMap or even IndexOfAny(char, char, char)
    }
}

@MihaZupan MihaZupan added this to the 8.0.0 milestone Nov 22, 2022
@ghost ghost assigned MihaZupan Nov 22, 2022
@ghost
Copy link

ghost commented Nov 22, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to #78204

Avoids the cost of initializing the ProbabilisticMap on every matched value for ReplaceLineEndings.

Method Toolchain Input Mean Error Ratio
ReplaceLineEndings main \n\n(...)\n\n [1000] 26,379.96 ns 142.148 ns 1.00
ReplaceLineEndings pr \n\n(...)\n\n [1000] 13,946.75 ns 49.276 ns 0.53
ReplaceLineEndings main aaaaa(...)aaaaa [32] 46.70 ns 0.179 ns 1.00
ReplaceLineEndings pr aaaaa(...)aaaaa [32] 33.29 ns 0.356 ns 0.71

In this case we're using "\r\n\f\u0085\u2028\u2029" as the needle.
We also have a bunch of uses of char.IsWhiteSpace where the needle likewise contains both ASCII and non-ASCII chars.

As ASCII values are more common in these cases, we could consider adding IndexOfAnyValues implementations with loops like this in the future:

while (remaining > vector256)
{
    var vector = Vector256.LoadUnsafe(...);
    if (AllCharsAreAscii(vector))
    {
        IndexOfAnyAsciiSearcher.Lookup(vector, _bitmap);
    }
    else
    {
        // Fallback to ProbabilisticMap or even IndexOfAny(char, char, char)
    }
}
Author: MihaZupan
Assignees: -
Labels:

area-System.Runtime

Milestone: 8.0.0

@MihaZupan
Copy link
Member Author

The checked arm64 failures are #78718

@MihaZupan MihaZupan merged commit 47158cc into dotnet:main Nov 22, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants