Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: MemoryExtensions.Count / number of occurrences of a value in a span #59466

Closed
Tracked by #79053
bollhals opened this issue Sep 22, 2021 · 25 comments · Fixed by #80662
Closed
Tracked by #79053

[API Proposal]: MemoryExtensions.Count / number of occurrences of a value in a span #59466

bollhals opened this issue Sep 22, 2021 · 25 comments · Fixed by #80662
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Milestone

Comments

@bollhals
Copy link
Contributor

bollhals commented Sep 22, 2021

Background and motivation

For a performance sensitive code I had to compute a string, for which I had to replace certain characters in a input string. So in order to use string.Create I had to know the final size upfront, for this I had to count the number of a specific char I wanted to replace (e.g replacing '[' with '<' & ' ' (2 chars), so the final size was old size + number of chars to replace found)

To my surprise there is currently no string.Count method that is vectorized or I've missed it...
A simple for loop would've worked as well to count, but as seen here, it is slower than a vectorized count. (The link also shows an example implementation of a vectorized count and its improvements

API Proposal

namespace System
{
    public sealed partial class MemoryExtensions
    {
        public static int Count<T>(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>?;

        public static int Count<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>?;

        public static int Count<T>(this Span<T> span, T value) where T : IEquatable<T>?;

        public static int Count<T>(this Span<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>?;
    }
}

API Usage

// Instead of this
int count = 0;
int tmp = 0;
while ((tmp = charSpan.IndexOf('[', tmp)) >= 0)
{
    count++;
    tmp++;
}

// Or this
int count = 0;
for(int i =0; i < charSpan.Length; i++)
{
    if (charSpan[i] == '[')
    {
      count++;
    }
}


// We could do this in a more optimized and simplified version
int count = charSpan.Count('[');

Risks

None that I can think of.

@bollhals bollhals added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Sep 22, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Sep 22, 2021
@Sergio0694
Copy link
Contributor

Sergio0694 commented Sep 22, 2021

Ok wow, you linked my blog post there and you just made my day 😄

If it helps, we have a vectorized Count method like the one you're describing in the Microsoft.Toolkit.HighPerformance package (see here). It is also available as a generic extension for span types, with vectorization support where available.

cc. @tannergooding as mentioned before, I'd be happy to coordinate with the .NET libraries team if we wanted to port (with changes if needed) the Count method from the HighPerformance package to the BCL in the future 🙂
For reference, the current implementation (pending some planned .NET 6 improvements) is here.

@stephentoub
Copy link
Member

If we want to add such methods, I'd prefer that:

  • They're added to MemoryExtensions for {ReadOnly}Span so that they apply to more than just String. This also is helpful in that Count is most logically associated with IndexOf, which on string is culture-aware by default, and I expect we want these to be ordinal by default, which is what we do for spans
  • Overloads for looking for more than one character are also included; otherwise the same loop around IndexOf will be needed for such cases, and for culture-aware, that's trickier than expected to do correctly.
  • We consider adding Count methods as well to Regex (this can be considered separately; I've been meaning to open an issue for it)

For a performance sensitive code I had to compute a string, for which I had to replace certain characters in a input string

For your use case, why was Replace insufficient?

@bollhals
Copy link
Contributor Author

bollhals commented Sep 22, 2021

If we want to add such methods, I'd prefer that:

  • They're added to MemoryExtensions for {ReadOnly}Span so that they apply to more than just String. This also is helpful in that Count is most logically associated with IndexOf, which on string is culture-aware by default, and I expect we want these to be ordinal by default, which is what we do for spans

Completely fine by me. (I've started with String because I thought it might be simpler to do than having it generic in MemoryExtensions)
Would you still add instance methods to String like there already are for e.g. IndexOf? Or shall it be restricted to Span?

  • Overloads for looking for more than one character are also included; otherwise the same loop around IndexOf will be needed for such cases, and for culture-aware, that's trickier than expected to do correctly.

Also fine by me, will definitely look more complete & useful if these are added as well.

For a performance sensitive code I had to compute a string, for which I had to replace certain characters in a input string

For your use case, why was Replace insufficient?

I had to replace multiple characters with different replacements (different sizes e.g. some are 1:1 replace, others 1:2), so instead of creating multiple intermediate strings, I opted for calculating the target size first and then copy / replace it in the buffer of string.create.
(Seemed to be the fastest option I could come up with)

@stephentoub
Copy link
Member

Would you still add instance methods to String like there already are for e.g. IndexOf? Or shall it be restricted to Span?

My preference is just for span on MemoryExtensions. I don't think this is important enough to also add to String, and having it on String also means more overloads (with spans, you don't need overloads that take offset/count).

For a performance sensitive code I had to compute a string, for which I had to replace certain characters in a input string

For your use case, why was Replace insufficient?

I had to replace multiple characters with different replacements

Makes sense. This is relatively common. It might be worth thinking through what an API for that on String would look like (which could use MemoryExtensions.Count as an implementation detail if desired). Even in dotnet/runtime that's done in a variety of places, e.g.

baseName = baseName.Replace(@"\\", @"\").Replace(@"\[", "[").Replace(@"\*", "*").Replace(@"\&", "&");

ArraySuffix = typename.Substring(pos, LastChar - pos + 1).Replace("["c, "("c).Replace("]"c, ")"c)

Extensions.Replace(" ", string.Empty).Replace(".", string.Empty).ToLowerInvariant();

result = result.Replace('(', '[').Replace(')', ']').Replace('#', '_').Replace('/', '_').Replace('\\', '_');

string symbolsName = providerName.Replace("-", "").Replace('.', '_'); // Period and - are illegal replace them.

and more generally:
https://grep.app/search?q=%5C.Replace%5C%28.%2A%5C.Replace%5C%28&regexp=true&filter[lang][0]=C%23

@ghost
Copy link

ghost commented Sep 22, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

For a performance sensitive code I had to compute a string, for which I had to replace certain characters in a input string. So in order to use string.Create I had to know the final size upfront, for this I had to count the number of a specific char I wanted to replace (e.g replacing '[' with '<' & ' ' (2 chars), so the final size was old size + number of chars to replace found)

To my surprise there is currently no string.Count method that is vectorized or I've missed it...
A simple for loop would've worked as well to count, but as seen here, it is slower than a vectorized count. (The link also shows an example implementation of a vectorized count and its improvements

API Proposal

namespace System
{
    public sealed partial class String
    {
        public int Count(char value);

        public int Count(char value, int startIndex);

        public int Count(char value, int startIndex, int count);
    }
}

Additional Note:

  • Are additional overloads useful for StringComparison?
  • Are additional methods useful like CountAny(char, char)?

API Usage

// Instead of this
int count = 0;
int tmp = 0;
while ((tmp = stringValue.IndexOf('[', tmp)) >= 0)
{
    count++;
    tmp++;
}

// Or this
int count = 0;
for(int i =0; i < stringValue.Length; i++)
{
    if (stringValue[i] == '[')
    {
      count++;
    }
}


// We could do this in a more optimized and simplified version
int count = stringValue.Count('[');

Risks

None that I can think of.

Author: bollhals
Assignees: -
Labels:

api-suggestion, area-System.Runtime, untriaged

Milestone: -

@bollhals
Copy link
Contributor Author

Would you still add instance methods to String like there already are for e.g. IndexOf? Or shall it be restricted to Span?

My preference is just for span on MemoryExtensions. I don't think this is important enough to also add to String, and having it on String also means more overloads (with spans, you don't need overloads that take offset/count).

I've updated the initial post to a proposal for extending MemoryExtensions.

Makes sense. This is relatively common. It might be worth thinking through what an API for that on String would look like (which could use MemoryExtensions.Count as an implementation detail if desired). Even in dotnet/runtime that's done in a variety of places, e.g.

That certainly would be a handy API to have, but maybe a separate issue for it is better suited to discuss its detail. Said API could then (as you said) use this API once they're actually implemented.

@bollhals bollhals changed the title [API Proposal]: string.Count / number of occurrences of a char in a string [API Proposal]: MemoryExtensions.Count / number of occurrences of a value in a span Sep 22, 2021
@jeffhandley jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Sep 30, 2021
@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Sep 30, 2021
@stephentoub
Copy link
Member

This is another interesting example of the issue described in #75175, where in the past we may have waved something away as being trivial to implement as a simple loop, but with the performance of vectorization on the table and the lack of an auto-vectorization story, developers would end up needing to write a lot more code to approximate the performance of what we'd presumably include in the box.

Here's are three implementations of a Count(ReadOnlySpan<char>, char): one just foreach'ing over each character, one doing the common IndexOf/slicing loop, and then one using Vector128/256 (minimally tested, so it's possible there are some bugs lurking here and/or it's not the most optimal vectorization).

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

[MemoryDiagnoser]
public unsafe partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    [Params(8, 1024)]
    public int Length { get; set; }

    [Params(false, true)]
    public bool Fill { get; set; }
    private char[] _data;
    private const char Target = 'c';

    [GlobalSetup]
    public void Setup() => _data = Enumerable.Repeat(Fill ? Target : '\0', Length).ToArray();

    [Benchmark(Baseline = true)]
    public int OneAtATimeLoop()
    {
        int count = 0;
        foreach (char c in _data)
        {
            if (c == Target)
            {
                count++;
            }
        }
        return count;
    }

    [Benchmark]
    public int IndexOfLoop()
    {
        int count = 0;
        ReadOnlySpan<char> span = _data;
        int pos;
        while ((pos = span.IndexOf(Target)) >= 0)
        {
            span = span.Slice(pos + 1);
            count++;
        }
        return count;
    }

    [Benchmark]
    public int Vectorized() => CountVectorized(_data, Target);

    private static int CountVectorized(ReadOnlySpan<char> span, char value)
    {
        ref ushort current = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
        ref ushort end = ref Unsafe.Add(ref current, span.Length);

        int count = 0;

        if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<ushort>.Count)
        {
            if (Vector256.IsHardwareAccelerated && span.Length >= Vector256<ushort>.Count)
            {
                Vector256<ushort> targetVector = Vector256.Create((ushort)value);
                ref ushort oneVectorAwayFromEndMinus1 = ref Unsafe.Subtract(ref end, Vector256<ushort>.Count - 1);
                do
                {
                    count += BitOperations.PopCount(Vector256.Equals(Vector256.LoadUnsafe(ref current), targetVector).ExtractMostSignificantBits());
                    current = ref Unsafe.Add(ref current, Vector256<ushort>.Count);
                }
                while (Unsafe.IsAddressLessThan(ref current, ref oneVectorAwayFromEndMinus1));

                if (Unsafe.IsAddressLessThan(ref current, ref Unsafe.Subtract(ref end, Vector128<ushort>.Count - 1)))
                {
                    count += BitOperations.PopCount(Vector128.Equals(Vector128.LoadUnsafe(ref current), Vector128.Create((ushort)value)).ExtractMostSignificantBits());
                    current = ref Unsafe.Add(ref current, Vector128<ushort>.Count);
                }
            }
            else
            {
                Vector128<ushort> targetVector = Vector128.Create((ushort)value);
                ref ushort oneVectorAwayFromEndMinus1 = ref Unsafe.Subtract(ref end, Vector128<ushort>.Count - 1);
                do
                {
                    count += BitOperations.PopCount(Vector128.Equals(Vector128.LoadUnsafe(ref current), targetVector).ExtractMostSignificantBits());
                    current = ref Unsafe.Add(ref current, Vector128<ushort>.Count);
                }
                while (Unsafe.IsAddressLessThan(ref current, ref oneVectorAwayFromEndMinus1));
            }
        }

        while (Unsafe.IsAddressLessThan(ref current, ref end))
        {
            if (current == value)
            {
                count++;
            }

            current = ref Unsafe.Add(ref current, 1);
        }

        return count;
    }
}

It's then running those three approaches over two kinds of input: one without any of the target character, and one entirely filled with the target character.

Method Length Fill Mean Error StdDev Ratio
OneAtATimeLoop 8 False 6.514 ns 0.0451 ns 0.0421 ns 1.00
IndexOfLoop 8 False 2.641 ns 0.0679 ns 0.0808 ns 0.41
Vectorized 8 False 2.128 ns 0.0171 ns 0.0152 ns 0.33
OneAtATimeLoop 8 True 3.286 ns 0.0739 ns 0.0725 ns 1.00
IndexOfLoop 8 True 22.549 ns 0.2771 ns 0.2592 ns 6.86
Vectorized 8 True 2.148 ns 0.0161 ns 0.0151 ns 0.65
OneAtATimeLoop 1024 False 887.487 ns 3.4035 ns 2.8421 ns 1.00
IndexOfLoop 1024 False 31.674 ns 0.0780 ns 0.0730 ns 0.04
Vectorized 1024 False 49.979 ns 0.1364 ns 0.1209 ns 0.06
OneAtATimeLoop 1024 True 488.118 ns 1.0596 ns 0.8848 ns 1.00
IndexOfLoop 1024 True 5,794.672 ns 7.9794 ns 7.0735 ns 11.87
Vectorized 1024 True 50.504 ns 0.2571 ns 0.2279 ns 0.10

The IndexOfLoop approach that we normally push folks to does a great job when there aren't many of the target. But as the density increases, it gets worse and worse, to the point where if the haystack is entirely full of the target, it's 12x worse than the simple foreach. But the Vector128/256 approach does well in all the cases.

@stephentoub stephentoub modified the milestones: Future, 8.0.0 Sep 9, 2022
@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration labels Sep 9, 2022
@stephentoub stephentoub self-assigned this Sep 13, 2022
@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Sep 13, 2022

I would recommend against having StringComparison-based overloads for now. It's difficult for most callers to reason about the algorithmic complexity of linguistic string operations, and we don't want to end up in a world where callers inadvertently end up triggering an O(n^2) operation. We can add them in the future if there's significant demand for them and we're able to make reasonable guarantees about complexity.

Other overloads look 👍 though!

Edit: A Twitter micro-thread, for those curious on the vagaries of string algorithmic complexity.

@bollhals
Copy link
Contributor Author

That's fine for me, I never needed them anyway.

One question regarding overloads, do you think it might be worth to special case the count with 2 values? (Either with one return count each or combined) Or is this already too "edge casey" to start with?

@bollhals
Copy link
Contributor Author

Hmm now that I think of it, that would be rather a CountAny() than Count(). Should probably be its own api review if needed.

@stephentoub
Copy link
Member

stephentoub commented Sep 13, 2022

One question regarding overloads, do you think it might be worth to special case the count with 2 values? (Either with one return count each or combined) Or is this already too "edge casey" to start with?

I don't understand the question; can you elaborate with an example? Do you mean Count(span, value0, value1) where it's counting the number of times either value0 or value1 occurs? I think in that case we're starting to get into the weeds and would want to see compelling examples of where such an overload would yield meaningful improvements over Count(span, value0) + Count(span, value1) (I'm sure it can be done faster when combined, I just mean whether it's so much faster that it moves the needle considerably on the all-up scenario). It also has questionable semantics if the values here are multi-element and could overlap, in which case you then need to figure out precedence, and IMO start getting more into regex territory.

@bollhals
Copy link
Contributor Author

I went ahead and updated my initial post and removed the StringComparison overloads.

@stephentoub Fully agree with your statement. Let's not make this more complex than it needs to be for now.

If this API gets approved, and it is not urgently needed by anyone, I wouldn't mind taking this one. (It will take me some time to get startet, as it would be my first issue here)

@stephentoub stephentoub removed their assignment Sep 13, 2022
@stephentoub
Copy link
Member

If this API gets approved, and it is not urgently needed by anyone, I wouldn't mind taking this one. (It will take me some time to get startet, as it would be my first issue here)

Thanks. You're welcome to. I'd already sketched it out:
main...stephentoub:runtime:memoryextensionscount
but you can either start from scratch or from where I left off (in which case the main things remaining would be adding tests, fixing any bugs, validating perf, and looking for opportunities to use the new methods elsewhere in dotnet/runtime).

@bartonjs
Copy link
Member

bartonjs commented Sep 20, 2022

Video

  • We asked if we needed to specialize the types in overloading, the answer was no.
  • We asked what the answer was for ((ReadOnlySpan<char>)"AAAA").Count("AA"), and the answer was 2 (non-overlapping count).
    • Please ensure that gets documented and tested.
  • There was a hint of a suggestion that we might not need to overload across ROSpan and Span, but we're leaving it as-proposed.
namespace System
{
    public sealed partial class MemoryExtensions
    {
        public static int Count<T>(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>?;

        public static int Count<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>?;

        public static int Count<T>(this Span<T> span, T value) where T : IEquatable<T>?;

        public static int Count<T>(this Span<T> span, ReadOnlySpan<T> value) where T : IEquatable<T>?;
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Sep 20, 2022
@stephentoub
Copy link
Member

@bollhals, this has been approved now. Would you like to pick it up?

@bollhals
Copy link
Contributor Author

@bollhals, this has been approved now. Would you like to pick it up?

Happy to. It might take me a few weeks to get it done (vacation, get the env running, getting some more unstanding of the vector implementation,...) I assume this is not time critical.

One more question to the api, I saw your comment in some other issue about where T : IEquatable<T>? for these types of extension methods, whether this should be removed or not. Shall I start with it as approved?

@stephentoub
Copy link
Member

Happy to.

Great. Thanks.

I assume this is not time critical.

Correct.

Shall I start with it as approved?

Yup. Everything relevant on MemoryExtensions has been updated with where T : IEquatable<T>?, and the approved methods here follow suit.

@stephentoub
Copy link
Member

@bollhals, just wanted to check in. Still planning to work on this?

@bollhals
Copy link
Contributor Author

Thanks for checking in, I still plan to, but I got surprised by a private relocation, so I wasn't able to work a lot on it.
FYI my current state: I was able to use the repo and build everything needed and started writing test cases for your initial implementation.

If it is a priority, then please feel free to take over, otherwise would plan to continue on it in the next weeks once I'm done relocating.

@stephentoub
Copy link
Member

Not a problem or rush; just wanted to check in.

@MihaZupan
Copy link
Member

Do you mean Count(span, value0, value1) where it's counting the number of times either value0 or value1 occurs? I think in that case we're starting to get into the weeds and would want to see compelling examples of where such an overload would yield meaningful improvements over Count(span, value0) + Count(span, value1) (I'm sure it can be done faster when combined, I just mean whether it's so much faster that it moves the needle considerably on the all-up scenario).

@stephentoub Would that be a potential place for (not saying we actually need it)

int CountOfAny{Except}(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);

?
I'm assuming at least #76020 could use something like that to exclude the number of whitespace characters for the length calculation (if we cared about perf to that degree when whitespace is present).

@stephentoub
Copy link
Member

@bollhals, happy new year, just checking in... is this still something you're planning to work on soon?

Would that be a potential place for (not saying we actually need it)

Yes, any of our searching APIs (IndexOf, Contains, Replace, Count, etc.) could possibly have IndexOfAnyValues-based overloads in the future. It does beg the question of whether we might want to rename the type to SearchValues or something a bit less API specific (though we talked about that in the API review and left it as it was for now).

@bollhals
Copy link
Contributor Author

@bollhals, happy new year, just checking in... is this still something you're planning to work on soon?

Happy new year to you too!
Thanks for checking in. I've actually created all test cases in the meanwhile but stopped when I hit a bug (probably in my test, though haven't figured it out yet).

One question I had regarding tests, what kind of test do you expect? So far I've mainly copied the ones for ReadOnlySpan.Contains method (so tests with byte, int, TInt (wrapped one) and string for both Count methods (single value and ROS))

@stephentoub
Copy link
Member

One question I had regarding tests, what kind of test do you expect?

The ideal would be 100% code coverage, exercising common cases and edge cases, and for at least two T types.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 15, 2023
@stephentoub stephentoub removed their assignment Jan 24, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 27, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants