Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The indexer of Span (the Item[int index] or this[int index] property) takes an int instead of a uint like the equivalent on an Array. #28070

Open
tarekgh opened this issue Dec 6, 2018 · 25 comments

Comments

@tarekgh
Copy link
Member

tarekgh commented Dec 6, 2018

This causes a lot of code changes when switching from new Foo[20U] to stackalloc Foo[20U]. Since you can't allocate a negative sized array, it seems that this should take a uint, and not an int.

Note

This issue is a copy of the reported issue on the developer community https://developercommunity.visualstudio.com/content/problem/376789/the-indexer-of-span-the-itemint-index-or-thisint-i.html

@tarekgh
Copy link
Member Author

tarekgh commented Dec 6, 2018

@stephentoub
Copy link
Member

takes an int instead of a uint like the equivalent on an Array

According to the C# spec, the index can be "an expression of type int, uint, long, ulong, or can be implicitly converted to one or more of these types", so it's not that span takes an int instead of a uint, it's that C# allows arrays to be indexed by things other than int as well, and we only have the one overload on span that's int-based.

@jkotas
Copy link
Member

jkotas commented Dec 6, 2018

, it's that C# allows arrays to be indexed by things other than int as well

Correct.

BTW: The runtime-provided indexer methods on arrays take int indices only as well.

foreach (var m in typeof(string[]).GetMethods(BindingFlags.DeclaredOnly | BindingFlags.Instance | BindingFlags.Public)) Console.WriteLine(m);

@mikedn
Copy link
Contributor

mikedn commented Dec 6, 2018

it's that C# allows arrays to be indexed by things other than int as well

And, for the sake of completeness, the implementation is slightly bugged at that: dotnet/roslyn#30648. Luckily it affects only MD arrays.

The runtime-provided indexer methods on arrays take int indices only as well.

That's perhaps not that relevant as IL instructions like ldelem accept both int32 and native int. But then there are no IL instructions to index spans, similar to how there are no IL instructions to index MD arrays.

Should this issue be moved to csharplang then?

@tannergooding
Copy link
Member

Should this issue be moved to csharplang then?

I don't believe so. The issue would likely be closed as non-actionable since this isn't something the language can handle. As you indicated, there is no IL instruction for indexing Span<T> so the compiler relies on the exposed API surface area for functionality like indexing. The only fix here is to expose another indexer that also takes uint or to tell users that they need to explicitly cast to int when indexing.

@mikedn
Copy link
Contributor

mikedn commented Dec 6, 2018

I don't believe so. The issue would likely be closed as non-actionable since this isn't something the language can handle.

I don't think you are in a position to decide what the language can handle or not handle. You didn't even bother to understand the situation and took my side node about IL not having an instruction as evidence that the language can't handle this. Despite obvious evidence that the language already does this for arrays.

@tannergooding
Copy link
Member

Despite obvious evidence that the language already does this for arrays.

This is because the language doesn't emit an API call in order to access arrays, they are a built-in/primitive type and are accessed directly via IL instructions.

Span<T> is not a built-in/primitive type and has no IL instructions that would allow for direct access. The compiler has to emit IL, and for something that doesn't have baked-in support, it has to emit a call to an existing API. It could theoretically add support by baking in knowledge of Span<T> and its internal layout, but there are other problems with that as well (such as ensuring the bounds checks still happen). As has been mentioned by LDM members in the past, all language proposals automatically start out at -100 points and you need 100 points to get them implemented. I don't believe there is any chance that this would happen given the little benefit and that the fix to CoreFX is trivial in comparison.

I don't think you are in a position to decide what the language can handle or not handle

I did not decide that this is something the language should not handle. I said that it would likely be closed as non-actionable.

@mikedn
Copy link
Contributor

mikedn commented Dec 6, 2018

This is because the language doesn't emit an API call in order to access arrays, they are a built-in/primitive type and are accessed directly via IL instructions.

I'm not sure what to say. If even after being told that you didn't bother to understand the situation before replying you reply while still not bothering then there is nothing to discuss. I've had quite enough of this already.

@tannergooding
Copy link
Member

If you are saying you were asking if dotnet/roslyn#30648 should be moved to dotnet/csharplang, that wasn't necessarily immediately clear and this could have been made infinitely simpler by simply stating something along the lines of:

I meant should dotnet/roslyn#30648 be moved to dotnet/csharplang

If you are saying it was something else, then I am completely missing it and this would be made simpler if you elaborated rather than leaving people to guess and grab at what you mean if it is obvious there was some form of miscommunication 😄

  • And yes, I could have also made this simpler by asking you to explain what I missed

@stephentoub
Copy link
Member

stephentoub commented Dec 6, 2018

I suggest we leave the issue here, since it's not necessarily any more or less language than it is library/runtime, but have a discussion with the C# language design team about it. cc: @MadsTorgersen

I see three main options:

  1. Do nothing. Developers cast (ideally checked for {u}long) if necessary as they do today.
  2. Add overloads to {ReadOnly}Span<T> for whatever subset of uint, long, and ulong we care to.
  3. Add language support, that would presumably do the equivalent of a cast (checked for {u}long) as it does today for array access.

My preference is (2), as it eases moving from arrays to spans and limits the knowledge the language needs about spans (it has some knowledge today, but mostly for the more general ref struct concept). We've also already been adding more overloads for the new Range/Index support.

@jkotas
Copy link
Member

jkotas commented Dec 6, 2018

Add overloads to {ReadOnly}Span<T> for whatever subset of uint, long, and ulong we care to.

Would it be meaningful improvement? int indices are very ingrained into the .NET surface. Length property and all other APIs take take or return indices would still return int. Unsigned integers are like unsafe code in .NET - you can use them, but it requires quite a bit of ceremony.

@stephentoub
Copy link
Member

Would it be meaningful improvement?

Not a huge one, but I've now heard several folks asking about this. @eerhardt hit up against this the other day with some ML stuff.

@mikedn
Copy link
Contributor

mikedn commented Dec 6, 2018

Not a huge one, but I've now heard several folks asking about this. @eerhardt hit up against this the other day with some ML stuff.

Would be good to have an example. One reason to use uint is due to some potential performance gains. Similar gains could usually be obtained by improvements in the JIT, if a value is used as span/array index the JIT should be able to tell that it is unsigned. As such, it would be a pity to pollute the API because the JIT is having difficulties in keeping up with the needs.

On the other hand, the fact that array indexing can be done using uint is solely the language's "fault" so it's IMO their problem. Though the prospect of having to make Span<T> an array type or introduce some kind of "array like type" notion into the language spec isn't exactly enticing.

@eerhardt
Copy link
Member

eerhardt commented Dec 6, 2018

In case anyone is interested, here is the ML scenario where I hit this problem:

https://github.com/dotnet/machinelearning/pull/1650/files#diff-2dc3cad0906ef83ce7b8297e32c3b26dR343

                     uint uintSrc = 0;
                     _convertToUInt(in src, ref uintSrc);
                     // Assign to NA if key value is not in valid range.
-                    if (0 < uintSrc && uintSrc <= _values.Length)
-                        dst = _values[uintSrc - 1];
+                    if (0 < uintSrc && uintSrc <= values.Length)
+                        dst = values[(int)(uintSrc - 1)];

Previously _values was an array. And uintSrc is a uint. ML.NET uses uint here to represent what they call "KeyTypes". More info on KeyTypes.

So previously, when using an array, I could index into the array using a uint. But now that I converted the code to use ReadOnlySpan, I could no longer index using uint, and was forced to cast.

@jkotas
Copy link
Member

jkotas commented Dec 6, 2018

uintSrc <= _values.Length

Note that this code is equivalent to (long)uintSrc <= (long)_values.Length because of how implicit conversions work in C#. The JIT may be able to optimize out these casts to long in some cases, but they are common performance trap when folks try to use uints to improve performance. You have to sprinkle it with uint casts to avoid this trap, like: if (0 < uintSrc && uintSrc <= (uint)_values.Length)

@jkotas
Copy link
Member

jkotas commented Dec 6, 2018

https://github.com/dotnet/machinelearning/pull/1650/files#diff-2dc3cad0906ef83ce7b8297e32c3b26dR343

Just looking at the code surrounding the delta, there is a lot of manual (int) and (uint) casts even before your Span change. Is there a good reason why ML.NET uses the mix of ints and uints? It does not look pretty.

@eerhardt
Copy link
Member

eerhardt commented Dec 6, 2018

Is there a good reason why ML.NET uses the mix of ints and uints?

I can only guess, since it precedes my time working on the project. But reading More info on KeyTypes, it states:

Key types are used for data that is represented numerically, but where the order and/or magnitude of the values is not semantically meaningful.

So I assume the reasoning was "we just need an integer. Since order and/or magnitude of the values is not semantically meaningful, neither is the sign of the integer. Therefore - let's not use signed integers."

Also - since these KeyType values are used to index into an array, it is explicitly NOT valid to use a negative integer as a value, because if you did have a negative value it would throw when you indexed into the array.

However, according to that doc key type specifies:

A count value, between 0 and int.MaxValue, inclusive
A "minimum" value, between 0 and ulong.MaxValue, inclusive

Which means that all logical indexes must be <= int.MaxValue (since Count is represented as an int). So I'm not 100% sure why it couldn't use an int as an underlying value.

There may be other considerations though - these are just the first ones that stuck out to me. @TomFinley @Zruty0 and @glebuk might have more insight into why uint was chosen as the underlying type of KeyTypes.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the Future milestone Feb 1, 2020
@GrabYourPitchforks
Copy link
Member

Are we realistically likely to make traction on this? I'd personally love to see a uint indexer here as it feels more natural for certain coding patterns.

@jkotas
Copy link
Member

jkotas commented Feb 13, 2020

Do you believe that one-off solution for just the indexer would be worth it and move the needle enough?

I believe that we would need to add uint overloads for number of other methods (including Length property) to make Span+uint combination first class. That would be a lot new surface. So I doubt we are likely to make a traction on this.

@tannergooding
Copy link
Member

I think it would help for interop code or other scenarios where you may already have uint.

It avoids a few places where you need to insert casts and while not perfect, it can be done incrementally.

@GrabYourPitchforks
Copy link
Member

So I doubt we are likely to make a traction on this.

Sure, that's fine. There's already a critical mass behind what we have shipped.

If we were to introduce some theoretical LongSpan<T> in the future, perhaps we could say that both its Length property and its indexer are typed as nuint, and then the discussion becomes moot? I imagine high-performance code would want to work with a naturally-sized span as often as possible.

@markusschaber
Copy link

markusschaber commented Feb 17, 2020

3\. Add language support, that would presumably do the equivalent of a cast (checked for {u}long) as it does today for array access.

If we want to avoid baking special kowledge about Span into the Compiler, this could be driven by an attribute which tells the compiler to add checked casts for higher range integer types to a given parameter.

Having extension method indexers would be another possibility...

@GrabYourPitchforks
Copy link
Member

Just for fun, I experimented with this in a local branch. The Utf8Parser code for parsing integers contains many blocks similar to below.

if ((uint)index >= (uint)source.Length)
goto Done;
num = source[index];
if (!ParserHelpers.IsDigit(num))
goto Done;
index++;
answer = 10 * answer + num - '0';

By rewriting this code using nuint (instead of int) as an indexer - and while keeping the code 100% typesafe - we're seeing some fairly good performance improvements. These perf improvements are largely from eliminating the unnecessary movsx instruction that the normal int-based indexer emits.

Method Toolchain Mean Error StdDev Ratio RatioSD
TryParseInt32 master 3.069 us 0.0601 us 0.1037 us 1.00 0.00
TryParseInt32 utf8parser 2.847 us 0.0440 us 0.0390 us 0.91 0.04

@apws
Copy link

apws commented Sep 12, 2022

I hit this int/uint array/span indexing issue too during work on processor emulator, but probably because I cant figure yet how to place some raw inline array into struct to be able to pin/fix it to obtain damn unsafe pointer for it, instead of safe ref usage and tackling with heap byte array... (want also to share as much code logic in methods/functions possible between C# and C implementation ... quite possible in fact, only some tiny ugly obstacles here)

@Genbox
Copy link

Genbox commented Oct 16, 2022

The lack of an unsigned indexer for Span makes it exceedingly hard to work with it in interop code. Most win32 APIs work with DWORD (32bit unsigned), so for correctness I usually make a DllImport that matches the signature exactly.

However, as soon as you need to work with buffers and buffer lengths, you are pretty much forced to use a signed integer for length, as you otherwise can't work with it in C#.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests