Hard-wired index operator for vectors implicitly converts integral types #10453
Comments
This would be really easy to fix, and I agree. However, it was intentionally written this way based on comments in the code and the fact that it took a few dozen lines of code to add support for it.
Oh ok. I'm just thinking that if we get the index trait up to scratch, indexing vectors should be implemented with that rather than hard-coded in the compiler. If that is the case then all those implicit conversions will have to be rewritten.
Well, it would still need to be implemented in the compiler due to the lack of CTFE. I get what you're saying though :).
(a big +1 for removing any magic from built-in types that we don't plan to expose to non-built-in types)
Oh blast, forgot about that. I guess that shoots down my wish for the numeric operators and casts to be lib-ified too. :(
It was intentional but I think we should get rid of it. It dates from earlier days before we had integer type inference.
I agree it interacts poorly with overloaded index.
Nominating.
Assigning 1.0, P-backcompat-lang.
(this topic was discussed over the work week, though we may not have realized that there was already this issue opened for it.) |
The details are outlined in the first commit. Closes #10453
So this means I won't be able to use my custom index/integral type that doesn't implicitly convert to unsigned to index std containers? This seems like a step back. An index is a concept/typeclass; forcing a particular index (like int) seems wrong. Unsigned is even worse.

It might seem obvious that if sizes are always >= 0, one should use unsigned types for them. However, this is actually not the case. There is a Q&A in GoingNative 2013 where both Bjarne Stroustrup and Herb Sutter comment on this being a big mistake in the design of the C++ STL (the choice of unsigned sizes/indices over signed ones, e.g. for std::size_t and the .size() member functions).

The problem is that integer arithmetic on unsigned is just "not natural" and thus more dangerous, so the recommended practice is to use unsigned when you need bit fiddling, and signed integers otherwise. Imposing unsigned integer types in the std container interfaces forces your users to either:
I fail to see what the problem is with requiring indices to be an integral. Sometimes you want to prohibit implicit conversions between ints to strongly type an interface. Since both integer types model Integral, you could (before) still use them naturally to interface with std containers. Now you can't, which gets in the way of writing your own Integral types that work with the language and library as if they were native ones.
On Sun, Apr 06, 2014 at 09:56:46AM -0700, gnzlbg wrote:
No, this doesn't mean that at all. This bug concerned only what types
But the same arguments I mention hold for built-in array types:
Indexing a vector is inherently linked to the address space of the machine (since the size of a vector is limited by the available address space), which is exactly what uint represents. User-defined indexable types (which don't necessarily have the same strong link to memory) will be able to overload some traits with whatever index types they like (including generic ones): #6515
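The overloading described above did eventually land as the `Index` trait. As a hedged sketch in modern Rust (the `Grid` and `Coord` names are hypothetical, not from this thread), a user-defined type can accept an arbitrary index type rather than a plain integer:

```rust
use std::ops::Index;

// Hypothetical 2-D grid stored row-major in a flat Vec, indexed by a
// coordinate pair instead of a single unsigned integer.
struct Grid {
    data: Vec<u8>,
    width: usize,
}

struct Coord {
    x: usize,
    y: usize,
}

impl Index<Coord> for Grid {
    type Output = u8;
    fn index(&self, c: Coord) -> &u8 {
        // Translate the 2-D coordinate into a flat offset.
        &self.data[c.y * self.width + c.x]
    }
}

fn main() {
    let g = Grid { data: vec![1, 2, 3, 4], width: 2 };
    assert_eq!(g[Coord { x: 1, y: 1 }], 4);
    println!("ok");
}
```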
Can I use an
If we used signed integers to index slices, the bounds check would be significantly more expensive...
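The cost difference comes down to the number of comparisons. A minimal sketch (the function names are illustrative, not from this thread):

```rust
// With an unsigned index, `i < len` alone rejects both out-of-range cases:
// a "negative" bit pattern is simply a huge usize, so one compare suffices.
fn in_bounds_unsigned(i: usize, len: usize) -> bool {
    i < len
}

// A naive signed check needs two comparisons; casting to unsigned first
// recovers the single-compare form, which is the usual compiler trick.
fn in_bounds_signed(i: isize, len: usize) -> bool {
    i >= 0 && (i as usize) < len
}

fn main() {
    assert!(in_bounds_unsigned(2, 3));
    assert!(!in_bounds_unsigned(3, 3));
    assert!(!in_bounds_signed(-1, 3));
    assert!(in_bounds_signed(2, 3));
    println!("ok");
}
```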
The only thing that is easier to check is that unsigned integers can't be smaller than zero. OTOH, checking that they do not run past the end of the array is hard, since they can be very large.

If people use unsigned for positive things and do math with them (think subtractions), they will wrap around (this is a very common bug), and you will overflow. If people who know better use signed integers, they will need to cast them back to unsigned for accessing the array. Again, if the signed value is negative, the cast will succeed, and accessing the array will overflow too.

If you know both the index and the array bounds at compile time, doing the check is equally expensive for signed and unsigned types. If you do not, you should only do the check at run time in debug mode or something (since it will be very expensive), and then again, the difference in cost between signed and unsigned is very small. So the performance argument is weak.

The strong argument is that unsigned integer arithmetic is dangerous, and you want people to prefer signed integers over unsigned ones. Making the interface of a fundamental datatype use unsigned will lead either to people preferring unsigned since they have to interface with your data type (the worst possible thing that could happen), or, less bad but still bad, to unsafe casts from signed to unsigned.
Well, that would be a major leap backwards. I don't get how you can say "unsigned ints are dangerous" and on the other hand propose that indexing not be validated at runtime ;), which is one of the major curses of previous native languages and has cost the industry literally trillions of dollars.
I don't really get what you mean by "unsafe uint"... How are uints unsafe? Where is the difference between uints overflowing and ints overflowing? Without boundary checks, neither of them is "safe". Overflowing integers are the problem. They should raise a task fault, unless specified otherwise by the programmer. And indexing should check for boundaries, unless specified otherwise by the programmer (perhaps both only in "unsafe"-marked methods). For the latter I would go as far as saying: history has proven that programmers should not be able to disable boundary checking.

In the future, most of these checks might be eliminated by runtime instrumentation or static analysis. It's not a good idea to base a language made for the future (it will take years for Rust to go mainstream, and once it does it may live as long as C++) on compiler flaws we have to deal with today. Rather, make it a bit slower and safe, and leave the optimization to the next generations of compilers & VMs.
The difference is in the arithmetic. With signed ints, subtraction does what most people expect. With unsigned ints this is not the case. This is further complicated if distances between values can be negative, and thus users mix in signed ints for distances to offset unsigned indices. IMO just using signed ints for everything is better than mixing both. Anyhow, what I am proposing is to leave the Index type as a concept (or typeclass), such that its type is deduced by the compiler. This will allow the user to choose whether to use ints, uints, or some special int with special rules (think about conversions, big ints, ...).
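The subtraction pitfall can be made concrete. A small sketch using Rust's explicit wrapping API (illustrative, not part of the original thread; a plain `low - high` on unsigned values would panic in a debug build):

```rust
fn main() {
    let low: u32 = 2;
    let high: u32 = 5;
    // Unsigned subtraction is modular: 2 - 5 wraps around to 2^32 - 3,
    // an enormous value that silently blows past any array bound.
    assert_eq!(low.wrapping_sub(high), u32::MAX - 2);
    // The signed subtraction gives the answer most people expect.
    assert_eq!(2i32 - 5i32, -3);
    println!("ok");
}
```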
You mean if I do this on unsigned: "abs(low - high)", then I have a problem ;). But generally signed ints are just unsigned ints with the value range shifted a bit. They both have the same problem of not being "natural" because of overflows. I don't think this discussion will get you far.

What SHOULD be done instead is trap overflows by default, so that they can't yield valid executions (unless requested explicitly, like for hash algorithms). Even better would be to allow us to specify the machine datatype along with a number range, like "a: u8{1..10}, b: u8{1..30}"... Now if I do "let c = abs(a - b)", the compiler would know for sure that this code is seriously broken ;). It would be much easier to actually warn for dangerous constructs that are likely to overflow under certain input conditions. While a task fault on overflow is a good last resort, it is still not really pleasing, because it might lead to random failures in a production system that could have been caught with a richer type system at compile time.

I agree with you on the deduction though. Quite a bit annoying in Rust atm, not only for this one, but also in general. There are a lot of unnecessary casts that can't possibly go wrong but still trigger an error (u8 -> u64, &mut u64 -> u64, etc. for instance); well, that's not deduction anymore, but from a user perspective it is pretty close.
No, I mean that if you work with unsigned ints you will:
Why put your users through this trouble? The only win I see is that for 64-bit ints, going from signed to unsigned allows you to have twice as large sizes (from 2^63 to 2^64). I don't think this is a big win at all, and it introduces a lot of trouble. People needing this extended range will probably be using Big Integers anyway, and thus would also benefit from a parametric index type that allows Big Integers.

Unsigned ints are useful for bit twiddling, but people doing bit twiddling know what they are doing. I really believe that unsigned integers are not useful for array lengths or indices, since the trouble of having to deal with them and their arithmetic (offsetting indices) is just not worth it for almost no win in size. In C++, using unsigned for array/vector/string/container lengths was a mistake that every C++ user has to deal with. A lot of committee members agree that the small win is not worth the burden on the users.
Well, I think we are talking about different things here ;). I totally agree with you that the trouble is not worth it. But what I am really suggesting is the trap on overflow (at least), and maybe even a type system that helps with 99% of the use cases, where I really don't need much range (I have seldom encountered situations in which I really needed the full signed int range), and in such cases is able to detect at compile time whether there can be overflows under the range restrictions imposed on inputs. Anyway, I am just complaining a bit about the signed/unsigned discussion, while in my POV the real issue is that overflows are allowed at all. I generally think that code which can fail randomly at runtime smells xD. Using signed vs. unsigned just makes those time bombs less likely.
I think that is a different discussion and a quality of implementation issue. In C++ overflow/underflow of signed integers is undefined behavior which means anything could happen. This allows the compiler implementors to optimize as if this couldn't happen in "release" mode, and e.g. terminate the program or throw an exception in debug mode if overflow happens. |
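For contrast with C++'s undefined behavior, a sketch of how later Rust made overflow defined, with explicit per-call-site policies (this example is illustrative and not part of the original thread):

```rust
fn main() {
    // Rust integer overflow is defined behavior: debug builds panic
    // (the "trap" the discussion above asks for), release builds wrap,
    // and the explicit APIs make the chosen policy visible in the code.
    assert_eq!(u8::MAX.checked_add(1), None);       // report failure as None
    assert_eq!(u8::MAX.wrapping_add(1), 0);         // opt-in modular arithmetic
    assert_eq!(u8::MAX.saturating_add(1), u8::MAX); // clamp at the bound
    println!("ok");
}
```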
This is legal at the moment:
Shouldn't we only allow `uint`s here?
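The code snippet from the original issue body did not survive this capture. As a hedged illustration of the eventual outcome: in current Rust (where `uint` was later renamed `usize`), the built-in index for vectors and slices accepts only `usize`, with no implicit conversion from other integral types:

```rust
fn main() {
    let v = vec![10, 20, 30];
    let i: usize = 2;
    assert_eq!(v[i], 30);
    // `v[2i32]` no longer compiles: the built-in indexing for Vec and
    // slices is implemented only for usize, which is what this issue
    // asked for. Other integer types need an explicit cast, e.g.:
    let j: i32 = 1;
    assert_eq!(v[j as usize], 20);
    println!("ok");
}
```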