-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: int/uint portability to 16-bit CPUs #161
Closed
Closed
Changes from 2 commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
- Start Date: 2014-07-11 | ||
- RFC PR #: (leave this empty) | ||
- Rust Issue #: (leave this empty) | ||
|
||
# Summary | ||
|
||
Either rename the types `int` and `uint` to `index` and `uindex` to avoid | ||
misconceptions and misuses, or specify that they're always at least 32-bits wide | ||
to avoid the worst portability problems. Also document when to use and not use | ||
these types and which integer type to pick "by default." (See below for the | ||
meaning of "by default.") | ||
|
||
|
||
# Motivation | ||
|
||
- To avoid new integer overflow bugs when moving Rust code to smaller address spaces. | ||
- So programmers will know when to use which integer types. | ||
|
||
|
||
# Background | ||
|
||
The Rust language does array indexing using the smallest integer types that span | ||
the address space. For this, Rust defines integer types `int` and `uint` to be | ||
large enough to hold a pointer in the target environment. Beware that it does | ||
not promise that these are the fastest, "native," register, or C sized integers | ||
despite the names. | ||
|
||
(A rationale given for using unsigned array indexes is to allow array bounds | ||
checking in one comparison rather than two. However, a compiler can generate one | ||
unsigned comparison to bounds-check a signed integer index as long as the lower | ||
bound is 0.) | ||
|
||
`int`/`uint` may also be good choices for indexing and sizing in-memory | ||
containers. But using these types for computations that are not limited by | ||
memory leads to code that's not portable to smaller-address targets than it was | ||
developed and tested on. | ||
|
||
From C/C++/Java experience, programmers have learned to pick `int`/`uint` | ||
as the "default" integer types where not particularly constrained by | ||
requirements, e.g.: | ||
|
||
* where any modest-size integer will do (e.g. a loop index) | ||
* to avoid cluttering APIs with ungermane integer sizes | ||
* to hold tags and other values supplied by callers | ||
|
||
Programmers should figure out a value's needed range then maybe widen to | ||
a "default" type for easy interconnections. For a value in the range 1 .. 100, | ||
you can pick from 10 types. Choosing an 8-bit or 16-bit integer is an | ||
optimization. Premature? Which integer type should you pick when you're writing | ||
exploratory code and haven't yet done the range analysis? What if you're passing | ||
values through many layers of code and computations? | ||
|
||
A default is handy but a target-dependent size does not make a good default. And | ||
yet `int` and `uint` _look_ like default integer types. | ||
|
||
To clear up some misconceptions from C/C++: | ||
|
||
* _They're not the fastest integers._ Example: x86_64 and ARM64 have 64-bit address spaces and 64-bit integer registers, but 32-bit integers are faster since those arithmetic instructions are faster, more data fits in cache, and the vector instructions can operate on twice as many values at a time. | ||
* _They're not "native" size or register size._ Example: The [x32 ABI](https://en.wikipedia.org/wiki/X32_ABI) has 64-bit general purpose registers but 32-bit pointers so its `int`/`uint` are 32-bit. | ||
* _They're not necessarily the same size as C `int`._ C doesn't define `int` the same way. | ||
* _They're not wide enough to casually pick,_ given 16-bit address spaces. | ||
* _They're not "portable."_ They overflow differently and take different numbers of binary I/O bytes on different platforms. | ||
|
||
These misconceptions lead to misuses and thus to code with overflow bugs | ||
(checked or unchecked) when running in a smaller address space than originally | ||
considered and tested. | ||
|
||
The worst failure mode is in libraries written with desktop CPUs in mind and | ||
then used in small embedded devices (such as Atmel AVR, PIC controllers, and TI | ||
MSP430). Rust is a very compelling replacement for C/C++ in embedded devices | ||
and safety-critical robotics actuators. | ||
|
||
|
||
# Detailed design | ||
|
||
Change these two type names so they're self-documenting and less misused. The | ||
names `index` and `uindex` are meant to convey their use in array indexing. Use | ||
them more narrowly. | ||
|
||
Alternate name choices: | ||
|
||
- `isize` and `usize`, [borrowing from C's](http://en.cppreference.com/w/cpp/types/integer) `ssize_t` and `size_t` but adopting Rust's integer prefixes. | ||
- `intptr` and `uintptr`, [borrowing from C's](http://en.cppreference.com/w/cpp/types/integer) `intptr_t` and `uintptr_t`. These names are awkward by design. | ||
- `PointerSizedInt` and `PointerSizedUInt`. | ||
- `intps` and `uintps`. | ||
|
||
To ease the transition, first deprecate the old types. | ||
|
||
**Alternative:** Specify that these two integer types are _at least 32-bits | ||
wide_ on every target architecture. That avoids the worst failure mode although | ||
it doesn't help when code tested in a 64-bit address space later runs in a | ||
32-bit address space. | ||
|
||
**Either way:** The style guide should document when to use and not use these | ||
types and elect a particular integer type for programmers to pick "by default". | ||
This RFC recommends `i32`. | ||
|
||
The style guide should also recommend using signed integers except when unsigned values are required such as for modulo 2^N arithmetic. The | ||
[Google Style Guide](http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Integer_Types) explains: | ||
|
||
> In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this. ... | ||
> | ||
> Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce. | ||
|
||
Furthermore: | ||
|
||
> You should assume that an `int` is at least 32 bits, but don't assume that it has more than 32 bits. | ||
|
||
This assumption does not hold for PalmOS even on 32-bit ARM, where `int` is | ||
16-bits for backward compatibility with PalmOS running on Motorola 68000. | ||
|
||
|
||
# Drawbacks | ||
|
||
- Renaming `int`/`uint` requires figuring out which of the current uses to replace with `index`/`uindex` vs. `i32`/`u32`/`BigInt`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And some people will just end-up redefining the |
||
- The new names are more verbose. | ||
|
||
|
||
# Alternatives | ||
|
||
1. Set a coding style guide and code review expectation to use `int`/`uint` only for array indexing and related operations despite C programmers' expectations. Elect an integer type such as `i32` to use "by default." Update the existing libraries. | ||
2. Fix the portability bugs later. | ||
|
||
|
||
# Notes | ||
|
||
See the discussions from many contributors to [Issue #14758](https://github.com/rust-lang/rust/issues/14758) and [Issue #9940](https://github.com/rust-lang/rust/issues/9940). | ||
|
||
[Daniel Micay notes](https://github.com/rust-lang/rust/issues/9940#issuecomment-32104831): | ||
|
||
> If you're using `int` as a "default", then you're not using it correctly. It | ||
> will be 16-bit on an architecture with a 16-bit address space, 32-bit or | ||
> 64-bit. If you're doing your testing on a 64-bit architecture, you're going to | ||
> miss plenty of bugs. | ||
|
||
[Carter Tazio notes](https://github.com/rust-lang/rust/issues/9940#issuecomment-32088729) | ||
that system-dependent integers in GHC Haskell cause recurring problems, and | ||
there's some buy-in for fixing it. | ||
|
||
[Niko Matsakis requested](https://github.com/rust-lang/rust/issues/9940#issuecomment-32119318) | ||
a survey of uses of `int` and `uint` showing how many of them are | ||
appropriate / inappropriate / borderline. | ||
|
||
More recently, [type inference was changed to not fall back to `int`/`uint`](https://github.com/rust-lang/rust/issues/6023) and there's an RFC for | ||
[Scoped attributes for checked arithmetic](https://github.com/rust-lang/rfcs/pull/146). | ||
|
||
[Issue #11831](https://github.com/rust-lang/rust/issues/11831) is also about | ||
`int` and `uint` pointer-sized integers. Was the conclusion the type inference | ||
change? A commitment to the names `int` and `uint`? In the latter case, this | ||
RFC amounts to making `int`/`uint` at least 32-bits wide and setting style | ||
guidelines for integer types. | ||
|
||
|
||
# Not in scope of this RFC | ||
|
||
* Changes in overflow handling. | ||
* Giving `Vec` a settable index size as `Vec<T, Index=uint>`. | ||
* Adding more target-dependent integer types such as C's `int_fast32_t` -- the fastest signed integer type at least 32 bits wide. (Do this in a library?) | ||
|
||
|
||
# Unresolved questions | ||
|
||
Who'll implement the changes? |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suggestion makes a lot of sense in a context where overflow/underflow silently wraps around. However, if something like RFC PR #146 were to be implemented, then it would once again make sense to use types which more accurately express the range of legal values (i.e., which are self-documenting), because compiler-added checks can be enabled to catch errors where the value would go out of range. Accurate types with compiler-added assertions beats inaccurate types with programmer-added assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glaebhoerl So would you recommend we wait for PR #146 to be accepted or rejected before evaluating this RFC further?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah. This was just an ancillary remark on an ancillary part of the proposal. The main part of the proposal (which is about changes to the language to better accomodate [portability to] 16-bit architectures) is unaffected.
(And anyway, the suggestion makes sense in the context of the current language, and the style guide could just be updated again if the language changes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! Nice insight, @glaebhoerl.
I'll make the style guide recommendation conditional on overflow-checking.
Q. Does/will overflow checking happen during conversion between integer types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A. It doesn't currently, but in the context of #146, if
#[overflow_checks(on)]
, I think it should.Rationale: As far as I can tell
as
is meant to preserve meaning rather than representation, e.g.5000i32 as f32
is equivalent to5000f32
and not totransmute::<i32, f32>(5000i32)
. Therefore if attempting to transport the meaning of the original value to the target type causes it to overflow, it should be caught.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Otherwise computing a value in one integer type then converting to another would accidentally bypass the overflow checks.