From 700a362915961b11613d31fc04370fea5e08c8e8 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Fri, 24 Aug 2018 07:25:05 -0400 Subject: [PATCH 1/9] first draft --- active_discussion/representation.md | 145 +++++++++++++++++++++++++--- 1 file changed, 134 insertions(+), 11 deletions(-) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index 4e223935..3abee644 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -1,14 +1,137 @@ -# Data structure representation +# Data structure representation and validity requirements -In general, Rust makes few guarantees about memory layout, unless you -define your structs as `#[repr(rust)]`. But there are some things that -we do guarantee. Let's write about them. +## Introduction -TODO: +This discussion is meant to focus on two things: -- Find and link to the various RFCs -- Enumerate things that we *might* in fact guarantee, even for non-C types: - - e.g., `&T` and `Option<&T>` are both pointer sized - - size of `extern fn` etc (at least on some platforms)? - - For which `T` is `None` represented as a "null pointer" etc? - - (Which "niche" optimizations can we rely on) +- What guarantees does Rust make regarding the layout of data structures? +- What invariants does the compiler require from the various Rust types? + - the "validity invariant", as defined in [Ralf's blog post][bp] +- What invariants can safe code expect to hold for the various Rust types? + - the "safety invariant", as defined in [Ralf's blog post][bp] + +[bp]: https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html + +### Layout of data structures + +In general, Rust makes few guarantees about the memory layout of your +structures. 
For example, by default, the compiler has the freedom to +rearrange the field order of your structures for more efficiency (as +of this writing, we try to minimize the overall size of your +structure, but this is the sort of detail that can easily change). For +safe code, of course, any rearrangements "just work" transparently. + +If, however, you need to write unsafe code, you may wish to have a +fixed data structure layout. In that case, there are ways to specify +and control how an individual struct will be laid out -- notably with +`#[repr]` annotations. One purpose of this section, then, is to lay out +what sorts of guarantees we offer when it comes to layout, and also +what effect the various `#[repr]` annotations have. + +### Validity invariant + +The "validity invariant" for each type defines what must hold whenever +a value of this type is considered to be initialized. The compiler expects +the validity invariant to hold **at all times** and is thus allowed to use +these invariants to (e.g.) affect the layout of data structures or do other +optimizations. + +Therefore, the validity invariant must **at minimum** justify all the +layout optimizations that the compiler does. We may want a stronger +invariant, however, so as to leave room for future optimization. + +As an example, a value of `&T` type can never be null -- therefore, +`Option<&T>` can use null to represent `None`. + +### Safety invariant + +The "safety invariant" for each type defines what must hold whenever +safe code has access to a type. + +This invariant must **at minimum** justify all the things that our +type system allows without an `unsafe` keyword being required. + +For example, a value of `&T` must be dereferenceable, since safe code +could always choose to dereference it. + +## Goals + +- Define what we guarantee about the layout of various types + and the effect of `#[repr]` annotations. 
+- Define the **safety requirements** of various types that safe + code requires (and which unsafe code must uphold at the safe/unsafe boundary). +- Define the **validity requirements** of various types that unsafe + programmers must uphold at all times. + - Also examine when/how we could dynamically check these requirements. +- Uncover the sorts of constraints that we may wish to satisfy in the + future. + +## Some interesting examples and questions + +- `&T` where `T: Sized` + - This is **guaranteed** to be a non-null pointer +- `Option<&T>` where `T: Sized` + - This is **guaranteed** to be a nullable pointer +- `Option` +- `usize` + - Platform dependent size, but guaranteed to be able to store a pointer? + - Also an array length? +- Uninitialized bits -- for which types are uninitialized bits valid? +- If you have `struct A { .. }` and `struct B { .. }` with no + `#[repr]` annotations, and they have the same field types, can we + say that they will have the same layout? + - or do we have the freedom to rearrange the types of `A` but not + `B`, e.g. based on PGO results + +## Active threads + +To start, we will create threads for each major category of types +(with a few suggested focus points): + +- Integers and floating points + - What about uninitialized values? +- Booleans + - Prior discussions ([#46156][], [#46176][]) documented bool as a single + byte that is either 0 or 1. +- Enums + - See dedicated thread about "niches" and `Option`-style layout optimization + below. + - Define: C-like enum + - Can a C-like enum ever have an invalid discriminant? (Presumably not) + - Empty enums and the `!` type + - [RFC 2195][] defined the layout of `#[repr(C)]` enums with payloads. + - [RFC 2363][] offers a proposal to permit specifying discriminants. +- Structs + - Do we ever say *anything* about how a `#[repr(Rust)]` struct is laid out? + - e.g., what about different structs with same definition + - across executions of the same program? 
+- Tuples + - Are these effectively anonymous structs? +- Unions + - Can we ever say anything about the initialized contents of a union? + - Is `#[repr(C)]` meaningful on a union? +- Fn pointers (`fn()`, `extern "C" fn()`) +- References `&T` and `&mut T` + - Out of scope: aliasing rules + - We currently tell LLVM they are aligned and dereferenceable, have to justify that + - Safe code may use them also +- Raw pointers + - Effectively same as integers? +- Representation knobs: + - Custom alignment ([RFC 1358]) + - Packed ([RFC 1240] talks about some safety issues) +- ... what else? + +We will also create categories for the following specific areas: + +- Niches: Optimizing `Option`-like enums +- Uninitialized memory: when/where are uninitialized values permitted, if ever? +- ... what else? + + +[#46156]: https://github.com/rust-lang/rust/pull/46156 +[#46176]: https://github.com/rust-lang/rust/pull/46176 +[RFC 2363]: https://github.com/rust-lang/rfcs/pull/2363 +[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html +[RFC 1358]: https://rust-lang.github.io/rfcs/1358-repr-align.html +[RFC 1240]: https://rust-lang.github.io/rfcs/1240-repr-packed-unsafe-ref.html From 0210bd80c75510bc8da3e59b17194eac4bb99332 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Fri, 24 Aug 2018 11:29:52 -0400 Subject: [PATCH 2/9] remove talk of safety invariant --- active_discussion/representation.md | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index 3abee644..6ee440ae 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -7,8 +7,9 @@ This discussion is meant to focus on two things: - What guarantees does Rust make regarding the layout of data structures? - What invariants does the compiler require from the various Rust types? 
- the "validity invariant", as defined in [Ralf's blog post][bp] -- What invariants can safe code expect to hold for the various Rust types? - - the "safety invariant", as defined in [Ralf's blog post][bp] + +NB. The discussion is **not** meant to discuss the "safety invariant" +from [Ralf's blog post][bp], as that can be handled later. [bp]: https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html @@ -43,23 +44,10 @@ invariant, however, so as to leave room for future optimization. As an example, a value of `&T` type can never be null -- therefore, `Option<&T>` can use null to represent `None`. -### Safety invariant - -The "safety invariant" for each type defines what must hold whenever -safe code has access to a type. - -This invariant must **at minimum** justify all the things that our -type system allows without an `unsafe` keyword being required. - -For example, a value of `&T` must be dereferencable, since safe code -could always choose to dereference it. - ## Goals - Define what we guarantee about the layout of various types and the effect of `#[repr]` annotations. -- Define the **safety requirements** of various types that safe - code requires (and which unsafe code must uphold at the safe/unsafe boundary). - Define the **validity requirements** of various types that unsafe programmers must uphold at all times. - Also examine when/how we could dynamically check these requirements. 
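The `Option<&T>` remark that appears as context in the hunk above can be checked directly. What follows is a minimal sketch and is not part of the patch series; the helper names `option_ref_is_pointer_sized` and `none_ref_is_null` are made up for illustration. It assumes only the null-pointer optimization that the standard library documents for `Option<&T>`.

```rust
use std::mem::{size_of, transmute};

/// Hypothetical helper: does `Option<&u32>` occupy exactly one pointer?
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u32>>() == size_of::<&u32>()
        && size_of::<&u32>() == size_of::<usize>()
}

/// Hypothetical helper: is `None::<&u32>` stored as the null pointer?
fn none_ref_is_null() -> bool {
    let none: Option<&u32> = None;
    // Both types are pointer-sized, and `None` for `Option<&T>` is
    // documented to be the null bit pattern, so reading the bits as a
    // raw pointer is sound here.
    let raw: *const u32 = unsafe { transmute::<Option<&u32>, *const u32>(none) };
    raw.is_null()
}

fn main() {
    assert!(option_ref_is_pointer_sized());
    assert!(none_ref_is_null());
    // `u32` has no spare bit pattern, so wrapping it in `Option` costs space.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
}
```

The last assertion in `main` is there for contrast: `Option<u32>` has no niche to reuse, so it needs room for a separate discriminant, whereas `Option<&u32>` does not.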
From 3f1c5b66c72bcb418d1e7d233ce1952b95b51805 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Fri, 24 Aug 2018 15:23:51 -0400 Subject: [PATCH 3/9] tweak wording around when validity constraints must hold --- active_discussion/representation.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index 6ee440ae..2d6b62ea 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -48,8 +48,9 @@ As an example, a value of `&T` type can never be null -- therefore, - Define what we guarantee about the layout of various types and the effect of `#[repr]` annotations. -- Define the **validity requirements** of various types that unsafe - programmers must uphold at all times. +- Define the **validity requirements** of various types. These are the + requirements that must hold at all times when the compiler considers + a value to be initialized. - Also examine when/how we could dynamically check these requirements. - Uncover the sorts of constraints that we may wish to satisfy in the future. From c63964a3e43e13420ebf78a100785386ded8ca94 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Fri, 24 Aug 2018 15:24:36 -0400 Subject: [PATCH 4/9] discuss `isize` a bit --- active_discussion/representation.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index 2d6b62ea..83911691 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -71,6 +71,8 @@ As an example, a value of `&T` type can never be null -- therefore, say that they will have the same layout? - or do we have the freedom to rearrange the types of `A` but not `B`, e.g. based on PGO results +- Rust currently says that no single value may be larger than `isize` bytes + - is this good? can it be changed? does it matter *here* anyway? 
## Active threads From 2ffdace51a1c2674f7d2cb909322977fa885671f Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Fri, 24 Aug 2018 16:01:11 -0400 Subject: [PATCH 5/9] add notes about ABI --- active_discussion/representation.md | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index 83911691..e130ccfa 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -5,6 +5,7 @@ This discussion is meant to focus on two things: - What guarantees does Rust make regarding the layout of data structures? +- What guarantees does Rust make regarding ABI compatibility? - What invariants does the compiler require from the various Rust types? - the "validity invariant", as defined in [Ralf's blog post][bp] @@ -29,6 +30,26 @@ and control how an individual struct will be laid out -- notably with what sorts of guarantees we offer when it comes to layout, and also what effect the various `#[repr]` annotations have. +### ABI compatibility + +When one either calls a foreign function or is called by one, extra +care is needed to ensure that all the ABI details line up. ABI compatibility +is related to data structure layout but -- in some cases -- can add another +layer of complexity. For example, consider a struct with one field, like this one: + +```rust +#[repr(C)] +struct Foo { field: u32 } +``` + +The memory layout of `Foo` is identical to a `u32`. But in many ABIs, +the struct type `Foo` is treated differently at the point of a +function call than a `u32` would be. Eliminating these gaps is the +goal of the `#[repr(transparent)]` annotation introduced in [RFC +1758]. For built-in types, such as `&T` and so forth, it is important +for us to specify how they are treated at the point of a function +call. 
+ ### Validity invariant The "validity invariant" for each type defines what must hold whenever @@ -93,7 +114,8 @@ To start, we will create threads for each major categories of types - [RFC 2195][] defined the layout of `#[repr(C)]` enums with payloads. - [RFC 2363][] offers a proposal to permit specifying discriminants. - Structs - - Do we ever say *anything* about how a `#[repr(Rust)]` struct is laid out? + - Do we ever say *anything* about how a `#[repr(Rust)]` struct is laid out + (and/or treated by the ABI)? - e.g., what about different structs with same definition - across executions of the same program? - Tuples @@ -106,6 +128,7 @@ To start, we will create threads for each major categories of types - Out of scope: aliasing rules - We currently tell LLVM they are aligned and dereferenceable, have to justify that - Safe code may use them also + - When using the C ABI, these map to the C pointer types, presumably - Raw pointers - Effectively same as integers? - Representation knobs: @@ -126,3 +149,4 @@ We will also create categories for the following specific areas: [RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html [RFC 1358]: https://rust-lang.github.io/rfcs/1358-repr-align.html [RFC 1240]: https://rust-lang.github.io/rfcs/1240-repr-packed-unsafe-ref.html +[RFC 1758]: https://rust-lang.github.io/rfcs/1758-repr-transparent.html From 8bff0593e09672fcd6ab833d3aa96910849f81fe Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Tue, 28 Aug 2018 12:01:35 -0400 Subject: [PATCH 6/9] added various points raised by people on the PR --- active_discussion/representation.md | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index e130ccfa..a83d6db4 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -2,7 +2,7 @@ ## Introduction -This discussion is 
meant to focus on the following things: - What guarantees does Rust make regarding the layout of data structures? - What guarantees does Rust make regarding ABI compatibility? @@ -92,6 +92,8 @@ As an example, a value of `&T` type can never be null -- therefore, say that they will have the same layout? - or do we have the freedom to rearrange the types of `A` but not `B`, e.g. based on PGO results + - What about different instantiations of the same struct? (`Vec` + vs `Vec`) - Rust currently says that no single value may be larger than `isize` bytes - is this good? can it be changed? does it matter *here* anyway? @@ -102,6 +104,9 @@ To start, we will create threads for each major categories of types - Integers and floating points - What about uninitialized values? + - What about signaling NaN etc? ([Seems like a + non-issue](https://github.com/rust-lang/rust/issues/40470#issuecomment-343803381), + but it'd be good to resummarize the details). - Booleans - Prior discussions ([#46156][], [#46176][]) documented bool as a single byte that is either 0 or 1. @@ -118,12 +123,25 @@ To start, we will create threads for each major categories of types (and/or treated by the ABI)? - e.g., what about different structs with same definition - across executions of the same program? + - For example, [rkruppe + writes](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212776247) + that we might "want to guarantee (some subset of) newtype + unpacking and relegate repr(transparent) to being the way to + guarantee to other crates that a type with private fields is and + will remain a newtype?" - Tuples - Are these effectively anonymous structs? - Unions - Can we ever say anything about the initialized contents of a union? - Is `#[repr(C)]` meaningful on a union? + - When (if ever) do we guarantee that all fields have the same address? - Fn pointers (`fn()`, `extern "C" fn()`) + - When is transmuting from one `fn` type to another allowed? 
+ - Can you transmute from a `fn` to `usize` or raw pointer? + - In theory this is platform dependent, and C certainly draws a + distinction between `void*` and a function pointer, but are + there any modern and/or realistic platforms where it is an + issue? - References `&T` and `&mut T` - Out of scope: aliasing rules - We currently tell LLVM they are aligned and dereferenceable, have to justify that From ea953e99d3184e8b8692cf923acd903ea2a7ec2f Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Tue, 28 Aug 2018 12:10:51 -0400 Subject: [PATCH 7/9] add a note about `0_usize` and null --- active_discussion/representation.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index a83d6db4..5fd64d0a 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -149,6 +149,8 @@ To start, we will create threads for each major categories of types - When using the C ABI, these map to the C pointer types, presumably - Raw pointers - Effectively same as integers? + - Is `ptr::null` etc guaranteed to be equal in representation to `0_usize`? + - C does guarantee that `0` when cast to a pointer is NULL - Representation knobs: - Custom alignment ([RFC 1358]) - Packed ([RFC 1240] talks about some safety issues) From 26bd2bb4452fa944507c1143b36cf7a688562ff6 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Tue, 28 Aug 2018 12:14:33 -0400 Subject: [PATCH 8/9] add some notes about usize/isize --- active_discussion/representation.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index 5fd64d0a..dde23907 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -107,6 +107,11 @@ To start, we will create threads for each major categories of types - What about signaling NaN etc? 
([Seems like a non-issue](https://github.com/rust-lang/rust/issues/40470#issuecomment-343803381), but it'd be good to resummarize the details). +- usize/isize + - is `usize` the native size of a pointer? [the max of various other considerations](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212702266)? + what are edge cases here? + - Rust currently states that the maximum size of any single value must fit in an `isize` + - Can we say a bit more about why? (e.g., [ensuring that "pointer diff" is representable](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212703192)) - Booleans - Prior discussions ([#46156][], [#46176][]) documented bool as a single byte that is either 0 or 1. @@ -154,7 +159,6 @@ To start, we will create threads for each major categories of types - Representation knobs: - Custom alignment ([RFC 1358]) - Packed ([RFC 1240] talks about some safety issues) -- ... what else? We will also create categories for the following specific areas: From 358feac4274cdcae3b2887c5ce45c967694bf07c Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 30 Aug 2018 12:53:53 -0400 Subject: [PATCH 9/9] remove content that seems specific to the validity invariant --- active_discussion/representation.md | 58 ++++++++--------------------- 1 file changed, 16 insertions(+), 42 deletions(-) diff --git a/active_discussion/representation.md b/active_discussion/representation.md index dde23907..9464d12e 100644 --- a/active_discussion/representation.md +++ b/active_discussion/representation.md @@ -6,13 +6,12 @@ This discussion is meant to focus on the following things: - What guarantees does Rust make regarding the layout of data structures? - What guarantees does Rust make regarding ABI compatibility? -- What invariants does the compiler require from the various Rust types? - - the "validity invariant", as defined in [Ralf's blog post][bp] -NB. 
The discussion is **not** meant to discuss the "safety invariant" -from [Ralf's blog post][bp], as that can be handled later. - -[bp]: https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html +NB. Oftentimes, choices of layout will only be possible if we can +guarantee various invariants -- this is particularly true when +optimizing the layout of `Option` or other enums. However, designing +those invariants is left for a future discussion -- here, we should +document/describe what we currently do and/or aim to support. ### Layout of data structures @@ -50,30 +49,13 @@ goal of the `#[repr(transparent)]` annotation introduced in [RFC for us to specify how they are treated at the point of a function call. -### Validity invariant - -The "validity invariant" for each type defines what must hold whenever -a value of this type is considered to be initialized. The compiler expects -the validity invariant to hold **at all times** and is thus allowed to use -these invariants to (e.g.) affect the layout of data structures or do other -optimizations. - -Therefore, the validity invariant must **at minimum** justify all the -layout optimizations that the compiler does. We may want a stronger -invariant, however, so as to leave room for future optimization. - -As an example, a value of `&T` type can never be null -- therefore, -`Option<&T>` can use null to represent `None`. - ## Goals -- Define what we guarantee about the layout of various types - and the effect of `#[repr]` annotations. -- Define the **validity requirements** of various types. These are the - requirements that must hold at all times when the compiler considers - a value to be initialized. - - Also examine when/how we could dynamically check these requirements. -- Uncover the sorts of constraints that we may wish to satisfy in the +- Document current behavior of compiler. + - Indicate which behavior is "permitted" for compiler and which + aspects are things that unsafe code can rely upon. 
+ - Include the effect of `#[repr]` annotations. +- Uncover the sorts of layout optimizations we may wish to do in the future. ## Some interesting examples and questions @@ -83,6 +65,7 @@ As an example, a value of `&T` type can never be null -- therefore, - `Option<&T>` where `T: Sized` - This is **guaranteed** to be a nullable pointer - `Option` + - Can this be assumed to be a non-null pointer? - `usize` - Platform dependent size, but guaranteed to be able to store a pointer? - Also an array length? @@ -103,11 +86,9 @@ To start, we will create threads for each major categories of types (with a few suggested focus points): - Integers and floating points - - What about uninitialized values? - What about signaling NaN etc? ([Seems like a non-issue](https://github.com/rust-lang/rust/issues/40470#issuecomment-343803381), but it'd be good to resummarize the details). -- usize/isize - is `usize` the native size of a pointer? [the max of various other considerations](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212702266)? what are edge cases here? - Rust currently states that the maximum size of any single value must fit in an `isize` @@ -131,9 +112,9 @@ To start, we will create threads for each major categories of types - For example, [rkruppe writes](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212776247) that we might "want to guarantee (some subset of) newtype - unpacking and relegate repr(transparent) to being the way to - guarantee to other crates that a type with private fields is and - will remain a newtype?" + unpacking and relegate `#[repr(transparent)]` to being the way + to guarantee to other crates that a type with private fields is + and will remain a newtype?" - Tuples - Are these effectively anonymous structs? 
- Unions @@ -147,10 +128,10 @@ To start, we will create threads for each major categories of types distinction between `void*` and a function pointer, but are there any modern and/or realistic platforms where it is an issue? + - Is `Option` guaranteed to be a pointer (possibly null)? - References `&T` and `&mut T` - Out of scope: aliasing rules - - We currently tell LLVM they are aligned and dereferenceable, have to justify that - - Safe code may use them also + - Always aligned, non-null - When using the C ABI, these map to the C pointer types, presumably - Raw pointers - Effectively same as integers? @@ -160,13 +141,6 @@ To start, we will create threads for each major categories of types - Custom alignment ([RFC 1358]) - Packed ([RFC 1240] talks about some safety issues) -We will also create categories for the following specific areas: - -- Niches: Optimizing `Option`-like enums -- Uninitialized memory: when/where are uninitialized values permitted, if ever? -- ... what else? - - [#46156]: https://github.com/rust-lang/rust/pull/46156 [#46176]: https://github.com/rust-lang/rust/pull/46176 [RFC 2363]: https://github.com/rust-lang/rfcs/pull/2363