Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

define padding with hypothetical Pad type #195

Merged
merged 5 commits into from
Aug 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 20 additions & 1 deletion reference/src/glossary.md
Original file line number Diff line number Diff line change
Expand Up @@ -177,6 +177,24 @@ zero-sized type", to refer to zero-sized types with an alignment requirement of
For example, `()` is a "1-ZST" but `[u16; 0]` is not because it has an alignment
requirement of 2.

#### Padding
[padding]: #padding

*Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized.

Padding can be though of as `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties:
* `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit<u8>`.
* Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized.

We can also define padding in terms of the [representation relation]:
A byte at index `i` is a padding byte for type `T` if,
for all values `v` and lists of bytes `b` such that `v` and `b` are related at `T` (let's write this `Vrel_T(v, b)`),
changing `b` at index `i` to any other byte yields a `b'` such `v` and `b'` are related (`Vrel_T(v, b')`).
In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any.

This definition works fine for product types (structs, tuples, arrays, ...).
The desired notion of "padding byte" for enums and unions is still unclear.

#### Place

A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr].
Expand All @@ -200,6 +218,7 @@ Values are ephemeral; they arise during the computation of an instruction but ar
(This is comparable to how run-time data in a program is ephemeral and is only ever persisted in serialized form.)

#### Representation (relation)
[representation relation]: #representation-relation

A *representation* of a [value](#value) is a list of bytes that is used to store or "represent" that value in memory.

Expand All @@ -208,7 +227,7 @@ The term "relation" here is used in the mathematical sense: the representation r

The relation should be functional for a fixed list of bytes (i.e., every list of bytes has at most one associated representation).
It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`).
For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a padding byte so changing it does not affect the value represented by a list of bytes).
For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes).

See the [value domain][value-domain] for an example how values and representation relations can be made more precise.

Expand Down
8 changes: 7 additions & 1 deletion wip/value-domain.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,12 @@ The value relation for `bool` relates `Bool(b)` with `[r]` if and only if `r.as_

The value relation for the `()` type relates the empty tuple `Tuple([])` (assuming we can use array notation to "match" on `Vec`) with the empty byte list `[]`, and that's it.

### `Pad`

The [hypothetical `Pad` type][pad] relates the empty tuple `Tuple([])` with any byte list of length 1.

[pad]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#padding

### `!`

The value relation for the `!` type is empty: nothing is related to anything at this type.
Expand Down Expand Up @@ -121,7 +127,7 @@ We can implement `TypedMemory` for any `Memory` as follows:
In particular, this defines the semantics of a "typed copy": when an assignment is executed in Rust (at some type `T`), or when an argument of type `T` is passed to a function or a return value of type `T` returned from a function, this is called a "typed copy".
Its semantics is basically to do a `typed_read` at `T` followed by a `typed_write` at `T`.
In particular, this means doing a "typed copy" at `T` of some memory that has no valid representation at `T` is undefined behavior!
This also means that for types that have padding, the "typed copy" does not preserve the padding bytes.
This also means that for `Pad` and more generally for types that have padding, the "typed copy" does not preserve the padding bytes.

## Relation to validity invariant

Expand Down