Add identifier syntax to type-layout.md #1614

Merged on Oct 12, 2024 (2 commits)
Changes from 1 commit
104 changes: 101 additions & 3 deletions src/type-layout.md
@@ -1,30 +1,38 @@
# Type Layout

r[layout]

r[layout.intro]
The layout of a type is its size, alignment, and the relative offsets of its
fields. For enums, how the discriminant is laid out and interpreted is also part
of type layout.

r[layout.guarantees]
Type layout can be changed with each compilation. Instead of trying to document
exactly what is done, we only document what is guaranteed today.

## Size and Alignment

r[layout.properties]
All values have an alignment and size.

r[layout.properties.align]
The *alignment* of a value specifies what addresses are valid to store the value
at. A value of alignment `n` must only be stored at an address that is a
multiple of `n`. For example, a value with an alignment of 2 must be stored at an
even address, while a value with an alignment of 1 can be stored at any address.
Alignment is measured in bytes; it must be at least 1 and is always a power of 2.
The alignment of a value can be checked with the [`align_of_val`] function.
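
The alignment rule above can be checked at run time. The following is a minimal sketch; the helper names `is_properly_aligned` and `is_valid_alignment` are hypothetical and exist only for illustration.

```rust
use std::mem::align_of_val;

/// Returns true if `value` is stored at an address that is a multiple of
/// its alignment. (Hypothetical helper, for illustration only.)
fn is_properly_aligned<T>(value: &T) -> bool {
    let address = value as *const T as usize;
    address % align_of_val(value) == 0
}

/// Returns true if `n` is a valid alignment: at least 1 and a power of 2.
/// (Hypothetical helper, for illustration only.)
fn is_valid_alignment(n: usize) -> bool {
    // `is_power_of_two` is false for 0, so this also enforces `n >= 1`.
    n.is_power_of_two()
}
```

Calling `is_properly_aligned(&0u64)` should return `true` for any value obtained through a reference, since references are always properly aligned.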

r[layout.properties.size]
The *size* of a value is the offset in bytes between successive elements in an
array with that item type including alignment padding. The size of a value is
always a multiple of its alignment. Note that some types are zero-sized; 0 is
considered a multiple of any alignment (for example, on some platforms, the type
`[u16; 0]` has size 0 and alignment 2). The size of a value can be checked with
the [`size_of_val`] function.
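
As a rough illustration of the definition above, the sketch below measures the distance between two adjacent array elements and checks that a size is a multiple of the alignment. The function names are hypothetical.

```rust
use std::mem::{align_of_val, size_of_val};

/// Returns true if the size of `value` is a multiple of its alignment.
/// (Hypothetical helper, for illustration only.)
fn size_is_multiple_of_alignment<T: ?Sized>(value: &T) -> bool {
    size_of_val(value) % align_of_val(value) == 0
}

/// Distance in bytes between element 0 and element 1 of a two-element array.
/// By the definition of size, this equals the size of the element type.
fn element_stride<T>(array: &[T; 2]) -> usize {
    let first = &array[0] as *const T as usize;
    let second = &array[1] as *const T as usize;
    second - first
}
```

For example, `element_stride(&[1u32, 2u32])` should equal `size_of::<u32>()`, and an empty array such as `[u16; 0]` has size 0, which counts as a multiple of its alignment.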

r[layout.properties.sized]
Types where all values have the same size and alignment, and both are known at
compile time, implement the [`Sized`] trait and can be checked with the
[`size_of`] and [`align_of`] functions. Types that are not [`Sized`] are known
@@ -34,6 +42,9 @@ the alignment of the type respectively.

## Primitive Data Layout

r[layout.primitive]

r[layout.primitive.size]
The size of most primitives is given in this table.

| Type | `size_of::<Type>()`|
@@ -49,10 +60,12 @@ The size of most primitives is given in this table.
| `f64` | 8 |
| `char` | 4 |

r[layout.primitive.size-int]
`usize` and `isize` have a size big enough to contain every address on the
target platform. For example, on a 32 bit target, this is 4 bytes, and on a 64
bit target, this is 8 bytes.
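
A small sketch of how these sizes could be checked, using the `f64` and `char` rows visible in the table above. The helper `expected_usize_size` is hypothetical, and its 2-byte fallback assumes that the only remaining pointer width is 16 bits.

```rust
use std::mem::size_of;

/// Size in bytes that `usize` is expected to have, derived from the
/// target's pointer width. (Hypothetical helper, for illustration only.)
fn expected_usize_size() -> usize {
    if cfg!(target_pointer_width = "64") {
        8
    } else if cfg!(target_pointer_width = "32") {
        4
    } else {
        2 // assumption: a 16-bit target
    }
}

fn check_primitive_sizes() {
    // Fixed sizes taken from the table.
    assert_eq!(size_of::<f64>(), 8);
    assert_eq!(size_of::<char>(), 4);
    // Pointer-sized integers follow the target.
    assert_eq!(size_of::<usize>(), expected_usize_size());
    assert_eq!(size_of::<isize>(), size_of::<usize>());
}
```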

r[layout.primitive.align]
The alignment of primitives is platform-specific.
In most cases, their alignment is equal to their size, but it may be less.
In particular, `i128` and `u128` are often aligned to 4 or 8 bytes even though
@@ -61,11 +74,16 @@ aligned to 4 bytes, not 8.

## Pointers and References Layout

r[layout.pointer]

r[layout.pointer.intro]
Pointers and references have the same layout. Mutability of the pointer or
reference does not change the layout.

r[layout.pointer.thin]
Pointers to sized types have the same size and alignment as `usize`.

r[layout.pointer.unsized]
Pointers to unsized types are sized. The size and alignment are guaranteed to be
at least equal to the size and alignment of a pointer.
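
The pointer rules above can be expressed as assertions. This is only a sketch; the function name `check_pointer_layout` is hypothetical.

```rust
use std::mem::{align_of, size_of};

fn check_pointer_layout() {
    // Pointers and references to sized types match `usize`,
    // regardless of mutability.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    assert_eq!(size_of::<&mut u8>(), size_of::<usize>());
    assert_eq!(size_of::<*const u64>(), size_of::<usize>());
    assert_eq!(align_of::<*mut u64>(), align_of::<usize>());

    // Pointers to unsized types are themselves sized, and are at least
    // as large and as aligned as a thin pointer.
    assert!(size_of::<&[u8]>() >= size_of::<usize>());
    assert!(size_of::<&str>() >= size_of::<usize>());
    assert!(size_of::<*const dyn std::fmt::Debug>() >= size_of::<usize>());
    assert!(align_of::<&[u8]>() >= align_of::<usize>());
}
```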

@@ -75,49 +93,70 @@ at least equal to the size and alignment of a pointer.

## Array Layout

r[layout.array]

An array of `[T; N]` has a size of `size_of::<T>() * N` and the same alignment
as `T`. Arrays are laid out so that the zero-based `nth` element of the array
is offset from the start of the array by `n * size_of::<T>()` bytes.
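
As an illustration, the sketch below compares each element's measured offset with `n * size_of::<T>()`. The helper `element_offset` is a hypothetical name.

```rust
use std::mem::{align_of, size_of};

/// Offset in bytes of the zero-based `n`th element from the start of the
/// array. (Hypothetical helper, for illustration only.)
fn element_offset<T, const N: usize>(array: &[T; N], n: usize) -> usize {
    let start = array.as_ptr() as usize;
    let element = &array[n] as *const T as usize;
    element - start
}

fn check_array_layout() {
    let values: [u32; 4] = [10, 20, 30, 40];
    assert_eq!(size_of::<[u32; 4]>(), size_of::<u32>() * 4);
    assert_eq!(align_of::<[u32; 4]>(), align_of::<u32>());
    for n in 0..values.len() {
        assert_eq!(element_offset(&values, n), n * size_of::<u32>());
    }
}
```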

## Slice Layout

r[layout.slice]

Slices have the same layout as the section of the array they slice.

> Note: This is about the raw `[T]` type, not pointers (`&[T]`, `Box<[T]>`,
> etc.) to slices.

## `str` Layout

r[layout.str]

String slices are a UTF-8 representation of characters that have the same layout as slices of type `[u8]`.
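
A brief sketch of this equivalence, comparing a `str` with the `[u8]` returned by `as_bytes`. The function name `check_str_layout` is hypothetical.

```rust
use std::mem::{align_of_val, size_of_val};

fn check_str_layout(text: &str) {
    let bytes: &[u8] = text.as_bytes();
    // Same size, alignment, and starting address as the byte slice.
    assert_eq!(size_of_val(text), size_of_val(bytes));
    assert_eq!(align_of_val(text), align_of_val(bytes));
    assert_eq!(text.as_ptr(), bytes.as_ptr());
    // The size is the length in bytes, not the number of characters.
    assert_eq!(size_of_val(text), text.len());
}
```

For instance, `"héllo"` has five characters but occupies six bytes, because `é` takes two bytes in UTF-8.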

## Tuple Layout

r[layout.tuple]

r[layout.tuple.general]
Tuples are laid out according to the [`Rust` representation][`Rust`].

r[layout.tuple.unit]
The exception to this is the unit tuple (`()`), which is guaranteed as a
zero-sized type to have a size of 0 and an alignment of 1.
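
The unit guarantee is easy to confirm; in this sketch the function name `unit_layout` is hypothetical.

```rust
use std::mem::{align_of, size_of};

/// Returns `(size, alignment)` of the unit tuple `()`.
fn unit_layout() -> (usize, usize) {
    (size_of::<()>(), align_of::<()>())
}
```

Per the rule above, `unit_layout()` should return `(0, 1)`.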

## Trait Object Layout

r[layout.trait-object]

A trait object has the same layout as the concrete value it refers to.

> Note: This is about the raw trait object types, not pointers (`&dyn Trait`,
> `Box<dyn Trait>`, etc.) to trait objects.
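
One way to observe this is to query the size and alignment through a `&dyn Trait` reference, which reports the values of the underlying concrete type. This is a sketch; `check_trait_object_layout` is a hypothetical name and `Debug` is used only as a convenient trait.

```rust
use std::fmt::Debug;
use std::mem::{align_of, align_of_val, size_of, size_of_val};

fn check_trait_object_layout() {
    let number: u64 = 7;
    let object: &dyn Debug = &number;
    // Queried through the trait object, the layout is that of `u64`.
    assert_eq!(size_of_val(object), size_of::<u64>());
    assert_eq!(align_of_val(object), align_of::<u64>());

    let bytes: [u8; 3] = [1, 2, 3];
    let object: &dyn Debug = &bytes;
    assert_eq!(size_of_val(object), size_of::<[u8; 3]>());
    assert_eq!(align_of_val(object), align_of::<[u8; 3]>());
}
```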

## Closure Layout

r[layout.closure]

Closures have no layout guarantees.

## Representations

r[layout.repr]

r[layout.repr.intro]
All user-defined composite types (`struct`s, `enum`s, and `union`s) have a
*representation* that specifies what the layout is for the type.

r[layout.repr.kinds]
The possible representations for a type are:

- [`Rust`] (default)
- [`C`]
- The [primitive representations]
- [`transparent`]

r[layout.repr.attribute]
The representation of a type can be changed by applying the `repr` attribute
to it. The following example shows a struct with a `C` representation.

@@ -130,6 +169,7 @@ struct ThreeInts {
}
```

r[layout.repr.align-packed]
The alignment may be raised or lowered with the `align` and `packed` modifiers
respectively. They alter the representation specified in the attribute.
If no representation is specified, the default one is altered.
@@ -157,27 +197,36 @@ struct AlignedStruct {
> the same name have the same representation. For example, `Foo<Bar>` and
> `Foo<Baz>` both have the same representation.

r[layout.repr.inter-field]
The representation of a type can change the padding between fields, but does
not change the layout of the fields themselves. For example, a struct with a
`C` representation that contains a struct `Inner` with the default
representation will not change the layout of `Inner`.

### <a id="the-default-representation"></a> The `Rust` Representation

r[layout.repr.rust]

r[layout.repr.rust.intro]
The `Rust` representation is the default representation for nominal types
without a `repr` attribute. Using this representation explicitly through a
`repr` attribute is guaranteed to be the same as omitting the attribute
entirely.

r[layout.repr.rust.layout]
The only data layout guarantees made by this representation are those required
for soundness. They are:

1. The fields are properly aligned.
2. The fields do not overlap.
3. The alignment of the type is at least the maximum alignment of its fields.

r[layout.repr.rust.alignment]
Formally, the first guarantee means that the offset of any field is divisible by
that field's alignment.

r[layout.repr.rust.field-storage]
The second guarantee means that the fields can be
ordered such that the offset plus the size of any field is less than or equal to
the offset of the next field in the ordering. The ordering does not have to be
the same as the order in which the fields are specified in the declaration of
@@ -187,10 +236,14 @@ Be aware that the second guarantee does not imply that the fields have distinct
addresses: zero-sized types may have the same address as other fields in the
same struct.

r[layout.repr.rust.unspecified]
There are no other guarantees of data layout made by this representation.
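
The three guarantees can be checked against a concrete struct without assuming any particular field order. This is a sketch only: the struct `Mixed` and the helper names are hypothetical, and field offsets are measured from addresses at run time.

```rust
use std::mem::{align_of, size_of};

// Default (`Rust`) representation: field order is unspecified.
struct Mixed {
    a: u8,
    b: u32,
    c: u16,
}

/// Returns `(offset, size, alignment)` for each field of `value`.
fn field_layout(value: &Mixed) -> [(usize, usize, usize); 3] {
    let base = value as *const Mixed as usize;
    [
        (&value.a as *const u8 as usize - base, size_of::<u8>(), align_of::<u8>()),
        (&value.b as *const u32 as usize - base, size_of::<u32>(), align_of::<u32>()),
        (&value.c as *const u16 as usize - base, size_of::<u16>(), align_of::<u16>()),
    ]
}

fn check_rust_repr_guarantees() {
    let value = Mixed { a: 1, b: 2, c: 3 };
    let mut fields = field_layout(&value);

    // 1. Each field's offset is divisible by that field's alignment.
    for &(offset, _, align) in fields.iter() {
        assert_eq!(offset % align, 0);
    }

    // 2. Ordered by offset, each field ends before the next one begins.
    fields.sort();
    for pair in fields.windows(2) {
        assert!(pair[0].0 + pair[0].1 <= pair[1].0);
    }

    // 3. The type's alignment is at least the largest field alignment.
    let max_align = fields.iter().map(|f| f.2).max().unwrap();
    assert!(align_of::<Mixed>() >= max_align);
}
```

Sorting by offset rather than using declaration order reflects the point that the compiler may reorder fields under this representation.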

### The `C` Representation

r[layout.repr.c]

r[layout.repr.c.intro]
The `C` representation is designed for dual purposes. One purpose is for
creating types that are interoperable with the C Language. The second purpose is
to create types that you can soundly perform operations on that rely on data
@@ -199,13 +252,18 @@ layout such as reinterpreting values as a different type.
Because of this dual purpose, it is possible to create types that are not useful
for interfacing with the C programming language.

r[layout.repr.c.constraint]
This representation can be applied to structs, unions, and enums. The exception
is [zero-variant enums] for which the `C` representation is an error.

#### `#[repr(C)]` Structs

r[layout.repr.c.struct]

r[layout.repr.c.struct.align]
The alignment of the struct is the alignment of the most-aligned field in it.

r[layout.repr.c.struct.size-field-offset]
The size and offset of fields is determined by the following algorithm.

Start with a current offset of 0 bytes.
@@ -266,8 +324,13 @@ struct.size = current_offset + padding_needed_for(current_offset, struct.alignme
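
The full algorithm is not shown in this view, so the following is a sketch of the usual C struct layout procedure rather than a transcription of it. It assumes fields are placed in declaration order, each at the next offset that satisfies its alignment. The struct `Example` and the function `repr_c_layout` are hypothetical names; `padding_needed_for` borrows its name from the pseudocode fragment above.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
struct Example {
    a: u8,
    b: u32,
    c: u16,
}

/// Bytes needed to round `offset` up to a multiple of `alignment`.
fn padding_needed_for(offset: usize, alignment: usize) -> usize {
    let misalignment = offset % alignment;
    if misalignment > 0 { alignment - misalignment } else { 0 }
}

/// Takes `(size, alignment)` pairs in declaration order and returns
/// `(field offsets, struct size, struct alignment)`.
fn repr_c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize, usize) {
    // The struct takes the alignment of its most-aligned field.
    let struct_alignment = fields.iter().map(|&(_, align)| align).max().unwrap_or(1);
    let mut current_offset = 0;
    let mut offsets = Vec::new();
    for &(size, align) in fields {
        current_offset += padding_needed_for(current_offset, align);
        offsets.push(current_offset);
        current_offset += size;
    }
    let size = current_offset + padding_needed_for(current_offset, struct_alignment);
    (offsets, size, struct_alignment)
}

fn check_example() {
    let fields = [
        (size_of::<u8>(), align_of::<u8>()),
        (size_of::<u32>(), align_of::<u32>()),
        (size_of::<u16>(), align_of::<u16>()),
    ];
    let (offsets, size, alignment) = repr_c_layout(&fields);

    // Compare the computed layout with what the compiler produced.
    let value = Example { a: 0, b: 0, c: 0 };
    let base = &value as *const Example as usize;
    let actual = vec![
        &value.a as *const u8 as usize - base,
        &value.b as *const u32 as usize - base,
        &value.c as *const u16 as usize - base,
    ];
    assert_eq!(offsets, actual);
    assert_eq!(size, size_of::<Example>());
    assert_eq!(alignment, align_of::<Example>());
}
```

With field sizes and alignments of `(1, 1)`, `(4, 4)`, and `(2, 2)`, the sketch yields offsets 0, 4, and 8, a size of 12, and an alignment of 4.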

#### `#[repr(C)]` Unions

r[layout.repr.c.union]

r[layout.repr.c.union.intro]
A union declared with `#[repr(C)]` will have the same size and alignment as an
equivalent C union declaration in the C language for the target platform.

r[layout.repr.c.union.size-align]
The union will have a size of the maximum size of all of its fields rounded up
to a multiple of its alignment, and an alignment of the maximum alignment of all of its fields.
These maximums may come from different fields.
@@ -296,6 +359,8 @@ assert_eq!(std::mem::align_of::<SizeRoundedUp>(), 4); // From a

#### `#[repr(C)]` Field-less Enums

r[layout.repr.c.enum]

For [field-less enums], the `C` representation has the size and alignment of
the default `enum` size and alignment for the target platform's C ABI.

@@ -308,10 +373,16 @@ the default `enum` size and alignment for the target platform's C ABI.

#### `#[repr(C)]` Enums With Fields

r[layout.repr.c.adt]

r[layout.repr.c.adt.intro]
The representation of a `repr(C)` enum with fields is a `repr(C)` struct with
two fields, also called a "tagged union" in C:

r[layout.repr.c.adt.tag]
- a `repr(C)` version of the enum with all fields removed ("the tag")

r[layout.repr.c.adt.fields]
- a `repr(C)` union of `repr(C)` structs for the fields of each variant that had
them ("the payload")

@@ -374,24 +445,32 @@ struct MyDFields;

### Primitive representations

r[layout.repr.primitive]

r[layout.repr.primitive.intro]
The *primitive representations* are the representations with the same names as
the primitive integer types. That is: `u8`, `u16`, `u32`, `u64`, `u128`,
`usize`, `i8`, `i16`, `i32`, `i64`, `i128`, and `isize`.

r[layout.repr.primitive.constraint]
Primitive representations can only be applied to enumerations, and they behave
differently depending on whether the enum has fields. It is an error
for [zero-variant enums] to have a primitive representation. Combining
two primitive representations is an error.

#### Primitive Representation of Field-less Enums

r[layout.repr.primitive.enum]

Contributor

Throughout the changes here it seems to be inconsistent about whether or not a blank line is added after the rule.

I would say the predominant style seems to not include blank lines. Can we stay consistent and avoid those extra blank lines?

Contributor Author

The blank line here is because it's a section rule-id not a paragraph rule-id. There's no paragraph rule-id because the current text would only have an intro paragraph. This style (blank line after rules for sections, and omitting a lone .intro id) has been common throughout the PRs.

Contributor

Could the section rule-id go above the main header?

Contributor

But we could probably deal with this in a subsequent round of formatting and reorganization too, maybe?

Contributor

OK, I'm still not sure I agree with this, since the paragraph below does not look like an introduction to me. It seems to define the specific layout of these types of enums. I'm not going to block here, but I have opened #1651 to further discuss how these things interact.

For [field-less enums], primitive representations set the size and alignment to
be the same as the primitive type of the same name. For example, a field-less
enum with a `u8` representation can only have discriminants between 0 and 255
inclusive.
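
A short sketch of a field-less enum with a `u8` representation. The enum `Level` and its discriminant values are hypothetical, chosen to span the 0 to 255 range.

```rust
use std::mem::{align_of, size_of};

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Level {
    Low = 0,
    Mid = 128,
    High = 255, // a discriminant of 256 would not fit in `u8` and is rejected
}

fn check_u8_enum() {
    // Size and alignment match the primitive type of the same name.
    assert_eq!(size_of::<Level>(), size_of::<u8>());
    assert_eq!(align_of::<Level>(), align_of::<u8>());
    // Discriminants can be read back with an `as` cast.
    assert_eq!(Level::High as u8, 255);
}
```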

#### Primitive Representation of Enums With Fields

r[layout.repr.primitive.adt]

The representation of a primitive representation enum is a `repr(C)` union of
`repr(C)` structs for each variant with a field. The first field of each struct
in the union is the primitive representation version of the enum with all fields
@@ -446,6 +525,8 @@ struct MyVariantD(MyEnumDiscriminant);

#### Combining primitive representations of enums with fields and `#[repr(C)]`

r[layout.repr.primitive-c]

For enums with fields, it is also possible to combine `repr(C)` and a
primitive representation (e.g., `repr(C, u8)`). This modifies the [`repr(C)`] by
changing the representation of the discriminant enum to the chosen primitive
@@ -510,6 +591,9 @@ assert_eq!(std::mem::size_of::<Enum16>(), 4);

### The alignment modifiers

r[layout.repr.alignment]

r[layout.repr.alignment.intro]
The `align` and `packed` modifiers can be used to respectively raise or lower
the alignment of `struct`s and `union`s. `packed` may also alter the padding
between fields (although it will not alter the padding inside of any field).
@@ -518,28 +602,37 @@ of fields in the layout of a struct or the layout of an enum variant, although
they may be combined with representations (such as `C`) which do provide such
guarantees.

r[layout.repr.alignment.constraint-alignment]
The alignment is specified as an integer parameter in the form of
`#[repr(align(x))]` or `#[repr(packed(x))]`. The alignment value must be a
power of two from 1 up to 2<sup>29</sup>. For `packed`, if no value is given,
as in `#[repr(packed)]`, then the value is 1.

r[layout.repr.alignment.align]
For `align`, if the specified alignment is less than the alignment of the type
without the `align` modifier, then the alignment is unaffected.

r[layout.repr.alignment.packed]
For `packed`, if the specified alignment is greater than the type's alignment
without the `packed` modifier, then the alignment and layout are unaffected.
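
The two "unaffected" cases can be contrasted with cases where the modifier does take effect. This is a sketch; all struct names are hypothetical, and the comparisons use `align_of` on the field type so that they do not assume a specific platform alignment.

```rust
use std::mem::{align_of, size_of};

#[repr(align(1))]
struct AlignTooSmall(u32); // requested alignment is below that of `u32`

#[repr(align(16))]
struct Raised(u32); // alignment is raised to 16

#[repr(packed(8))]
struct PackTooLarge(u16); // requested alignment is above that of `u16`

#[repr(packed(1))]
struct Lowered(u32); // alignment is lowered to 1

fn check_alignment_modifiers() {
    // `align` with a smaller value leaves the alignment unchanged.
    assert_eq!(align_of::<AlignTooSmall>(), align_of::<u32>());
    // `align` with a larger value raises it; the size is rounded up to match.
    assert_eq!(align_of::<Raised>(), 16);
    assert_eq!(size_of::<Raised>(), 16);
    // `packed` with a larger value leaves alignment and layout unchanged.
    assert_eq!(align_of::<PackTooLarge>(), align_of::<u16>());
    assert_eq!(size_of::<PackTooLarge>(), size_of::<u16>());
    // `packed` with a smaller value lowers the alignment.
    assert_eq!(align_of::<Lowered>(), 1);
}
```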

r[layout.repr.alignment.packed-fields]
The alignment of each field, for the purpose of positioning fields, is the
smaller of the specified alignment and the alignment of the field's type.

r[layout.repr.alignment.packed-padding]
Inter-field padding is guaranteed to be the minimum required in order to
satisfy each field's (possibly altered) alignment (although note that, on its
own, `packed` does not provide any guarantee about field ordering). An
important consequence of these rules is that a type with `#[repr(packed(1))]`
(or `#[repr(packed)]`) will have no inter-field padding.
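
Because `packed` on its own says nothing about field order, the sketch below checks only the total size, which is the sum of the field sizes when there is no padding. The struct names are hypothetical.

```rust
use std::mem::{align_of, size_of};

// Default representation: padding may be inserted between fields.
struct Unpacked {
    a: u8,
    b: u32,
    c: u16,
}

// `packed` is the same as `packed(1)`: every field is treated as 1-aligned.
#[repr(packed)]
struct Packed {
    a: u8,
    b: u32,
    c: u16,
}

fn check_packed_padding() {
    // No inter-field padding and alignment 1, so no trailing padding either.
    assert_eq!(size_of::<Packed>(), 1 + 4 + 2);
    assert_eq!(align_of::<Packed>(), 1);
    // The unpacked version can only be the same size or larger.
    assert!(size_of::<Unpacked>() >= size_of::<Packed>());
}
```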

r[layout.repr.alignment.constraint-exclusive]
The `align` and `packed` modifiers cannot be applied on the same type and a
`packed` type cannot transitively contain another `align`ed type. `align` and
`packed` may only be applied to the [`Rust`] and [`C`] representations.

r[layout.repr.alignment.enum]
The `align` modifier can also be applied on an `enum`.
When it is, the effect on the `enum`'s alignment is the same as if the `enum`
was wrapped in a newtype `struct` with the same `align` modifier.
@@ -569,11 +662,15 @@ was wrapped in a newtype `struct` with the same `align` modifier.

### The `transparent` Representation

r[layout.repr.transparent]

r[layout.repr.transparent.constraint-field]
The `transparent` representation can only be used on a [`struct`][structs]
or an [`enum`][enumerations] with a single variant that has:
- any number of fields with size 0 and alignment 1 (e.g. [`PhantomData<T>`]), and
- at most one other field.

r[layout.repr.transparent.layout-abi]
Structs and enums with this representation have the same layout and ABI
as the only non-size 0 non-alignment 1 field, if present, or unit otherwise.
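
A minimal sketch of both cases: a wrapper with one non-zero-sized field, and a wrapper whose fields are all zero-sized. The type names `Meters` and `OnlyZeroSized` are hypothetical.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

// One `f32` field plus a field with size 0 and alignment 1.
#[repr(transparent)]
struct Meters<Unit>(f32, PhantomData<Unit>);

// No field other than those with size 0 and alignment 1.
#[repr(transparent)]
struct OnlyZeroSized(PhantomData<u64>, ());

fn check_transparent_layout() {
    // Same layout as the single non-zero-sized field.
    assert_eq!(size_of::<Meters<u64>>(), size_of::<f32>());
    assert_eq!(align_of::<Meters<u64>>(), align_of::<f32>());
    // Same layout as unit when there is no such field.
    assert_eq!(size_of::<OnlyZeroSized>(), size_of::<()>());
    assert_eq!(align_of::<OnlyZeroSized>(), align_of::<()>());
}
```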

@@ -582,6 +679,7 @@ a struct with the `C` representation will always have the ABI of a `C` `struct`
while, for example, a struct with the `transparent` representation with a
primitive field will have the ABI of the primitive field.

r[layout.repr.transparent.constraint-exclusive]
Because this representation delegates type layout to another type, it cannot be
used with any other representation.
