Representation of tuples #12

nikomatsakis · 2018-08-30T17:16:52Z

Discussing how tuples are laid out.

Is tuple layout equivalent to some corresponding struct? If so, what struct exactly?
- that struct definition might in turn have undefined layout, of course

Some things that might be useful if they were defined:

Is it possible to inter-convert a (T, T, ..., T) tuple with a [T; N] memory representation?

The text was updated successfully, but these errors were encountered:

RalfJung · 2018-08-30T17:22:55Z

Is tuple layout equivalent to some corresponding struct?

TBH I think this is both the easiest and most expected answer. If (A, B) ever does not behave the same as

#[repr(Rust)]
struct Foo(A, B)

then something is seriously wrong.

So, I would move that question about arrays to the struct issue. ;)

hanna-kruppe · 2018-08-30T17:30:40Z

I also lean towards "tuples are just structs", but there's a subtle consequence that I want to record because it has only now occurred to me: tuples have to have the same layout in all compilation units, so if tuples and structs are laid out identically that forces us to answer this question from #11 ...

Do we ever say anything about how a #[repr(rust)] struct is laid out
(and/or treated by the ABI)?

e.g., what about different structs with same definition

across executions of the same program?

... with the relatively restrictive choice (for the compiler) that all structs' layout must only depend on the ordered list of field types, not on anything else. That might very well be what we end up deciding anyway, but it's something to consider – we basically have to choose between struct-tuple layout compatibility and more advanced (e.g. PGO-based) layout optimizations for structs.

RalfJung · 2018-08-30T17:32:56Z

with the relatively restrictive choice (for the compiler) that all structs' layout must only depend on the ordered list of field types

I do not agree.

I think of tuples as being sugar for using the following infinitely many types defined in libcore:

struct Tuple0;
struct Tuple1<T0>(pub T0);
struct Tuple2<T0, T1>(pub T0, pub T1);
// and so on

So, tuples are structs, but all tuples of the same length are instances of the same struct.

hanna-kruppe · 2018-08-30T17:35:07Z

Ok, good point, I think I misunderstood what you meant by "(A, B) [should behave] the same as struct Foo(A, B);".

nikomatsakis · 2018-08-30T17:40:35Z

So I agree that tuples should align with some Rust struct, in the manner than @RalfJung defined here.

It is plausible to imagine though that said struct might be (e.g.) #[repr(C)] -- in which case, for example, the array interconversion might work. Not arguing for this per se, just saying.

In any case I think that the main job of this thread probably is to define precisely which Rust struct tuples map to.

RalfJung · 2018-08-30T17:40:57Z

However, the array question becomes interesting then, because it'd mean that in

struct Foo<T> {
  field1: T,
  field2: T,
}

we guarantee that field1 is before field2. I do not see any harm in that, but it is something to be aware of.

ChrisJefferson · 2018-08-30T21:16:19Z

Accessing the first element of a struct through a pointer is (slightly) more efficient. I'd prefer to give that ordering option to the compiler, unless there is a clear need to put it under user control.

gnzlbg · 2018-08-30T21:31:50Z

we basically have to choose between struct-tuple layout compatibility and more advanced (e.g. PGO-based) layout optimizations for structs.

Wouldn't this just mean that if we were to use PGO to optimize layout we would have to do so that the layout of layout-compatible structs and tuples match ?

hanna-kruppe · 2018-08-30T22:07:40Z

@ChrisJefferson Keep in mind that load and store instructions have generous immediate offsets, even in the most fantaically RISC architectures. Thus extra instructions or extra cycles are usually only required if you want to take the address of a field or if further accesses to the field require an addressing mode that the architecture doesn't support or which isn't as fast (e.g. struct.some_array[variable_idx] is reg+reg+imm). CPU architects know very well that programs spend lots of time accessing struct fields (and stack slots!). Cache effects are much more likely to be relevant, but there the absolute offset is irrelevant, you just need the fields that are frequently accessed together to be close to each other.

@gnzlbg Access patterns tend to vary by struct, so different structurally-identical structs would probably have different layouts. But @RalfJung's reply resolved this anyway.

glaebhoerl · 2018-08-31T07:32:50Z

One feature which would quasi-require tuples and structs to have different representations is rust-lang/rfcs#1582. That is, with tuples it is conceivable that given &(A, B, C, D), one might want to support taking "subslices" of the tuple: &(A, B, C) (prefix) and/or &(B, C, D) (suffix) and/or &(B, C) (falls out from supporting both). But this would, AFAIU, require tuples to have more padding than their "equivalent" structs. (I wrote "quasi" earlier because in theory one could "just" add the same extra padding to structs as well, but in practice that seems out of the question.)

The logical justification for the different representations would be that tuple fields are ordered while struct fields are unordered, so just as it makes sense for 'dynamic' collections likeBTreeMap (ordered) and HashMap (unordered) to have different representations, so too would it make sense for 'static' ones like these. (That is, tuples are adding an additional requirement -- orderedness of their fields -- which normal structs do not have.)

(By "struct" here I mean "struct with named fields" throughout: tuple structs are an interesting in-between case which could conceivably go either way, but that question only becomes relevant as a follow-on.)

hanna-kruppe · 2018-08-31T09:26:05Z

The existence of tuple structs is a real thorn in the side of this reasoning, not just a follow-up: just as it would be extremely surprising for a tuple to behave differently from a tuple struct, it would be extremely surprising (and an efficency footgun) for a tuple struct to behave different from one with named fields. So far the difference is entirely stylistic, we even formalized the notion that tuple structs are just normal structs with fields names 0, 1, etc.

Combine that with other problems the representation proposed in that RFC has (the killer one for me is the need for a size/stride split which I don't believe can be introduced backwards compatibly) and I find it hard to take it seriously.

gnzlbg · 2018-08-31T11:27:10Z

Given two independent structs A and B with the same number of fields of the same types, do we want to guarantee that these have the same layout if the fields are

in the same order?
not in the same order?

If the answer is yes, then layout compatibility between structs, tuple structs, tuples, and arrays is not possible AFAICT. If the answer is no, then this is something that we are trading for being able to support layout compatibility across these constructs.

hanna-kruppe · 2018-08-31T14:24:17Z

For the record, there's a compromise solution: special casing structs and tuples with homogeneous contents (and making them equivalent to arrays) without guaranteeing much about the layout of either tuples or structs that have heterogeneous contents.

ChrisJefferson · 2018-08-31T14:43:14Z

Possibly silly question: Before specifying the layout of any tuple or struct, what are the usecases for wanting to be able to know the layout for an arbitary type?

I can imagine some people want to control the layout for some specific type (to align with an existing in-memory layout of data). They can always use repr(C), or we could introduce other reprs.

Once someone is fixed, it's fixed forever. I could imagine (for example) a profiler deciding if it should represent an (int8, int8, int8) as a 3 byte structure (which makes alignment trickier) or a 4 byte structure with padding (and then where put the padding?).

hanna-kruppe · 2018-08-31T15:04:59Z

You may not always control the type you want to do layout-dependent tricks on.

Besides, in practice people make mistakes and transmute (see e.g. rust-lang/rust#50842) or type pun repr(Rust) types, all else being equal I'd rather provide a safety net for that than make it UB (especially since that's the sort of UB that miri generally can't identify).

Reordering or padding tuples, especially those of bare primitives, based on profiling information seems extremely implausible to me. They are so widely used that you likely won't have a single dominant usage pattern. It's a more interesting question for nominal types, which may be used in very specific ways, but that belongs in #11 so let's move that discussion there (and once I have time I will elaborate over there on my other reasons for being skeptical of profile-guided layout optimizations).

nikomatsakis · 2018-08-31T17:28:49Z

For the record, there's a compromise solution: special casing structs and tuples with homogeneous contents (and making them equivalent to arrays) without guaranteeing much about the layout of either tuples or structs that have heterogeneous contents.

I had never thought of that.

nikomatsakis · 2018-08-31T17:29:41Z

Given two independent structs A and B with the same number of fields of the same types, do we want to guarantee that these have the same layout if the fields are

This is probably a better fit for discussion in #11 .. my guess is though that we will not =)

qnighy · 2018-09-01T08:45:30Z

TBH I think this is both the easiest and most expected answer. If (A, B) ever does not behave the same as
#[repr(Rust)]
struct Foo(A, B)
then something is seriously wrong.

It's notable that generics affect layouts here (due to unsized tuple coercion):

// unsafe w.r.t. synchronization
unsafe fn dump<T: ?Sized>(p: &T) -> Vec<u8> {
    use std::slice;
    use std::mem;
    let s = slice::from_raw_parts(p as *const T as *const u8, mem::size_of_val(p));
    s.to_vec()
}

fn main() {
    struct Foo(u16, u32);
    println!("{:?}", unsafe { dump(&(1u16, 2u32)) }); // [1, 0, 0, 0, 2, 0, 0, 0]
    println!("{:?}", unsafe { dump(&Foo(1u16, 2u32)) });  // [2, 0, 0, 0, 1, 0, 0, 0]
    // Basic rule: large align to small align
    // Exception 1: repr(C)
    // Exception 2: enum tag
    // Exception 3: the last element that can be a candidate for unsized coercion
}

However, the difference seems removable. We can just apply the "last element rule" regardless of generics.

nikomatsakis · 2018-09-10T21:57:11Z

@rkruppe

For the record, there's a compromise solution: special casing structs and tuples with homogeneous contents (and making them equivalent to arrays) without guaranteeing much about the layout of either tuples or structs that have heterogeneous contents.

This is a good point. I actually find this kind of appealing. I guess I just think that being able to "index" into a tuple is potentially quite useful, and I can't think of a strong motivation for laying out homogenous tuples in any other way (particularly if that layout must apply to all tuples).

I suppose in theory one could imagine laying out some individual tuples differently than others (eg., if you saw from PGO that one particular tuple was used across threads and you wanted extra padding, and you knew somehow that references to this tuple didn't escape or get mixed with other tuples) but that seems quite over the top. I can't imagine us really ever doing that.

alercah · 2018-09-11T00:24:47Z

For the record, there's a compromise solution: special casing structs and tuples with homogeneous contents (and making them equivalent to arrays) without guaranteeing much about the layout of either tuples or structs that have heterogeneous contents.

I'm a fan of this.

Bikeshed: #[repr(transparent)] on a struct with two or more fields forces layout to match that of a tuple with the same fields in the same order.

I suppose in theory one could imagine laying out some individual tuples differently than others (eg., if you saw from PGO that one particular tuple was used across threads and you wanted extra padding, and you knew somehow that references to this tuple didn't escape or get mixed with other tuples) but that seems quite over the top. I can't imagine us really ever doing that.

This will always fit under the as-if rule, I think. I do not think we should allow this to be observable by a conformant program.

ChrisJefferson · 2018-09-11T06:24:29Z

I'm still a little unclear why we want to give people the ability to specify how tuples and structs are laid out. If we want to specify it, I think something like #[repr(transparent)] is the way to go.

There are numourous papers which show how profile-guided struct reordering can be a profitable optimisation. Here's one example: https://ieeexplore.ieee.org/abstract/document/7726757/ , here's another https://pdfs.semanticscholar.org/567e/a8bf671735f120faa615ea6313faaaf07084.pdf (which also does struct splitting, which is probably not acceptable to Rust)

alercah · 2018-09-11T06:44:08Z

Follow-up bikeshed: #[repr(as(Type))] for matching a type? Fields could be matched by name or by use of an attribute such as #[repr(as(Type.field))].

Actually, perhaps we could use this for interop between two foreign types, in that two #[repr(as(T))] attributes on the same type amount to a compile-time assertion that the types have the same representation? This would at least mean that an inadvertent or accidental change of layout upstream would clearly result in a compile-time error for downstream code.

hanna-kruppe · 2018-09-12T13:27:04Z

I'll echo @gnzlbg's statement in another thread: we're really not in the business of designing new language features here.

alercah · 2018-09-12T17:12:16Z

I think we should not be in the business of designing new language features, but I think it's important to at least explore the viability of creating ones for addressing specific cases. We need to avoid a false dichotomy of "either we make this guarantee or this use case is unsupported". We can and should also consider "we can say that we don't make the guarantee, but suggest that if the use case is sufficiently important, it should have its own language feature".

Adding a guarantee where none existed is a language feature, anyway.

gnzlbg · 2018-09-12T17:14:50Z

Maybe we should split those discussions somehow ? I find the threads get hard to follow once we mix them with exploratory work that goes in different directions :(

nikomatsakis · 2018-09-25T21:50:17Z

I opened #31 which contains my attempt to summarize our tentative consensus here. Please give feedback! (In particular, let me know if there is anything you think is incorrect.)

nikomatsakis · 2018-10-11T17:48:39Z

We've called final-comment-period on #31.

nikomatsakis added the A-layout Topic: Related to data structure layout (`#[repr]`) label Aug 30, 2018

avadacatavra added A-layout Topic: Related to data structure layout (`#[repr]`) active discussion topic and removed A-layout Topic: Related to data structure layout (`#[repr]`) labels Aug 31, 2018

hanna-kruppe mentioned this issue Sep 12, 2018

Representation of structs #11

Closed

avadacatavra added the S-writeup-assigned Status: Ready for a writeup and someone is assigned to it label Sep 13, 2018

avadacatavra assigned nikomatsakis Sep 13, 2018

nikomatsakis mentioned this issue Sep 25, 2018

add a stucts-and-tuples chapter #31

Merged

nikomatsakis added the final-comment-period label Oct 11, 2018

avadacatavra closed this as completed in #31 Oct 25, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Representation of tuples #12

Representation of tuples #12

nikomatsakis commented Aug 30, 2018

RalfJung commented Aug 30, 2018 •

edited

Loading

hanna-kruppe commented Aug 30, 2018

RalfJung commented Aug 30, 2018

hanna-kruppe commented Aug 30, 2018

nikomatsakis commented Aug 30, 2018 •

edited

Loading

RalfJung commented Aug 30, 2018

ChrisJefferson commented Aug 30, 2018 •

edited

Loading

gnzlbg commented Aug 30, 2018

hanna-kruppe commented Aug 30, 2018

glaebhoerl commented Aug 31, 2018

hanna-kruppe commented Aug 31, 2018 •

edited

Loading

gnzlbg commented Aug 31, 2018 •

edited

Loading

hanna-kruppe commented Aug 31, 2018

ChrisJefferson commented Aug 31, 2018

hanna-kruppe commented Aug 31, 2018 •

edited

Loading

nikomatsakis commented Aug 31, 2018

nikomatsakis commented Aug 31, 2018

qnighy commented Sep 1, 2018

nikomatsakis commented Sep 10, 2018

alercah commented Sep 11, 2018

ChrisJefferson commented Sep 11, 2018

alercah commented Sep 11, 2018

hanna-kruppe commented Sep 12, 2018

alercah commented Sep 12, 2018

gnzlbg commented Sep 12, 2018

nikomatsakis commented Sep 25, 2018

nikomatsakis commented Oct 11, 2018

Representation of tuples #12

Representation of tuples #12

Comments

nikomatsakis commented Aug 30, 2018

RalfJung commented Aug 30, 2018 • edited Loading

hanna-kruppe commented Aug 30, 2018

RalfJung commented Aug 30, 2018

hanna-kruppe commented Aug 30, 2018

nikomatsakis commented Aug 30, 2018 • edited Loading

RalfJung commented Aug 30, 2018

ChrisJefferson commented Aug 30, 2018 • edited Loading

gnzlbg commented Aug 30, 2018

hanna-kruppe commented Aug 30, 2018

glaebhoerl commented Aug 31, 2018

hanna-kruppe commented Aug 31, 2018 • edited Loading

gnzlbg commented Aug 31, 2018 • edited Loading

hanna-kruppe commented Aug 31, 2018

ChrisJefferson commented Aug 31, 2018

hanna-kruppe commented Aug 31, 2018 • edited Loading

nikomatsakis commented Aug 31, 2018

nikomatsakis commented Aug 31, 2018

qnighy commented Sep 1, 2018

nikomatsakis commented Sep 10, 2018

alercah commented Sep 11, 2018

ChrisJefferson commented Sep 11, 2018

alercah commented Sep 11, 2018

hanna-kruppe commented Sep 12, 2018

alercah commented Sep 12, 2018

gnzlbg commented Sep 12, 2018

nikomatsakis commented Sep 25, 2018

nikomatsakis commented Oct 11, 2018

RalfJung commented Aug 30, 2018 •

edited

Loading

nikomatsakis commented Aug 30, 2018 •

edited

Loading

ChrisJefferson commented Aug 30, 2018 •

edited

Loading

hanna-kruppe commented Aug 31, 2018 •

edited

Loading

gnzlbg commented Aug 31, 2018 •

edited

Loading

hanna-kruppe commented Aug 31, 2018 •

edited

Loading