Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Representation of tuples #12

Closed
nikomatsakis opened this issue Aug 30, 2018 · 27 comments
Closed

Representation of tuples #12

nikomatsakis opened this issue Aug 30, 2018 · 27 comments
Assignees
Labels
A-layout Topic: Related to data structure layout (`#[repr]`) S-writeup-assigned Status: Ready for a writeup and someone is assigned to it

Comments

@nikomatsakis
Copy link
Contributor

Discussing how tuples are laid out.

  • Is tuple layout equivalent to some corresponding struct? If so, what struct exactly?
    • that struct definition might in turn have undefined layout, of course

Some things that might be useful if they were defined:

  • Is it possible to inter-convert a (T, T, ..., T) tuple with a [T; N] memory representation?
@nikomatsakis nikomatsakis added the A-layout Topic: Related to data structure layout (`#[repr]`) label Aug 30, 2018
@RalfJung
Copy link
Member

RalfJung commented Aug 30, 2018

Is tuple layout equivalent to some corresponding struct?

TBH I think this is both the easiest and most expected answer. If (A, B) ever does not behave the same as

#[repr(Rust)]
struct Foo(A, B)

then something is seriously wrong.

So, I would move that question about arrays to the struct issue. ;)

@hanna-kruppe
Copy link

I also lean towards "tuples are just structs", but there's a subtle consequence that I want to record because it has only now occurred to me: tuples have to have the same layout in all compilation units, so if tuples and structs are laid out identically that forces us to answer this question from #11 ...

Do we ever say anything about how a #[repr(rust)] struct is laid out
(and/or treated by the ABI)?

  • e.g., what about different structs with same definition
  • across executions of the same program?

... with the relatively restrictive choice (for the compiler) that all structs' layout must only depend on the ordered list of field types, not on anything else. That might very well be what we end up deciding anyway, but it's something to consider – we basically have to choose between struct-tuple layout compatibility and more advanced (e.g. PGO-based) layout optimizations for structs.

@RalfJung
Copy link
Member

with the relatively restrictive choice (for the compiler) that all structs' layout must only depend on the ordered list of field types

I do not agree.

I think of tuples as being sugar for using the following infinitely many types defined in libcore:

struct Tuple0;
struct Tuple1<T0>(pub T0);
struct Tuple2<T0, T1>(pub T0, pub T1);
// and so on

So, tuples are structs, but all tuples of the same length are instances of the same struct.

@hanna-kruppe
Copy link

Ok, good point, I think I misunderstood what you meant by "(A, B) [should behave] the same as struct Foo(A, B);".

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Aug 30, 2018

So I agree that tuples should align with some Rust struct, in the manner than @RalfJung defined here.

It is plausible to imagine though that said struct might be (e.g.) #[repr(C)] -- in which case, for example, the array interconversion might work. Not arguing for this per se, just saying.

In any case I think that the main job of this thread probably is to define precisely which Rust struct tuples map to.

@RalfJung
Copy link
Member

However, the array question becomes interesting then, because it'd mean that in

struct Foo<T> {
  field1: T,
  field2: T,
}

we guarantee that field1 is before field2. I do not see any harm in that, but it is something to be aware of.

@ChrisJefferson
Copy link

ChrisJefferson commented Aug 30, 2018

Accessing the first element of a struct through a pointer is (slightly) more efficient. I'd prefer to give that ordering option to the compiler, unless there is a clear need to put it under user control.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 30, 2018

we basically have to choose between struct-tuple layout compatibility and more advanced (e.g. PGO-based) layout optimizations for structs.

Wouldn't this just mean that if we were to use PGO to optimize layout we would have to do so that the layout of layout-compatible structs and tuples match ?

@hanna-kruppe
Copy link

@ChrisJefferson Keep in mind that load and store instructions have generous immediate offsets, even in the most fantaically RISC architectures. Thus extra instructions or extra cycles are usually only required if you want to take the address of a field or if further accesses to the field require an addressing mode that the architecture doesn't support or which isn't as fast (e.g. struct.some_array[variable_idx] is reg+reg+imm). CPU architects know very well that programs spend lots of time accessing struct fields (and stack slots!). Cache effects are much more likely to be relevant, but there the absolute offset is irrelevant, you just need the fields that are frequently accessed together to be close to each other.

@gnzlbg Access patterns tend to vary by struct, so different structurally-identical structs would probably have different layouts. But @RalfJung's reply resolved this anyway.

@avadacatavra avadacatavra added A-layout Topic: Related to data structure layout (`#[repr]`) active discussion topic and removed A-layout Topic: Related to data structure layout (`#[repr]`) labels Aug 31, 2018
@glaebhoerl
Copy link

One feature which would quasi-require tuples and structs to have different representations is rust-lang/rfcs#1582. That is, with tuples it is conceivable that given &(A, B, C, D), one might want to support taking "subslices" of the tuple: &(A, B, C) (prefix) and/or &(B, C, D) (suffix) and/or &(B, C) (falls out from supporting both). But this would, AFAIU, require tuples to have more padding than their "equivalent" structs. (I wrote "quasi" earlier because in theory one could "just" add the same extra padding to structs as well, but in practice that seems out of the question.)

The logical justification for the different representations would be that tuple fields are ordered while struct fields are unordered, so just as it makes sense for 'dynamic' collections likeBTreeMap (ordered) and HashMap (unordered) to have different representations, so too would it make sense for 'static' ones like these. (That is, tuples are adding an additional requirement -- orderedness of their fields -- which normal structs do not have.)

(By "struct" here I mean "struct with named fields" throughout: tuple structs are an interesting in-between case which could conceivably go either way, but that question only becomes relevant as a follow-on.)

@hanna-kruppe
Copy link

hanna-kruppe commented Aug 31, 2018

The existence of tuple structs is a real thorn in the side of this reasoning, not just a follow-up: just as it would be extremely surprising for a tuple to behave differently from a tuple struct, it would be extremely surprising (and an efficency footgun) for a tuple struct to behave different from one with named fields. So far the difference is entirely stylistic, we even formalized the notion that tuple structs are just normal structs with fields names 0, 1, etc.

Combine that with other problems the representation proposed in that RFC has (the killer one for me is the need for a size/stride split which I don't believe can be introduced backwards compatibly) and I find it hard to take it seriously.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 31, 2018

Given two independent structs A and B with the same number of fields of the same types, do we want to guarantee that these have the same layout if the fields are

  • in the same order?
  • not in the same order?

If the answer is yes, then layout compatibility between structs, tuple structs, tuples, and arrays is not possible AFAICT. If the answer is no, then this is something that we are trading for being able to support layout compatibility across these constructs.

@hanna-kruppe
Copy link

For the record, there's a compromise solution: special casing structs and tuples with homogeneous contents (and making them equivalent to arrays) without guaranteeing much about the layout of either tuples or structs that have heterogeneous contents.

@ChrisJefferson
Copy link

Possibly silly question: Before specifying the layout of any tuple or struct, what are the usecases for wanting to be able to know the layout for an arbitary type?

I can imagine some people want to control the layout for some specific type (to align with an existing in-memory layout of data). They can always use repr(C), or we could introduce other reprs.

Once someone is fixed, it's fixed forever. I could imagine (for example) a profiler deciding if it should represent an (int8, int8, int8) as a 3 byte structure (which makes alignment trickier) or a 4 byte structure with padding (and then where put the padding?).

@hanna-kruppe
Copy link

hanna-kruppe commented Aug 31, 2018

You may not always control the type you want to do layout-dependent tricks on.

Besides, in practice people make mistakes and transmute (see e.g. rust-lang/rust#50842) or type pun repr(Rust) types, all else being equal I'd rather provide a safety net for that than make it UB (especially since that's the sort of UB that miri generally can't identify).

Reordering or padding tuples, especially those of bare primitives, based on profiling information seems extremely implausible to me. They are so widely used that you likely won't have a single dominant usage pattern. It's a more interesting question for nominal types, which may be used in very specific ways, but that belongs in #11 so let's move that discussion there (and once I have time I will elaborate over there on my other reasons for being skeptical of profile-guided layout optimizations).

@nikomatsakis
Copy link
Contributor Author

For the record, there's a compromise solution: special casing structs and tuples with homogeneous contents (and making them equivalent to arrays) without guaranteeing much about the layout of either tuples or structs that have heterogeneous contents.

I had never thought of that.

@nikomatsakis
Copy link
Contributor Author

Given two independent structs A and B with the same number of fields of the same types, do we want to guarantee that these have the same layout if the fields are

This is probably a better fit for discussion in #11 .. my guess is though that we will not =)

@qnighy
Copy link

qnighy commented Sep 1, 2018

TBH I think this is both the easiest and most expected answer. If (A, B) ever does not behave the same as

#[repr(Rust)]
struct Foo(A, B)

then something is seriously wrong.

It's notable that generics affect layouts here (due to unsized tuple coercion):

// unsafe w.r.t. synchronization
unsafe fn dump<T: ?Sized>(p: &T) -> Vec<u8> {
    use std::slice;
    use std::mem;
    let s = slice::from_raw_parts(p as *const T as *const u8, mem::size_of_val(p));
    s.to_vec()
}

fn main() {
    struct Foo(u16, u32);
    println!("{:?}", unsafe { dump(&(1u16, 2u32)) }); // [1, 0, 0, 0, 2, 0, 0, 0]
    println!("{:?}", unsafe { dump(&Foo(1u16, 2u32)) });  // [2, 0, 0, 0, 1, 0, 0, 0]
    // Basic rule: large align to small align
    // Exception 1: repr(C)
    // Exception 2: enum tag
    // Exception 3: the last element that can be a candidate for unsized coercion
}

However, the difference seems removable. We can just apply the "last element rule" regardless of generics.

@nikomatsakis
Copy link
Contributor Author

@rkruppe

For the record, there's a compromise solution: special casing structs and tuples with homogeneous contents (and making them equivalent to arrays) without guaranteeing much about the layout of either tuples or structs that have heterogeneous contents.

This is a good point. I actually find this kind of appealing. I guess I just think that being able to "index" into a tuple is potentially quite useful, and I can't think of a strong motivation for laying out homogenous tuples in any other way (particularly if that layout must apply to all tuples).

I suppose in theory one could imagine laying out some individual tuples differently than others (eg., if you saw from PGO that one particular tuple was used across threads and you wanted extra padding, and you knew somehow that references to this tuple didn't escape or get mixed with other tuples) but that seems quite over the top. I can't imagine us really ever doing that.

@alercah
Copy link

alercah commented Sep 11, 2018

For the record, there's a compromise solution: special casing structs and tuples with homogeneous contents (and making them equivalent to arrays) without guaranteeing much about the layout of either tuples or structs that have heterogeneous contents.

I'm a fan of this.

Bikeshed: #[repr(transparent)] on a struct with two or more fields forces layout to match that of a tuple with the same fields in the same order.

I suppose in theory one could imagine laying out some individual tuples differently than others (eg., if you saw from PGO that one particular tuple was used across threads and you wanted extra padding, and you knew somehow that references to this tuple didn't escape or get mixed with other tuples) but that seems quite over the top. I can't imagine us really ever doing that.

This will always fit under the as-if rule, I think. I do not think we should allow this to be observable by a conformant program.

@ChrisJefferson
Copy link

I'm still a little unclear why we want to give people the ability to specify how tuples and structs are laid out. If we want to specify it, I think something like #[repr(transparent)] is the way to go.

There are numourous papers which show how profile-guided struct reordering can be a profitable optimisation. Here's one example: https://ieeexplore.ieee.org/abstract/document/7726757/ , here's another https://pdfs.semanticscholar.org/567e/a8bf671735f120faa615ea6313faaaf07084.pdf (which also does struct splitting, which is probably not acceptable to Rust)

@alercah
Copy link

alercah commented Sep 11, 2018

Follow-up bikeshed: #[repr(as(Type))] for matching a type? Fields could be matched by name or by use of an attribute such as #[repr(as(Type.field))].

Actually, perhaps we could use this for interop between two foreign types, in that two #[repr(as(T))] attributes on the same type amount to a compile-time assertion that the types have the same representation? This would at least mean that an inadvertent or accidental change of layout upstream would clearly result in a compile-time error for downstream code.

@hanna-kruppe
Copy link

I'll echo @gnzlbg's statement in another thread: we're really not in the business of designing new language features here.

@alercah
Copy link

alercah commented Sep 12, 2018

I think we should not be in the business of designing new language features, but I think it's important to at least explore the viability of creating ones for addressing specific cases. We need to avoid a false dichotomy of "either we make this guarantee or this use case is unsupported". We can and should also consider "we can say that we don't make the guarantee, but suggest that if the use case is sufficiently important, it should have its own language feature".

Adding a guarantee where none existed is a language feature, anyway.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 12, 2018

Maybe we should split those discussions somehow ? I find the threads get hard to follow once we mix them with exploratory work that goes in different directions :(

@avadacatavra avadacatavra added the S-writeup-assigned Status: Ready for a writeup and someone is assigned to it label Sep 13, 2018
@nikomatsakis
Copy link
Contributor Author

I opened #31 which contains my attempt to summarize our tentative consensus here. Please give feedback! (In particular, let me know if there is anything you think is incorrect.)

@nikomatsakis
Copy link
Contributor Author

We've called final-comment-period on #31.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Topic: Related to data structure layout (`#[repr]`) S-writeup-assigned Status: Ready for a writeup and someone is assigned to it
Projects
None yet
Development

No branches or pull requests

9 participants