Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Note design constraints on hypothetical DynSized #166

Merged
merged 8 commits into from
Oct 11, 2022

Conversation

CAD97
Copy link
Contributor

@CAD97 CAD97 commented Jun 14, 2022

PR rendered

I've been meaning to write this up for a while (since the 2020-01-06 design meeting, honestly). New information brought to light in my recent irlo thread finally prompted me to collect this into one document.

This represents my best understanding of the implications and design constraints on the hypothetical DynSized trait. I've done my best to stay neutral and objective here.

cc @Ericson2314 (pushing for DynSized as an implementation detail)

@CAD97
Copy link
Contributor Author

CAD97 commented Jun 14, 2022

One additional note w.r.t. Mutex that seemed a bit too nonobjective to include in the main text:

Mutex<T> could support T: ?Sized + ?MetaSized + ?DynSized if Mutex<T>: DynSized => T: MetaSized. Locking on size_of_val_raw is likely not possible with the constraint that size_of_val_raw should work on a dropped pointee. (However, if no Mutex uses an implementation that requires deinitialization of the actual raw mutex lock itself (e.g. a boxed OS mutex), then it might be possible.)

@Ericson2314
Copy link

Glad this exists!

I think a

struct WithLayout<T: ?DynSized> {
    layout: Layout,
    data: T,
};

impl<T: ?DynSized> DynSized for WithLayout<T> { ... }

might also be possible to help odd-ball types use Ref<T>-like types.

The CString restrictions I am not too worried about because even without interior mutability if one modifies them one could end up deallocating too much or too little. An restricted form of them where one cannot write any null bytes or overwrite the the original null byte (So something like &mut RestrictedThinCStr -> &mut [NonZeroU8]) also can safe to use with UnsafeCell. I think we can therefore skip any MetaSized for UnsafeCell and do a MutationCannotChangeSize marker trait or even just document that restriction.

@CAD97
Copy link
Contributor Author

CAD97 commented Jun 14, 2022

The problem I illustrated with ThinCStr exists for WithLayout as well, and even (<T as Pointee>::Metadata, T), though; it's not specific to CStr. (FWIW, I originally thought it might not be a problem since &mut CStr doesn't expose any functionality over &CStr.)

The problem is that the example fn noop_write is presumably sound for for<T: ?Sized + DynSized> &mut T. If you have &mut T, that gives you permission to nonatomically read and write back the unchanged value for the entire span of mem::size_of_val(it).

It is a fundamental assumption of UnsafeCell<T> that given &UnsafeCell<T>, you cannot safely read the bytes of T, and have to unsafely assert that you're upholding borrowing/thread-safety rules. If mem::size_of_val::<&UnsafeCell<T>> requires reading any bytes from T, this breaks the fundamental restriction that allows UnsafeCell to work.

CStr is just the worst case scenario, because it always requires reading the entire value to determine the size. For WithLayout, you could potentially do some waffling about requiring self.layout not to be written to; with CStr this would require all writes to be atomic to avoid racing with (now required atomic) the read to scan for the nul byte.

Note that it's not just sufficient to say "it only writes nonzero bytes," because a data race is UB. Naively you'd want to assume that a data race on byte-sized values would only result in reading some value that's previously been written (as byte sized writes theoretically should not be able to tear), but this is not what the memory model gives us1; a write and another memory use on a different thread, at least one of which is not atomic and which are not externally synchronized is plain UB and your program no longer has any defined semantics, anything can happen, nasal demons eating your laundry, etc.

So even if reading in size_of_val is required to be atomic, it's still possible to write to the object's memory via noop_write and cause a data race. It is unsound to expose both &UnsafeCellLike<T> and &mut T where getting the size of T requires a reading any bytes from T.

(If there's a way I can better communicate this in not quite so many words for the document itself, I'd appreciate any edits, additions, or guidance.)

Footnotes

  1. IIUC LLVM actually has an unordered atomic ordering which has this behavior of not causing UB on a data race, but just reading some bytes which were previously written (independently arbitrary for each byte). However, if the other side of the data race is unatomic and not just unordered, it's still a UB data race.

@Ericson2314
Copy link

I think if noop_write must require MetaSized, then UnsafeCell is safe to not MetaSized. mem::size_of_val::<UnsafeCell<T>> must merely not read any mutable bytes, but if we can prevent noop_write then some bytes are not subject to interior mutation I believe? That can save us.

Anyways, what this reveals to me is that ?MetaSized is the heart of fully general custom DSTs, while ?DynSized alone is nicely a more narrow thing. I would make the first version of a DynSized not allow for any ?MetaSized by forcing the information to come from the meta alone. Then ThinCString cannot be DynSized but I think that's OK: O(n) size_of_val is a footgun anyways so long as that function is implicitly called.

Later, if we are happy with DynSized (and perhaps beyond the taboo of more opt-out traits in general 🤞) we can return to the question of ?MetaSized and fully general custom DSTs. As long as we do that before DynSized becomes stable, there should be no issue.

@CAD97
Copy link
Contributor Author

CAD97 commented Jun 14, 2022

It seems very subtle to allow size_of_val<T: ?Sized + ?MetaSized + DynSized> but require noop_write to be <T: ?Sized + MetaSized>. (I'm using fully explicit ?bounds here for clarity.) As today's ?Sized is (roughly1) ?Sized + MetaSized, this is theoretically a possible resolution, but it seems an undesirably subtle one to me. I'll add it as a potential resolution into the doc, though.

Footnotes

  1. you can accomplish metadata -> Layout by creating a dangling pointer and calling [size|align]_of_val_raw, but there is still a safety burden that the described layout size must fit in isize

@Ericson2314
Copy link

Ericson2314 commented Jun 14, 2022

@CAD97 Yeah I am not supposed we get deep into that now.

Rather, other hand if we ignore ?MetaSized entire for now and allow noop_write, it's point out that if later we do add ?MetaSized we have option of noop_write requiring MetaSized (so saying the implicit bound cannot be relaxed) and that opens some doors.

The stuff is super trickle and subtle, which is why I don't want to make it part of the much simpler DynSized initial effort. But I also don't want anyone to think DynSized is a dead end because all this tricky custom DST stuff is impossible no matter what. It is possible just not for the squeemish :).

Bottom line is politically I feel like there is a tough issues to thread where:

  • On one hand, DynSized can't be seen as a slippery slope to custom DST endless complexity for those worried about that.
  • On the other hand, DynSized sized needs to seem a viable stepping stone to custom DST endless flexibility for those that want that.

See my choice of metaphors :). Stepping stones are used to cross bodies of water which are (locally) flat. The stepping stones are necessary to cross, but unlike the slippery slope they don't force one to finish crossing once one has begun stepping on them.

@CAD97
Copy link
Contributor Author

CAD97 commented Jul 2, 2022

Backlinking this here:

rust-lang/rust#97052 proposes to add

// std::ptr
struct TypedMetadata<T: ?Sized>(pub <T as Pointee>::Metadata);
impl<T: ?Sized + Unsize<U>, U: ?Sized> CoerceUnsized<TypedMetadata<U>> for TypedMetadata<T> {}

This allows unsizing pointee metadata without access to an attached pointer. This would place another restriction on custom unsized types: that any unsizing must operate exclusively on the metadata without access to a pointer.

I can think of one possible feature that would be eliminated by this:

Consider a type Indyn<T> which is (T::Metadata, T), and &Indyn<T> is always thin. Essentially, the heap representation of ThinBox<T>. The above restriction prevents a coercion from &Indyn<dyn Trait> to &dyn Trait via CoerceUnsized; a conversion via auto[de]ref &*(_: &Indyn<dyn Trait>) is fine.

@CAD97
Copy link
Contributor Author

CAD97 commented Jul 3, 2022

Crossposting again for visibility:

That coercion (&Indyn<dyn Trait> -> &dyn Trait) could work if Indyn<dyn Trait>: Trait. It still runs into the other problems with pointee-sized DSTs rather than meta-sized DSTs (see the above design notes), but coercing &Indyn<dyn Trait> -> &dyn Trait could work by using a vtable which is implemented as roughly method(ptr, args...) = (*(ptr as *const trait_vtable_t)).method(ptr.offset_bytes(sizeof::<trait_vtable_t>()), args...)`.

Writing that out: this coercion itself actually does only need the (Indyn) type and its pointee metadata (()), and is not verboten by TypedMetadata coercions. I return to the position that any coercions that are address/pointee-sensitive are custom coercions to custom DSTs (such as avoiding the recursive vtable lookup by dynamically creating a vtable).

@CAD97
Copy link
Contributor Author

CAD97 commented Jul 13, 2022

Just discussed in passing on Zulip, custom pointer metadata will likely run into all the same issues as Box<T, A> does for making pointers no longer 100% controlled by the language. Technically this is not directly relevant, as these design notes are technically just on the opt-out default trait bound, and not on actually adding new pointee kinds beyond extern type.

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great summary, thank you @CAD97

src/design_notes/dynsized_constraints.md Outdated Show resolved Hide resolved
src/design_notes/dynsized_constraints.md Outdated Show resolved Hide resolved
@nikomatsakis
Copy link
Contributor

@CAD97 can you clarify the two above points quickly? r=me in any case, so I'll merge once you do so (or if you don't by next week, I'll merge anyway...)

@CAD97
Copy link
Contributor Author

CAD97 commented Oct 4, 2022

@nikomatsakis I agree that section was confusingly written and did my best to rewrite for clarity.

Co-authored-by: Christopher Durham <cad97@cad97.com>
@nikomatsakis nikomatsakis merged commit 9b61841 into rust-lang:master Oct 11, 2022
@CAD97 CAD97 deleted the patch-1 branch October 11, 2022 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants