-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Where to document allocation size upper bound? #465
Comments
Don't have time for a full response right now, but note that this is also documented in |
It may not be formalized yet, but in discussing I've been using the concept of a Rust Allocated Object to refer to the thing which exists in the abstract machine, provides memory addresses which can be pointed to, and defines what pointer offsets are "inbounds." The RAO refers just to the chunk of allocated memory; remember that memory is untyped and types only exist for typed copies between memory (as far as the AM is concerned). The usual way to create a RAO is via What we do need to document though is whether creating a RAO with size > Footnotes
|
The reason I mention For more context, the place this has come up for me recently is in this PR. It adds a |
It's unquestionably a soundness requirement that values described by a Rust type (including I don't really know anywhere better to document this soundness requirement other than If the primary question here is w.r.t. soundness guarantees, it could merit a docs issue, but it's not particularly actionable for UCG let alone T-opsem. (But it doesn't hold if you don't point to an actual allocation; it's safe to construct a raw pointer to an oversized slice.) |
IMO it’s not sufficient to document it on specific APIs like size_of_val.
There are plenty of ways code might need to rely on the size property
besides specific APIs - e.g., knowing that two pointers which are both
synthesized from references into the same T aren’t more than isize bytes
apart. I’d advocate for there being a general guarantee somewhere that safe
Rust will never produce a T or &T whose size exceeds isize.
…On Thu, Sep 28, 2023 at 4:07 PM Christopher Durham ***@***.***> wrote:
It's unquestionably a *soundness* requirement that values described by a
Rust type (including ?Sized ones) have a size <= isize::MAX. Doing
size_of_val for an oversized type is documented to be UB.
I don't really know anywhere better to document this soundness requirement
other than size_of_val_raw, slice_from_raw_parts, alloc/Layout, and
perhaps the various feature(ptr_metadata) APIs. While the precise
validity requirement for *RAO* carefully managed by pointer is
technically undecided, the requirement on types/references is reasonably
well documented in all the places it could potentially be violated.
If the primary question here is w.r.t. soundness guarantees, it could
merit a docs issue, but it's not particularly actionable for UCG let alone
T-opsem.
—
Reply to this email directly, view it on GitHub
<#465 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAH7ML36WGQ7OWJXNSHDI7DX4X7KVANCNFSM6AAAAAA5LRMFEQ>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
size_of_val seems like the natural place to put the answer to your first two questions. By saying that function will never return a value exceeding isize::MAX, you should have everything you need -- even if you don't physically call that function, you can now rely on the size of any object you hold where you could safely call that function to not exceed isize::MAX. I'm not sure where to put the answer to the third question.
In #464 we are calling it an "allocation".
I think it has to be UB; creating such an allocation (and giving Rust access to it) is a case of mutating the Rust-visible state in a way that is not possible from Rust. The Rust AM has an invariant that all allocations are at most isize::MAX in size; violating such an invariant must be immediate UB. But of course one could say that when such an allocation is created, really it's just created with size isize::MAX, and the UB occurs on the first access outside that range. |
I don't think that's sufficient because
I was originally going to put up a PR to add the following to /// # Safety
///
/// It is guaranteed that `size_of_val` will always return a value which fits in
/// an `isize`. `unsafe` code may rely on this guarantee for its soundness. Note
/// that this amounts to a guarantee that, for all types, `T`, and for all values
/// `t: &T`, `t` references a value whose size can be encoded in an `isize`. This
/// holds because, given a `t: &T`, it is always valid to call `size_of_val(t)`.
However, I realized that this argument is unsound: If |
So, we could say
Until the extern type situation is resolved, this seems the best we can do? Well actually we could do better, we could guarantee that it will panic (probably this has to be a non-unwinding panic) on extern type, and never just return nonsense. IMO that's what we should do, but currently the "panic" part hasn't been implemented yet I think. |
Unfortunately I don't think that's sufficient because there are cases where you never actually call E.g., consider this type: pub(crate) struct Ptr<'a, T: 'a + ?Sized> {
// INVARIANTS:
// - `ptr` addresses a byte range which is not longer than `isize::MAX`
// (other invariants removed for brevity)
ptr: NonNull<T>,
_lifetime: PhantomData<&'a ()>,
} It has this impl: impl<'a, T: 'a + ?Sized> From<&'a T> for Ptr<'a, T> {
fn from(t: &'a T) -> Ptr<'a, T> {
Ptr { ptr: NonNull::from(t), _lifetime: PhantomData }
}
} In order to construct an instance of Given this limitation, I think we still need a separate location to document the size maximum (unless we can make a stable promise that |
I see, fair.
The argument here should be that there simply are no memory regions larger than isize, so that does not even involve size_of_val.
But where could that be documented? The "alloc" module would make sense but that is really only about heap allocations and we are stating a fact about all allocations...
|
Yeah so I've been talking this over with @jswrenn, and the conclusion we've come to is that the thing that makes the most sense is to make it a bit validity constraint on fn foo<T: ?Sized>(t: &T) {
let ptr = NonNull::from(t);
// SAFETY: <what do we write here?>
unsafe { requires_ptr_whose_referent_size_fits_in_isize(ptr) }
} We need to be able to make an argument whose premise is We're thinking something like:
|
"referring to more than isize::MAX bytes" is basically an ill-typed statement. At least if you mean "refer to" in the sense of "there is that much dereferenceable memory behind this pointer". This is independent of whether it's a raw pointer or a reference. There just can't be a contiguous memory range larger than A |
I agree that this is true in practice, but is it guaranteed anywhere? IIUC that's exactly what we're trying to guarantee here.
Oh interesting, I'm not sure where this comes from. How is such a reference dangling? |
It's dangling because there can't be a memory range large enough for it to point to that would make it non-dangling. :)
That's what I was asking above -- where should such docs go? This property has nothing to do with references so stating it about references makes no sense. It's a property about what the Rust Abstract Machine considers an "allocated object". |
We could add it here maybe? That defines "allocated object". If we say that allocated objects have a maximal size of |
Ah gotcha :)
Yeah so I think we're actually on the same page here. I definitely agree that what we're really talking about is a property of "allocations" or "memory regions" or some similar concept. However, The Reference doesn't currently define these concepts, while it already defines the concept of a reference. Our thinking behind making this a bit validity invariant on
Ah interesting, yeah that's a good start! I'll be honest it feels weird that that's in
Obviously let me know if you think there are better words/phrases to use for concepts like "spans" and "refers to", etc. |
We have many things in the lib docs that probably should also be in the reference -- but my general assumption is that hardly anyone reads the reference, so libs docs is usually where improvements go. Ongoing examples of that are rust-lang/rust#115476 and rust-lang/rust#115577. |
Yeah that makes sense. Obviously completely orthogonal to the present discussion, but I wonder whether it'd be reasonable to put things in the Reference but then refer to them in the obvious places so they're just as discoverable.
Does this seem like reasonable language for the |
The first sentence sounds like something we could add there, yes. The part about references is a consequence of that. That seems to be better located here (but that page doesn't seem to talk about validity requirements at all currently) and/or here. (And of course it should be generalized to also apply to
Reasonable, yes. That's just less convenient to do.^^ It needs a reference PR, waiting for the submodule to be updated, and then a libs PR... and often it's also not at all clear where in the reference something would go. The standard library has these nice keyword and primitive type documentation pages, do we really want to duplicate all that in the reference? E.g. for |
Sounds good; put up a PR: rust-lang/rust#116675
Sounds good; put up a PR: rust-lang/rust#116677
Yeah, that's very understandable. I think my general worry is that, with concepts spread out across a lot of documentation, we risk losing track of what we've technically promised and thus breaking promises made in the past. The more that code authors have to language lawyer as opposed to just cite a straightforward guarantee made by documentation, the more likely that a guarantee is technically implied by the docs but not in a way that's obvious to editors of those docs or language/compiler authors. It'd be ideal if terms were formally defined where possible (where "formally" just means "a definition exists at a place that can be linked to") and uses of those terms were always linked so that broken links could be caught automatically. It sounds like we're far away from that state, though. |
Fully agreed, I wasn't claiming that what we are doing is good. |
Documenting the keywords in std's API docs was a fairly arbitrary choice. They could just as easily be documented in the reference. |
Follow-up question that's related to this discussion: do we intend to guarantee that, for a slice DST, an instance with 0 elements always has a size which fits in If we intend to guarantee this, where would be a good place to document it? I can put up a PR. |
As a note, I was performing some tests to ensure that the status quo is that 0-length slice ZSTs are always within |
I don't know if we currently guarantee that (the layout computation code is largely a black box to me), but it does sound like a sensible thing to guarantee. |
Do you have opinions on where such a thing should be documented? |
The |
[ptr] Document maximum allocation size Partially addresses rust-lang/unsafe-code-guidelines#465
[ptr] Document maximum allocation size Partially addresses rust-lang/unsafe-code-guidelines#465
Rollup merge of rust-lang#116675 - joshlf:patch-10, r=scottmcm [ptr] Document maximum allocation size Partially addresses rust-lang/unsafe-code-guidelines#465
@joshlf https://doc.rust-lang.org/nightly/std/ptr/index.html now documents that an allocation is at most isize::MAX bytes. Could you remind me again what the motivation is for further documenting in rust-lang/reference#1482 that the "minimal size" of all types is at most |
TLDR: To support Breadcrumbs (specifically look for
|
To lay it out explicitly: suppose you have a custom slice DST, and you want to find the byte offset of the unsized tail from the beginning of the struct. Unfortunately, However, you cannot simply do this with a null DST pointer, since taking an invalid However, you can't stably obtain a large-enough allocation without already having an instance, since that requires However, calling (Note that this guarantee still isn't sufficient for dynamically constructing |
Wow, what a rabbit hole.^^
What exactly is a "custom slice DST"? We don't have custom DST. Do you mean an unsized type whose unsized tail is a slice?
Since the alignment is statically known for slices, it should be fairly easy to add support for unsized fields with slice tail to
Ah, so the entire point of rust-lang/reference#1482 is to ensure that the precondition for size_of_val_raw holds? That is easier to guarantee than rust-lang/reference#1482. For instance we could just say that if the slice metadata is 0, then |
agreed that "I want to do something that should be obvious with the standard library but I can't because (stuff)" seems like a request for a library enhancement first and a justification for that enhancement second. |
Whatever you want to call it. I mean a user-defined type that is a DST with a (possibly wrapped) slice tail, not a type that is a DST with user-defined metadata. If such a type is
That is the primary purpose, from my understanding of the
Regardless of the wording, this only fixes the narrow case of determining the offset of the slice tail. With such a guarantee, one still can't soundly create |
I see, thanks. I usually call them something like "slice-tail DST" or "unsized type with a slice tail" or so. I don't know of short standard terminology for them.
To create one without unsizing coercions, you need to make layout assumptions anyway, don't you? With the length=0 guarantee, you can implement |
Indeed, padding is the problem, as I demonstrated at rust-lang/rust#69835 (comment). The layout for a DST with a slice tail looks like:
If the DST is Before The bigger issue is the final padding after the slice tail. Since Thus, one solution would be to say that the padding after the slice tail (or perhaps any DST tail) is the minimum necessary to reach the alignment of the entire DST, even if it is |
Given that you're worried about trailing padding for the non-empty-slice case, how would rust-lang/reference#1482 help? |
(Using Unsizing coercions already require Existence of after-tail padding also raises an interesting follow-up question of if/when it is valid to split For manipulation of user defined slice-tailed Where/how to communicate that guarantee I'm not sure, but that's effectively the exact guarantee needed to make manual manipulation of slice-tailed types possible. Footnotes
|
I don't think that property is even true. Consider (on x86-64) #[repr(C)]
struct MyDST(u64, u8, [u8]); This has alignment 8, and the last field is at offset 9. With length 0, the total size is 16. With length 7, the total size is still 16. |
I don't see any contradiction there. Note that I specifically built the prefix layout with size set as the offset of the tail field, not the size of the type with zero-sized tail. A zero length tail ends up as |
Oh I see, sorry. Makes sense.
That is probably right.
It does not answer my question about how the reference PR I keep citing helps.
|
It is necessary to determine the alignment of the entire DST. With From there, the alignment of the entire DST, the byte offset of the tail, and the layout of the tail's elements are sufficient to compute the size of the entire DST for any length, assuming that trailing padding is minimized. |
Okay so there's still an assumption here that the alignment doesn't change as the length goes up? Honestly I'd rather spend all this time and effort on the proper solution(s) -- |
Sorry if I've been unclear. For the offset of slice tail fields (the |
Okay, so I propose someone prepare a PR for length-0 size_of_val_raw. offset_of on slice tail fields and fallible layout computation are reasonable feature requests; it may be worth filing issues for them. |
I have opened two PRs for these steps:
I will close the issue now, since the original question in the issue title has been answered and the issue description is quite outdated at this point. Please open a new issue if there is still an open question to resolve/track. |
size_of_val_raw: for length 0 this is safe to call For motivation, see rust-lang/unsafe-code-guidelines#465, specifically around [here](rust-lang/unsafe-code-guidelines#465 (comment)). Cc `@rust-lang/opsem`
Rollup merge of rust-lang#126152 - RalfJung:size_of_val_raw, r=saethlin size_of_val_raw: for length 0 this is safe to call For motivation, see rust-lang/unsafe-code-guidelines#465, specifically around [here](rust-lang/unsafe-code-guidelines#465 (comment)). Cc `@rust-lang/opsem`
size_of_val_raw: for length 0 this is safe to call For motivation, see rust-lang/unsafe-code-guidelines#465, specifically around [here](rust-lang/unsafe-code-guidelines#465 (comment)). Cc `@rust-lang/opsem`
Previously, we needed to rely on the fact that the instance of any valid Rust type with 0 elements has a size (in number of bytes) which is not greater than `isize::MAX`. Providing this as a guarantee turned out to be controversial. [1] This was made possible by rust-lang/rust#126152. [1] rust-lang/unsafe-code-guidelines#465 (comment)
Previously, we needed to rely on the fact that the instance of any valid Rust type with 0 elements has a size (in number of bytes) which is not greater than `isize::MAX`. Providing this as a guarantee turned out to be controversial. [1] This was made possible by rust-lang/rust#126152. [1] rust-lang/unsafe-code-guidelines#465 (comment)
Previously, we needed to rely on the fact that the instance of any valid Rust type with 0 elements has a size (in number of bytes) which is not greater than `isize::MAX`. Providing this as a guarantee turned out to be controversial. [1] This was made possible by rust-lang/rust#126152. [1] rust-lang/unsafe-code-guidelines#465 (comment)
It's common knowledge that a "Rust object" or "Rust allocation" can't have a size which overflows
isize
. This is relied upon in a lot of APIs such as the raw pointeradd
method. However, the only place it seems to be documented as a general property is on the reference page for numeric types:I see a few issues with this description:
[T; N]
?Vec
doesn't have a type (at least not in its API). If it's guaranteed that&T
can't refer to an object of more thanisize
bytes, thenvec.as_slice()
can't return a reference which violates this guarantee. However, that doesn't prevent the addresses ofvec[0]
andvec[N]
from being more thanisize
bytes apart. VariousVec
APIs strongly hint that this is impossible, but none actually guarantee it, and theVec
top-level docs make no guarantee about the interactions between various APIs (such as thevec[0]
/vec[N]
problem).Based on my understanding of the current state of the language, here's a stab at a more complete set of guarantees; do these sound reasonable?
T
, givent: T
, the size oft
is guaranteed to fit in anisize
T
, givent: &T
, the size oft
's referent is guaranteed to fit in anisize
Vec
, I could see a few approaches:t: T
andt: &T
are instances of allocations. Beyond that, we leave it up to non-builtin types to document their own guarantees by making reference to the docs for "allocation".Vec
. That seems iffy; I'm not sure how you'd formally specify the set of objects that are covered by a definition like this (e.g., do we want to make guarantees about the memory backing aHashMap
?).cc @jswrenn
The text was updated successfully, but these errors were encountered: