Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC]: Standardize methods for leaking containers #2969

Closed
wants to merge 9 commits into from

Conversation

KodrAus
Copy link
Contributor

@KodrAus KodrAus commented Aug 6, 2020

Specifies a standard set of methods with return types and semantics for leaking containers like Box<T> and String.

Rendered

Links

@KodrAus KodrAus added T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-collections Proposals about collection APIs A-raw-pointers Proposals relating to raw pointers. Libs-Tracked Libs issues that are tracked on the team's project board. labels Aug 6, 2020
@clarfonthey
Copy link
Contributor

clarfonthey commented Aug 6, 2020

A bit unsure about the addition of both leak/unleak and from/into methods. Why not just expect the user to cast a raw pointer to a NonNull themself without a dedicated method? This also makes the user consciously aware of whether they are or aren't checking for null pointers.

This wouldn't actually change whether something is safe or not, it would just make null checks explicit and forgo the need for an extra method. It would also reduce the amount of unsafe code used in implementing these methods as you don't need to duplicate the methods.

@KodrAus
Copy link
Contributor Author

KodrAus commented Aug 6, 2020

@clarfon We already have from/into methods on Box, Rc, and Arc, but only from methods for Vec and String (they have unstable into ones).

The pointers returned from the from/into methods are never null, so that just means an unnecessary unsafe block for callers to convert them. For Vec and String the best NonNull type also isn't convertible directly from the pointer, you need to go through NonNull::slice_from_raw_parts.

The goal of adding new explicit methods like from/into that are based on NonNull alongside them is to shift use away from the from/into versions and towards the NonNull ones for most cases where we can work with the contents of a container in a way that's closer to their & and &mut counterparts.

@clarfonthey
Copy link
Contributor

@KodrAus I get that, my main concern is that if the goal is to prefer NonNull methods, why set a standard to always include raw pointer methods too? If it makes the most sense, the old methods could be deprecated and we could mark the new ones as preferred instead.

If someone receives a NonNull value and stores it as a raw pointer, then the assumption is they might have null values somehow and should check for that, IMHO. Otherwise, why not keep passing around the NonNull?

@KodrAus
Copy link
Contributor Author

KodrAus commented Aug 6, 2020

if the goal is to prefer NonNull methods, why set a standard to always include raw pointer methods too? If it makes the most sense, the old methods could be deprecated and we could mark the new ones as preferred instead.

That's a fair question. I'd figured that since these from/into methods mostly already exist then it wouldn't be more cognitive load on people exploring std to find them consistently on all the container types. I saw their goal as being focused just on giving you primitive parts that can be shared across FFI and maybe more familiar to non-Rust developers that need to consume them.

We could look at deprecating from/into in favor of leak_raw().as_ptr() and unleak_raw(NonNull::new(ptr).unwrap()) (or unsafe { new_unchecked }), which I think is still a better place to land than either making some from/into methods use NonNull and others not, or not adding them at all.

I should add these to the alternatives 🙂

@clarfonthey
Copy link
Contributor

To clarify, I'm not the kind of person who'd be using these APIs so I think it's super fair to have folks weigh in on whether they'd prefer the from/into methods in addition to leak/unleak. I mostly bring this up because the goal of this RFC is to align everything with where we want to be, and "we did it this other way already" isn't a strong enough justification for doing stuff IMHO. ;)

@KodrAus
Copy link
Contributor Author

KodrAus commented Aug 16, 2020

I’ve started updating the text with some better motivating examples for leak_raw/unleak_raw vs into_raw/from_raw. I think the FFI case is compelling enough (especially for multi-value containers like Vec) to keep them, but have suggested we use docs to point users from into_raw to leak_raw unless they need FFI-safety.

@KodrAus
Copy link
Contributor Author

KodrAus commented Aug 23, 2020

One drawback of using (NonNull<[T]>, usize) for Vec<T> instead of (NonNull<T>, usize, usize) is that it makes it harder to push new elements up to the capacity.

@KodrAus
Copy link
Contributor Author

KodrAus commented Aug 28, 2020

I think what this RFC needs for the leak_raw_parts methods on types like Vec and String is some solid motivating use-cases.

@JulianKnodt
Copy link

Hi, looking through this doc it appears that there is some codification of standard methods across multiple types. Would it make sense to codify this into a trait? For example FromRaw/IntoRaw? It might not need to be approached in this RFC, but I think it might be useful to do so just to signify to a user that these are all similar operations.


For multi-value containers like `Vec<T>` and `String`, any subset of the following method pairs should be added to work with their raw representations:

- `leak`: shrink the container to its initialized length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shrink the container to its initialized length

Is shrinking it first the best option here? I'd personally expect Vec::leak to not shrink the contents. I think of leak as just something like "hey this Vec you got here, never deallocate it", like std::mem::forget, except we can still access the leaked contents. Shrinking it first might involve a lot of copying, just to save some unused capacity. Without shrinking, it does what I'd expect from a leak function: it's basically a no-op other than stopping deallocation from happening.

If I want to not waste any unused capacity, I could still easily call shrink_to_fit first (or leak the .into_boxed_slice() instead).

If leak does shrink, I'd have to use unsafe code to leak it 'efficiently' (i.e. without copying/shrinking, but wasting some unused heap space).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example use-case where this inefficiency would matter? The best one I can think of is a startup latency sensitive embedded system, that might want to quickly leak a lot of data at startup. However that use-case might be better off using a separate "pointer-bump" allocator for data it wants to leak. Not possible yet. But with custom allocators it will be.

Other than that situation, I'd expect shrink_to_fit to almost always be the right choice, because the inefficiency of something done once at startup would be outweighed by the long-term gain of having a bit less memory pressure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't feel very much like Rust to hide a potentially complex operation (shrinking/reallocating) inside a function that appears to be cheap or even free (leak()). It might very well be the case that there are not many programs where it would matter much, but it'd be good if functions do what their name suggests. The name leak suggests it just leaks the memory allocation of the container. Not that it copies all the data over to another new allocation first and leaks that instead while deallocating the original allocation.

If leak() doesn't shrink, that's easily explained in the documentation, and shrink-leaking is still easy to achieve with by calling shrink_to_fit() first (or using into_boxed_*().leak()). (Or maybe a shrink_and_leak() can be added?)

If leak() doesn't just leak but also shrinks, then just leaking is hard to achieve. It'd require unsafe {} and a raw pointer.

I also think leak() should be consistent with leak_raw, which only leaks and doesn't shrink.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is shrinking it first the best option here? I'd personally expect Vec::leak to not shrink the contents.

Maybe whether or not these methods drop extra capacity should be left unspecified? I think Vec::leak only shrinks as a consequence of being implemented through Vec::into_boxed_slice so that it can use Box::leak.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of leak as just something like "hey this Vec you got here, never deallocate it"

Current leak implementations can return a lifetime shorter than &'static, which means at the end of its lifetime it would be legal to reconstitute the original allocation. For that one needs to know the capacity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current leak implementations can return a lifetime shorter than &'static, which means at the end of its lifetime it would be legal to reconstitute the original allocation.

As far as I know, the only reason it can return lifetimes shorter than 'static is to make it possible to leak Vec<Something<'notstatic>>. Turning a leaked Vec back into the original Vec doesn't really seem like an intended use case for leak(). into_raw_parts seems like the right function in that case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferable the term leak should only be used in a context where the intent is to leak the vector, and it should not shrink the vector.

Different versions of into_raw(_parts)/from_raw(_parts) should generally be used to decompose this types if you later one intent to recompose them.

Except that sadly due to historic reason some of the into_raw methods do not return NonNull and as such require an otherwise unnecessary unsafe block (or .unwrap()) to create a NonNull.

With regard to shrink_to_fit() we should warn with bold text in the leak documentation that reconstructing a Vec from which you only have the len and data_ptr will lead to unsafe behavior when dropping it as the Layout for alloc and dealloc do not match. That is why Vec::into_boxed_slice() does call Vec::schrink_to_fit() as else wise dropping the boxed slice would cause unsafe behavior of which the consequences are allocator defined.

Using leak() to create a pointer the the underlying slice and then passing it to a C-FFI as ptr+len was a bug I just ran into...

@KodrAus
Copy link
Contributor Author

KodrAus commented Jan 14, 2021

cc @SimonSapin you've done some work on NonNull, did you have any thoughts on trying to specify some standard method pairs for converting containers to and from NonNulls with an expectation that we'll add APIs that make them a bit nicer to work with?

@KodrAus
Copy link
Contributor Author

KodrAus commented Mar 18, 2021

I no longer have the bandwidth to carry this forward but if somebody else would like to pick it up sometime in the future then please feel free!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Proposals about collection APIs A-raw-pointers Proposals relating to raw pointers. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants