[RFC]: Standardize methods for leaking containers #2969

KodrAus · 2020-08-06T00:11:09Z

Specifies a standard set of methods with return types and semantics for leaking containers like Box<T> and String.

Links

In Tracking issue for Box::into_raw_non_null rust#47336 the existing unstable into_raw_non_null methods were deprecated.
In Tracking issue for Vec::leak rust#62195 we recently stabilized Vec::leak based on Box::leak.
In Tracking issue for vec_into_raw_parts rust#65816 we've been discussing whether Vec::into_raw_parts should return a *mut u8 or a NonNull.

clarfonthey · 2020-08-06T00:58:32Z

A bit unsure about the addition of both leak/unleak and from/into methods. Why not just expect the user to cast a raw pointer to a NonNull themself without a dedicated method? This also makes the user consciously aware of whether they are or aren't checking for null pointers.

This wouldn't actually change whether something is safe or not, it would just make null checks explicit and forgo the need for an extra method. It would also reduce the amount of unsafe code used in implementing these methods as you don't need to duplicate the methods.

KodrAus · 2020-08-06T01:14:12Z

@clarfon We already have from/into methods on Box, Rc, and Arc, but only from methods for Vec and String (they have unstable into ones).

The pointers returned from the from/into methods are never null, so that just means an unnecessary unsafe block for callers to convert them. For Vec and String the best NonNull type also isn't convertible directly from the pointer, you need to go through NonNull::slice_from_raw_parts.

The goal of adding new explicit methods like from/into that are based on NonNull alongside them is to shift use away from the from/into versions and towards the NonNull ones for most cases where we can work with the contents of a container in a way that's closer to their & and &mut counterparts.

clarfonthey · 2020-08-06T01:18:12Z

@KodrAus I get that, my main concern is that if the goal is to prefer NonNull methods, why set a standard to always include raw pointer methods too? If it makes the most sense, the old methods could be deprecated and we could mark the new ones as preferred instead.

If someone receives a NonNull value and stores it as a raw pointer, then the assumption is they might have null values somehow and should check for that, IMHO. Otherwise, why not keep passing around the NonNull?

KodrAus · 2020-08-06T01:30:25Z

if the goal is to prefer NonNull methods, why set a standard to always include raw pointer methods too? If it makes the most sense, the old methods could be deprecated and we could mark the new ones as preferred instead.

That's a fair question. I'd figured that since these from/into methods mostly already exist then it wouldn't be more cognitive load on people exploring std to find them consistently on all the container types. I saw their goal as being focused just on giving you primitive parts that can be shared across FFI and maybe more familiar to non-Rust developers that need to consume them.

We could look at deprecating from/into in favor of leak_raw().as_ptr() and unleak_raw(NonNull::new(ptr).unwrap()) (or unsafe { new_unchecked }), which I think is still a better place to land than either making some from/into methods use NonNull and others not, or not adding them at all.

I should add these to the alternatives 🙂

clarfonthey · 2020-08-06T03:45:48Z

To clarify, I'm not the kind of person who'd be using these APIs so I think it's super fair to have folks weigh in on whether they'd prefer the from/into methods in addition to leak/unleak. I mostly bring this up because the goal of this RFC is to align everything with where we want to be, and "we did it this other way already" isn't a strong enough justification for doing stuff IMHO. ;)

KodrAus · 2020-08-16T05:14:36Z

I’ve started updating the text with some better motivating examples for leak_raw/unleak_raw vs into_raw/from_raw. I think the FFI case is compelling enough (especially for multi-value containers like Vec) to keep them, but have suggested we use docs to point users from into_raw to leak_raw unless they need FFI-safety.

KodrAus · 2020-08-23T17:33:37Z

One drawback of using (NonNull<[T]>, usize) for Vec<T> instead of (NonNull<T>, usize, usize) is that it makes it harder to push new elements up to the capacity.

KodrAus · 2020-08-28T11:12:38Z

I think what this RFC needs for the leak_raw_parts methods on types like Vec and String is some solid motivating use-cases.

JulianKnodt · 2020-08-30T19:16:23Z

Hi, looking through this doc it appears that there is some codification of standard methods across multiple types. Would it make sense to codify this into a trait? For example FromRaw/IntoRaw? It might not need to be approached in this RFC, but I think it might be useful to do so just to signify to a user that these are all similar operations.

m-ou-se · 2020-09-09T14:48:21Z

text/0000-container-leak.md

+
+For multi-value containers like `Vec<T>` and `String`, any subset of the following method pairs should be added to work with their raw representations:
+
+- `leak`: shrink the container to its initialized length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content.


shrink the container to its initialized length

Is shrinking it first the best option here? I'd personally expect Vec::leak to not shrink the contents. I think of leak as just something like "hey this Vec you got here, never deallocate it", like std::mem::forget, except we can still access the leaked contents. Shrinking it first might involve a lot of copying, just to save some unused capacity. Without shrinking, it does what I'd expect from a leak function: it's basically a no-op other than stopping deallocation from happening.

If I want to not waste any unused capacity, I could still easily call shrink_to_fit first (or leak the .into_boxed_slice() instead).

If leak does shrink, I'd have to use unsafe code to leak it 'efficiently' (i.e. without copying/shrinking, but wasting some unused heap space).

Do you have an example use-case where this inefficiency would matter? The best one I can think of is a startup latency sensitive embedded system, that might want to quickly leak a lot of data at startup. However that use-case might be better off using a separate "pointer-bump" allocator for data it wants to leak. Not possible yet. But with custom allocators it will be.

Other than that situation, I'd expect shrink_to_fit to almost always be the right choice, because the inefficiency of something done once at startup would be outweighed by the long-term gain of having a bit less memory pressure.

It doesn't feel very much like Rust to hide a potentially complex operation (shrinking/reallocating) inside a function that appears to be cheap or even free (leak()). It might very well be the case that there are not many programs where it would matter much, but it'd be good if functions do what their name suggests. The name leak suggests it just leaks the memory allocation of the container. Not that it copies all the data over to another new allocation first and leaks that instead while deallocating the original allocation.

If leak() doesn't shrink, that's easily explained in the documentation, and shrink-leaking is still easy to achieve with by calling shrink_to_fit() first (or using into_boxed_*().leak()). (Or maybe a shrink_and_leak() can be added?)

If leak() doesn't just leak but also shrinks, then just leaking is hard to achieve. It'd require unsafe {} and a raw pointer.

I also think leak() should be consistent with leak_raw, which only leaks and doesn't shrink.

Is shrinking it first the best option here? I'd personally expect Vec::leak to not shrink the contents.

Maybe whether or not these methods drop extra capacity should be left unspecified? I think Vec::leak only shrinks as a consequence of being implemented through Vec::into_boxed_slice so that it can use Box::leak.

I think of leak as just something like "hey this Vec you got here, never deallocate it"

Current leak implementations can return a lifetime shorter than &'static, which means at the end of its lifetime it would be legal to reconstitute the original allocation. For that one needs to know the capacity.

Current leak implementations can return a lifetime shorter than &'static, which means at the end of its lifetime it would be legal to reconstitute the original allocation.

As far as I know, the only reason it can return lifetimes shorter than 'static is to make it possible to leak Vec<Something<'notstatic>>. Turning a leaked Vec back into the original Vec doesn't really seem like an intended use case for leak(). into_raw_parts seems like the right function in that case.

Preferable the term leak should only be used in a context where the intent is to leak the vector, and it should not shrink the vector.

Different versions of into_raw(_parts)/from_raw(_parts) should generally be used to decompose this types if you later one intent to recompose them.

Except that sadly due to historic reason some of the into_raw methods do not return NonNull and as such require an otherwise unnecessary unsafe block (or .unwrap()) to create a NonNull.

With regard to shrink_to_fit() we should warn with bold text in the leak documentation that reconstructing a Vec from which you only have the len and data_ptr will lead to unsafe behavior when dropping it as the Layout for alloc and dealloc do not match. That is why Vec::into_boxed_slice() does call Vec::schrink_to_fit() as else wise dropping the boxed slice would cause unsafe behavior of which the consequences are allocator defined.

Using leak() to create a pointer the the underlying slice and then passing it to a C-FFI as ptr+len was a bug I just ran into...

KodrAus · 2021-01-14T03:41:09Z

cc @SimonSapin you've done some work on NonNull, did you have any thoughts on trying to specify some standard method pairs for converting containers to and from NonNulls with an expectation that we'll add APIs that make them a bit nicer to work with?

KodrAus · 2021-03-18T23:46:07Z

I no longer have the bandwidth to carry this forward but if somebody else would like to pick it up sometime in the future then please feel free!

KodrAus added 2 commits August 1, 2020 14:06

sketch out an initial leak API

95c23d0

fill in more RFC content

7e1bb16

KodrAus added T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-collections Proposals about collection APIs A-raw-pointers Proposals relating to raw pointers. Libs-Tracked Libs issues that are tracked on the team's project board. labels Aug 6, 2020

add deprecation as an alternative

7dfc2c8

KodrAus added 2 commits August 16, 2020 15:07

Provide examples of when to use leak_raw vs into_raw

46bb6ef

Update 0000-container-leak.md

f38b01b

KodrAus added 2 commits August 16, 2020 22:07

Update 0000-container-leak.md

562a392

correct allocated length to initialized length

1e74b10

m-ou-se reviewed Sep 9, 2020

View reviewed changes

m-ou-se mentioned this pull request Nov 21, 2020

Add excess capacity description for Vec::leak rust-lang/rust#79257

Closed

m-ou-se mentioned this pull request Jan 2, 2021

Implement From<Box<T>> for NonNull<T>. rust-lang/rust#80611

Closed

weasel out of specifying shrinking on leak

0b0a23c

note that we don't require these methods exist

b15b876

KodrAus closed this Mar 18, 2021

yaymukund mentioned this pull request Jul 29, 2021

Documentation for Vec::leak doesn't ackowledge dropping of excess capacity rust-lang/rust#79240

Closed

mbrubeck mentioned this pull request Sep 28, 2021

Avoid allocations and copying in Vec::leak rust-lang/rust#89337

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC]: Standardize methods for leaking containers #2969

[RFC]: Standardize methods for leaking containers #2969

KodrAus commented Aug 6, 2020

clarfonthey commented Aug 6, 2020 •

edited

Loading

KodrAus commented Aug 6, 2020

clarfonthey commented Aug 6, 2020

KodrAus commented Aug 6, 2020 •

edited

Loading

clarfonthey commented Aug 6, 2020

KodrAus commented Aug 16, 2020

KodrAus commented Aug 23, 2020

KodrAus commented Aug 28, 2020

JulianKnodt commented Aug 30, 2020

m-ou-se Sep 9, 2020

petertodd Sep 14, 2020

m-ou-se Sep 29, 2020

KodrAus Sep 30, 2020

the8472 Nov 21, 2020

m-ou-se Mar 3, 2021

rustonaut Apr 28, 2021

KodrAus commented Jan 14, 2021

KodrAus commented Mar 18, 2021


		For multi-value containers like `Vec<T>` and `String`, any subset of the following method pairs should be added to work with their raw representations:

		- `leak`: shrink the container to its initialized length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content.

[RFC]: Standardize methods for leaking containers #2969

[RFC]: Standardize methods for leaking containers #2969

Conversation

KodrAus commented Aug 6, 2020

Links

clarfonthey commented Aug 6, 2020 • edited Loading

KodrAus commented Aug 6, 2020

clarfonthey commented Aug 6, 2020

KodrAus commented Aug 6, 2020 • edited Loading

clarfonthey commented Aug 6, 2020

KodrAus commented Aug 16, 2020

KodrAus commented Aug 23, 2020

KodrAus commented Aug 28, 2020

JulianKnodt commented Aug 30, 2020

m-ou-se Sep 9, 2020

Choose a reason for hiding this comment

petertodd Sep 14, 2020

Choose a reason for hiding this comment

m-ou-se Sep 29, 2020

Choose a reason for hiding this comment

KodrAus Sep 30, 2020

Choose a reason for hiding this comment

the8472 Nov 21, 2020

Choose a reason for hiding this comment

m-ou-se Mar 3, 2021

Choose a reason for hiding this comment

rustonaut Apr 28, 2021

Choose a reason for hiding this comment

KodrAus commented Jan 14, 2021

KodrAus commented Mar 18, 2021

clarfonthey commented Aug 6, 2020 •

edited

Loading

KodrAus commented Aug 6, 2020 •

edited

Loading