Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: remove weak pointers #1232

Closed
wants to merge 1 commit into from
Closed

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Jul 31, 2015

Abandon support for Weak on Rc and Arc in favour of external crates that
provide the functionality. This resolves some open API questions, while making
them genuinely "pay for what you use".

Also stabilize the uniqueness stuff on them at the same time.

rendered

@Gankra Gankra changed the title remove weak pointers RFC: remove weak pointers Jul 31, 2015
@Gankra Gankra self-assigned this Jul 31, 2015
@Gankra Gankra added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jul 31, 2015
# Alternatives

Accept the tiny overhead and stabilize Weak. This involves answering the
question of what uniqueness means in the context of Weak pointers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative not mentioned: Provide both weakless and weakable Rc and Arc in rust. For example using default type parameters, Arc<T> = Arc<T, NoWeak> vs. Arc<T, AllowWeak>. This alternative is also pay for what you use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest also designing this parameter to be able to configure atomicity and ref-count size, as well, if at all desired in the future, to avoid type parameter bloat.

@nikomatsakis
Copy link
Contributor

👍 the weak support feels somewhat niche to me, though admittedly the cost is small

@SimonSapin
Copy link
Contributor

Weak pointers are essential when reference cycles are expected, for example in a tree with parent node references. This is not all use cases, but it happens.

Yes, Rc-with-Weak can be defined outside of the standard library, but so can Rc-without-Weak. What justifies inclusion in std of one but not the other? Conversely, if one is moving out, why not the other as well? “Pay for what you use” is an argument for having both, not for where they should be.

To clarify:

  • I am fine with stabilizing the status quo, Rc-with-Weak only. (We have yet to hear of a use case where the extra cost is both significant and unnecessary)
  • I am fine with having both Rc-without-Weak and Rc-with-Weak in std, either through type parameters or in separate types.
  • I am fine with having neither in std, and both on crates.io. This is tricky right now since some of relevant optimizations or features (NonZero, unsafe_no_drop_flag, CoerceUnsized, assume) are not stable yet, but it would strengthen the story of “the core language is small but powerful, you can do everything in libraries”.
  • I don’t see a reason to have Rc-without-Weak in std and Rc-with-Weak on crates.io.

The uniqueness questions can be worked out (and are independent of inclusion or not in std):

  • get_mut and make_unique need to ensure there are no weak references which could be upgraded while the &mut T borrow is outstanding, causing mutable aliasing.
  • try_unwrap does not have this issue: when the last strong reference is dropped/consumed, any remaining weak reference is invalidated as usual.
  • The above shows that both possible meanings of is_unique are useful. Perhaps both should be provided, in two boolean methods with names to be determined bikeshedded.

@metajack
Copy link

👎

Do external crates already exist for this? These are used in Servo, and we were trying to move more things to libstd types that have them (Arc<Flow> instead of FlowRef).

@alexcrichton
Copy link
Member

@metajack to clarify this RFC intends to move the support for weak pointers into an external crate on crates.io. Support for something like crates_io_rc::Rc<Trait> will not be stable for some time but it can be enabled with a cargo feature of the crate. Servo is already using nightly Rust anyway so I don't think this would harm Servo's usage of Rc or Arc at all (apart from switching crates).

@metajack
Copy link

metajack commented Aug 1, 2015

@alexcrichton I agree with @SimonSapin that if Rc is in libstd, then Weak probably should be.

However, my issues with external crates is mostly around whether they will be created and maintained. If the answer is that the Rust team is committed to maintaining those, I agree this shouldn't impact Servo all that much.

@seanmonstar
Copy link
Contributor

I must be missing something obvious, but I don't see why the weak_count is
needed at all. The WeakPtr can continue its drop check, checking if the
_ptr is null or post_drop, and remove the decrement completely.

On Fri, Jul 31, 2015, 5:39 PM Alex Crichton notifications@github.com
wrote:

@metajack https://github.com/metajack to clarify this RFC intends to
move the support for weak pointers into an external crate on crates.io.
Support for something like crates_io_rc::Rc will not be stable for
some time but it can be enabled with a cargo feature of the crate. Servo is
already using nightly Rust anyway so I don't think this would harm Servo's
usage of Rc or Arc at all (apart from switching crates).


Reply to this email directly or view it on GitHub
#1232 (comment).

@Gankra
Copy link
Contributor Author

Gankra commented Aug 1, 2015

@seanmonstar you need the weak count for a weak pointer to know if there are more weak pointers -- if it thinks it's the last it will free the memory causing all other Weaks to use-after-free.

@Gankra
Copy link
Contributor Author

Gankra commented Aug 1, 2015

@metajack it could certainly be a rust-lang crate.

@seanmonstar
Copy link
Contributor

@gankro couldn't the drop fn of Weak just be:

let ptr = *self._ptr;
if !(*(&ptr as *const _ as *const *const ())).is_null() &&
    ptr as *const () as usize != mem::POST_DROP_USIZE {
    if self.strong() == 0 {
        deallocate(ptr as *mut u8, size_of_val(&*ptr),
                   align_of_val(&*ptr))
    }
}

When the last Rc is dropped, self._ptr.value can be dropped, as well as the RcBox. When a Weak is dropped, it'll see that RcBox is null/dropped.

@Gankra
Copy link
Contributor Author

Gankra commented Aug 1, 2015

@seanmonstar You can make arbitrarily many weak and strong pointers. Also the strong pointers need to know if there's any weak pointers when deciding whether to free.

@seanmonstar
Copy link
Contributor

@gankro I know you can make many. Once the pointer is null, the other destructors should just check is_null and end. In my example, the strong pointers shouldn't care if there are any weak. When count reaches 0, drop it on the floor. The Weak destructors will check that.

I may be wrongly assuming that a Weak might not be able to upgrade to an Rc, if all strongs are gone. If it can, I'd think that another alternative for this RFC is to make it instead that Weak has fn get(&self) -> Option<&T>, and thus removing any need for a weak count.

@Gankra
Copy link
Contributor Author

Gankra commented Aug 1, 2015

@seanmonstar How does anyone know to free the RcBox?

Sorry to be clear, there's two distinct steps:
Dropping T, and Freeing the RcBox. You can't free the RcBox until all strongs and weaks are gone (because they have a pointer to it), but you have to drop T as soon as all the strongs are gone.

@seanmonstar
Copy link
Contributor

In drop of Rc, if strong count becomes 0, free RcBox.

On Fri, Jul 31, 2015, 10:12 PM Alexis Beingessner notifications@github.com
wrote:

@seanmonstar https://github.com/seanmonstar How does anyone know to
free the RcBox?


Reply to this email directly or view it on GitHub
#1232 (comment).

@Gankra
Copy link
Contributor Author

Gankra commented Aug 1, 2015

@seanmonstar This will make every Weak pointer use-after-free.

conditions regardless.

Note that we have *never* received complaints about this, to my knowledge. Servo
and rustc happily use Rc without Weak. This isn't so much a "this is tragedy"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Servo uses arc::Weak.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Servo has both some data in Arcs that may have weak references, and some other data in Arcs that never do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the RFC it sounds like rustc and Servo don't use any weak pointers and are happy like that. I just don't want that we take decisions based on incorrect information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it’s a bit misleading. But there are also some cases where we have unused weak counters where the extra memory usage hasn’t bothered us yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustc uses Weak in at least one place, as a parent pointer in the rustc_resolve custom tree of Modules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So stop lieing in the RFC please!

@Gankra
Copy link
Contributor Author

Gankra commented Aug 1, 2015

(seanmonstar and I hashed it out on IRC -- they failed to heed my warning about the implementation details being a bit confusing and looked at them anyway -- Filling drop and NonZero are a hell of a drug)

@dgrunwald
Copy link
Contributor

The choice of which Rc to use isn't local: The whole point of Rc is shared ownership; possibly even shared between multiple modules in a program. If any those modules needs weak references; all modules must use a weak-enabled Rc. So the only time a module really can use Rc without weak reference support is if it doesn't use them itself and is sure the references are not passed to anyone who might.

So I don't think it's a good idea to get people to use Rc without weak references by default; at least until we have a good story for writing modules that are generic over the choice of smart pointer.

@bluss
Copy link
Member

bluss commented Aug 1, 2015

@dgrunwald I think that's a reasonably convincing argument for why Weak should be opt-out, not opt-in.

@SimonSapin
Copy link
Contributor

  • The above [try_unwrap v.s. get_mut / make_unique] shows that both possible meanings of is_unique are useful. Perhaps both should be provided, in two boolean methods with names to be determined bikeshedded.

Proposal:

  • Leave is_unique unchanged. (Namely, strong_count() == 1 && weak_count() == 0.) A reference is not truly unique if there are other references, even weak. make_unique causes is_unique to be true.
  • Add Rc::can_unwrap(&Rc<T>) -> bool that says whether try_unwrap would succeed. (Namely, strong_count() == 1. Weak references should be invalidated by try_unwrap. (Can this be done race-free on Arc?)) This is “can I safely take ownership of the T value referenced by this Rc<T>?” and also “would the T value be dropped if I drop this Rc<T>?”. The docs should mention both scenarios.

(Like is_unique now, these two methods are only for Rc. On Arc, what they tell you might no longer be true by the time they return.)

@nagisa
Copy link
Member

nagisa commented Aug 2, 2015

Add Rc::can_unwrap(&Rc) -> bool that says whether try_unwrap would succeed. (Namely, strong_count() == 1. Weak references should be invalidated by try_unwrap. (Can this be done race-free on Arc?)) This is “can I safely take ownership of the T value referenced by this Rc?” and also “would the T value be dropped if I drop this Rc?”. The docs should mention both scenarios.

What the Result/Option in return value cannot do that can_unwrap can? Sounds pretty low-gain addition for all warts can_unwrap has.

@SimonSapin
Copy link
Contributor

What the Result/Option in return value cannot do that can_unwrap can? Sounds pretty low-gain addition for all warts can_unwrap has.

try_unwrap is destructive. In this code for example, I want to know whether dropping this Rc<T> would drop T without actually dropping or unwrapping it yet.

@milibopp
Copy link

milibopp commented Aug 2, 2015

Funny coincidence, I just opened an issue on stabilizing Weak, when @bluss pointed to this RFC. As mentioned in rust-lang/rust#27465 I use Arc with Weak in a project (as an implementation detail, not as part of the API). Not using Weak is not an option for that project.

I even tried to reimplement it outside std to bridge the phase until its stabilization, which I considered to be only a matter of time. And at that point the building blocks required weren't stable either. It appears to me that rebuilding Weak outside std is not even possible without duplicating a significant chunk of code from std.

Given all that the idea presented in this RFC does not feel right, especially as it is being considered for philosophical reasons. Altogether that sounds like premature optimization. Note that you want to remove functionality that people actually use. I disagree with the dismissal of the drawbacks in the RFC. Everybody using Weak being on nightly in my case means I am on nightly precisely because there are no Weak pointers on stable.

In my opinion we should at least obtain solid data, that there is a tangible benefit to removing it, to even consider this course of action. Of course, I admit that I have a biased perspective on this as a user of Weak pointers, but from that perspective the drawbacks clearly outweigh the performance advantages.

@milibopp
Copy link

milibopp commented Aug 2, 2015

Perhaps following the argument of @dgrunwald it is inadvisable to use a ref-counted pointer without weak-ref support in APIs and such. As an alternative proposal, maybe a strong-only pointer type could be added to std as a tool for microoptimization, when you are certain, that you'll never create reference cycles or expose it to the outside world. Of course the APIs should be consistent so that swapping from one to the other is as seamless as possible.

@eddyb
Copy link
Member

eddyb commented Aug 3, 2015

I really like @bluss' parametrization alternative, allowing the std to expose both only-strong references and pairs of strong/weak reference types, without violating DRY or zero-cost principles.

@bluss
Copy link
Member

bluss commented Aug 3, 2015

Crates on crates.io that use Weak<T> according to a quick grep:

  • bacon_rajan_cc-0.1.0
  • carboxyl-0.1.1
  • crust-0.2.3
  • glr-parser-0.0.1
  • gstreamer-0.2.5
  • html5ever-0.2.1
  • html5ever_dom_sink-0.2.0
  • mm_video-0.1.5
  • oil-0.1.0
  • rclist-0.0.1
  • rctree-0.1.1
  • shared_slice-0.0.4
  • thread_isolated-0.1.0

@brson
Copy link
Contributor

brson commented Aug 3, 2015

I agree that some evidence of the negative impact on space and perf would be very good to have. My feeling is that the RC we have is violating the pay-for-what-you-use principle.

I'd like to remind people that when weak pointers were originally proposed they were a completely separate RC type, but we merged them because we didn't want a proliferation of RC types. I think this was a mistake we made in the spirit of compromise and we should not have merged weak pointers at all.

@huonw
Copy link
Member

huonw commented Aug 3, 2015

On the topic of pay-for-what-you-use: C++ is generally regarded as the poster-child for this, and yet shared_ptr supports weak pointers unconditionally (AFAIK). This doesn't mean it is right for Rc/Arc to have them, but it is evidence that it's not necessarily problematic.

btw, what motivates this bias to move things out to crates.io?

Crates.io is much, much more flexible: (breaking) changes can be made independently of the language version, unlike things in std, e.g. there can be a crate foo with versions 1.0.0 and 2.0.0 which have different APIs, but both run against Rust 1.0 and, in fact, both can be maintained (its easy to continue to release 1.y.z versions even while the 2.y.z releases are the latest ones). If the functionality was in std, one can only make direct breaking changes with major version bumps to the language.

@steveklabnik
Copy link
Member

shared_ptr also has to use atomics always, so that overhead is going to be more than the extra count would be, no? I guess what I mean is that shared_ptr already has lots of overhead.

@huonw
Copy link
Member

huonw commented Aug 3, 2015

Maybe we should only consider that factor on Arc, then.

@eddyb
Copy link
Member

eddyb commented Aug 3, 2015

@huonw C++ also has no non-atomic ref-counted pointers, it's in a really bad spot, actually, when it comes to pay-for-what-you-use.
We could do better. The extra costs are possibly negligible for Rc, but less so for Arc.

EDIT: it appears that @steveklabnik made the same point while I was writing my comment.

@huonw
Copy link
Member

huonw commented Aug 3, 2015

it's in a really bad spot, actually, when it comes to pay-for-what-you-use.

This is kinda my point ;) C++'s core version of this abstraction isn't so great, and yet it is still the only version offered. Sure, other people will write their own weak-less/non-atomic versions, but apparently not enough to get into the standard (maybe tracking down the C++ standards committee's/boost's discussion/proposals for shared_ptr would be informative for this RFC).

@ghost
Copy link

ghost commented Aug 3, 2015

I liked the suggestion by @bluss (and @eddyb's reply) so I scrambled together a quick half implementation that seems to work.

https://play.rust-lang.org/?gist=0d53766a9a224fe27fe4&version=nightly

Basically RcBox gets a generic reference count, rather than being set at 2 usize values.

I don't really understand atomic operations though, and testing for race conditions is hard, so it's probably full of holes. Not sure of the overhead in calling closures either. All of the tests from Rc and Arc passed (other than the functions I didn't implement) fwiw.

I kind of like that it's just one type this way but it seems to cause issues with type inference (e.g. let x = Rc::new(5u32) wouldn't work, but let x: Rc<u32> = Rc::new(5) did).

The RefCount trait is pretty awful though.

@eddyb
Copy link
Member

eddyb commented Aug 4, 2015

@whataloadofwhat The issues with inference will be solved once #213 is ungated (if you enable the feature, you should be able to have let x = Rc::new(5u32) working right now).

@Gankra
Copy link
Contributor Author

Gankra commented Aug 4, 2015

I still don't understand how the trait version is supposed to be a better api than 4 types. It's the exact same combinatoric shuffling as having a function take an enum, but at the type level. It will just make everything more complicated and require describing everything 6 times instead of 4.

@bluss
Copy link
Member

bluss commented Aug 4, 2015

@gankro It should simplify implementation by avoiding copy-pasting code. The API result should be exactly the same or better, otherwise it's not working correctly. The better part would be simpler ways to switch from one configuration to another.

@pythonesque
Copy link
Contributor

It's hardly shocking that people aren't using Weak<T> given that it's not stabilized. I'm not in favor of this change; it's rare that I want to use Rc at all, but when I did I would almost certainly want Weak as well.

@nikomatsakis
Copy link
Contributor

My opinion is starting to waver. There seems to be a balance between "pay for what you use" and "full featured". I think C++ goes too far with shared_ptr in that atomic reference counting is worth calling out explicitly, but it may be that weak pointers are cheap enough that they fall under the "full featured" category.

It's true that I've never personally had much use for weak pointers, but I know that Cocoa programs make use of them to represent pointers "up the tree" (and also for some specific situations, like a backpointer from delegate to UI node). We know Servo is using them. Hence, I can imagine that it comes up pretty regularly when programs grow to a certain size and rely extensively on Rc. And I guess it's one of those features that, when you need it, you really need it.

I am not sure how I feel about using generics. It's a neat trick, and it seems better than having two totally distinct types, but in the end it's still two types (or 1.5, at least), and hence carries similar downsides (e.g., the point @dgrunwald made that you may not be in a good position to choose which flavor of Rc to use). Of course, it's possible for everyone to write code generically over the flavor of Rc, but that seems unlikely to occur in practice.

@Ericson2314
Copy link
Contributor

I think parameters alleviates part of @dgrunwald's point, because we can write generic code on such parameters today, but not with totally separate types until HKT.

@glaebhoerl
Copy link
Contributor

I agree with @aepsil0n that this feels like premature optimization. I would consider this proposal compelling only if there were benchmarks of heavy real-world users of Rc or Arc actually showing a significant performance improvement. Are there any heavy users which don't also use Weak?

@Gankra
Copy link
Contributor Author

Gankra commented Aug 8, 2015

I'm going to propose the exact opposite of this RFC (but without is_unique).

@Gankra Gankra closed this Aug 8, 2015
@nrc
Copy link
Member

nrc commented Aug 9, 2015

Sorry to comment on a closed thread, but for future reference, this really needs data - someone needs to sit down and find out the cost of the extra pointer in benchmarks before we can make an informed decision here.

@SimonSapin
Copy link
Contributor

Well, we know the cost, right? It’s size_of::<usize>() extra bytes of memory for the weak reference count per RcBox.

Whether that’s significant entirely depends on how big is T in Rc<T> and how memory constrained is a given application. And I believe that can only be anecdotal evidence.

@Gankra
Copy link
Contributor Author

Gankra commented Aug 9, 2015

@SimonSapin it incurs extra branches/logic in some places. For instance when strong count hits 0 you need to branch on the weak count to decide if you also free the memory. Probably negligible -- but not free.

@eddyb
Copy link
Member

eddyb commented Aug 10, 2015

Like I said before, the size is the least of your worries: the main significant cost is that of atomic operations in the case of arc::Weak - I believe it's worse now, as it has been changed to use a "locking" loop of atomic operations to plug a soundness hole.

In the case of rc::Weak, I guess that LLVM might be able to remove redundant refcount increment/decrement pairs on Rc, but the extra logic for the weak count throws it off.
But you need to inspect the generated IR, before you know whether that's an actual problem (though it's not like it could elide allocations, AFAIK LLVM doesn't currently understand our heap alloc/free functions).

@Gankra
Copy link
Contributor Author

Gankra commented Aug 10, 2015

@eddyb it understands jemalloc enough to elide some useless allocations per rust-lang/rust#22159

But rust-lang/rust#24194 suggests it's maybe not great at that.

@Storyyeller
Copy link

Was there ever any update on this? It seems like it is impossible to build a "weak-less" Rc outside of std with the same performance because the standard Rc depends on unstable features.

@comex
Copy link

comex commented Feb 2, 2017

Weak is stable now.

But I can't think of any unstable features that would prevent implementing a custom Rc. What are you referring to?

@bluss
Copy link
Member

bluss commented Feb 2, 2017

unsized coerce and access to the memory allocator is missing (The latter could be possible to overcome with just using box, but I'm not 100% certain of the details).

NonZero is missing.

@bluss
Copy link
Member

bluss commented Feb 2, 2017

Seems like you can get most of a regular weakless Rc working on stable (code link). It's missing the unsized coerce and Option<Rc<T>> size optimization, but that's not so bad.

@Storyyeller
Copy link

Given that the motivation for removing the weak count is to improve performance, I think it's counter productive to remove the nullable optimization. Giving up the rest is tolerable, but IMO, NonZero is essential.

@SimonSapin
Copy link
Contributor

I think that is an argument for NonZero outside of std rust-lang/rust#27730 (which is desirable anyway), not for weakless Rc in std.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.