Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracking issue for default binding modes in match (RFC 2005, match_default_bindings) #42640

Closed
5 of 7 tasks
nikomatsakis opened this issue Jun 13, 2017 · 126 comments
Closed
5 of 7 tasks
Labels
B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jun 13, 2017

This is a tracking issue for the "match ergonomics using default bindings mode" RFC (rust-lang/rfcs#2005).

Status: Awaiting stabilization PR and docs PR! Mentoring instructions here.

Steps:

Unresolved questions:

@nikomatsakis nikomatsakis added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jun 13, 2017
@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Jun 13, 2017

I'm actually not 100% sure the best way to implement this. It seems like we will need to add adjustments to patterns, and then integrate those into the various bits of the compiler. @eddyb, have any thoughts on that? You usually have some clever ideas when it comes to this sort of thing. =)

@eddyb
Copy link
Member

eddyb commented Jun 13, 2017

We should be able to treat adjustments more uniformly now, although it'd still be a lot of plumbing.

@crazymykl
Copy link

How does this interact with slice patterns (#23121)?

@nikomatsakis
Copy link
Contributor Author

@crazymykl

My expectation would be that, when we encounter a pattern like [a, ..b], we check the default binding mode. If it is "ref", then a becomes &slice[0] and b becomes (effectively) &slice[1..].

@nikomatsakis
Copy link
Contributor Author

@eddyb do we want to use general purpose adjustments here, or something more limited (the current RFC, after all, only requires autoderef of & and &mut types). It seems like though if we can use adjustments, that'll lay us a nice foundation for the future potentially. But it may introduce a lot of questions (e.g., what does it mean to "unsize" a pattern) that are better left unasked.

@eddyb
Copy link
Member

eddyb commented Jun 15, 2017

@nikomatsakis Autoderef or autoref? If it's one bit per pattern making it separate for now is fine.

@nikomatsakis
Copy link
Contributor Author

@eddyb

Autoderef or autoref? If it's one bit per pattern making it separate for now is fine.

autoderef -- that is, where you have a pattern like Some, the type you are matching might now be &Some or &mut Some (or &&Some, etc). I think I agree, I'm inclined to introduce a PatternAdjustment struct that just includes auto-deref and go from there.

@nikomatsakis
Copy link
Contributor Author

Here is a rough and incomplete implementation plan for the match ergonomics RFC.

There are a few high-level things we have to do:

  • figure out which patterns need to an "auto-deref" -- e.g., if you have match &foo { Some(x) ... }, then we want to record on the Some(x) pattern that there is an automatic "dereference", meaning that it is equivalent to &Some(x)
  • figure out the "binding mode" for patterns, which will no longer always be explicit.

Right now, the code is setup to scrape this information directly from the "HIR" (the compiler's AST). The HIR node for a pattern is hir::Pat, of which the most interesting part is the kind field of type PatKind. If you have a pattern like Some(x), then, that would be represented as a tree:

We want to make this equivalent to &Some(ref x), which would be encoded as a:

  • Ref pattern, encoding the &
    • TupleStruct, encoding the Some
      • Binding with BindingMode of ref, encoding the ref x

We don't however have enough information to do this when we construct the HIR, since we need type checking results to do it, so we can't change the HIR itself. The way we typically handle this sort of thing then is to have the typeck encode "side tables" with auxiliary information. These tables are stored in TypeckTables and they encode all kinds of information.

In this case, I think we want two tables:

  • pat_adjustments, sort of roughly analogous to the existing adjustments table. It would have a type like NodeMap<usize>, I think.
    • The key of such a map is the "id" of the Pat -- stored in the id field of the Pat struct.
    • The value would just be a number indicating how many implicit & patterns we insert before this pattern. So, in our example, the Some pattern would wind up with a value of 1. For the other patterns, we'd probably just have no entry, meaning "none". (You could also store 0, but why waste the memory.)
  • pat_binding_modes would have type NodeMap<hir::BindingMode>. It would, for each binding pattern, indicating the actual binding mode -- this may vary from what is found in the HIR, since we may be encoding a ref and so forth. We may want to change the HIR then to have some different type, or perhaps an Option<hir::BindingMode> -- where None means that nothing was explicitly written, which is the normal case -- so as to make it more obvious that when the user writes x this does not mean a "by value" binding mode anymore.

Probably a decent first PR is just to introduce the second table (pat_binding_modes) and rewrite all the existing code to use it. Right now, code that wants to find the binding mode of a binding extracts the value straight from the HIR, as you can see in the following examples:

  • the check_match code, which enforces various sanity checks.
  • determine_pat_move_mode, which scrapes info from the HIR and also checks the type of the values being matched by value to decide if this is a copy-or-move;
  • local_binding_mode, a helper in borrowck -- this uses the "HIR Map" to lookup, given the id, the node for a given binding, and extracts its binding mode; this would be rewritten to use the new table
  • the HAIR conversion code; this is a precursor to MIR construction, which constructs a lowered form of the patterns called the HAIR. The HAIR is intended to be a copy of the HIR taking into account all of the information encoded in various side-tables, so you don't want to change the HAIR itself, just this code which creates it by scraping the HIR.

This is not a comprehensive list, but it does have the major use-sites. You can get a more comprehensive list by doing rg 'BindByValue|BindByRef', which is what I did.

OK, no time for more, but hopefully this helps somebody get started! Please leave a note if you are interested in taking this on, and feel free to ping me on IRC (nmatsakis) or gitter (nikomatsakis) with any questions (or ask in #rustc).

@nikomatsakis nikomatsakis added the E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. label Jul 6, 2017
@tbg
Copy link
Contributor

tbg commented Jul 14, 2017

I'll look into the first step:

Probably a decent first PR is just to introduce the second table (pat_binding_modes) and rewrite all the existing code to use it.

@nikomatsakis
Copy link
Contributor Author

@tschottdorf woohoo!

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Jul 27, 2017
tbg added a commit to tbg/rust that referenced this issue Jul 30, 2017
This PR kicks off the implementation of the [default binding modes RFC][1] by
introducing the `pat_binding_modes` typeck table mentioned in the [mentoring
instructions][2].

`pat_binding_modes` is populated in `librustc_typeck/check/_match.rs` and
used wherever the HIR would be scraped prior to this PR. Unfortunately, one
blemish, namely a two callers to `contains_explicit_ref_binding`, remains.
This will likely have to be removed when the second part of [1], the
`pat_adjustments` table, is tackled. Appropriate comments have been added.

See rust-lang#42640.

[1]: rust-lang/rfcs#2005
[2]: rust-lang#42640 (comment)
bors added a commit that referenced this issue Jul 31, 2017
…sakis

default binding modes: add pat_binding_modes

This PR kicks off the implementation of the [default binding modes RFC][1] by
introducing the `pat_binding_modes` typeck table mentioned in the [mentoring
instructions][2].

It is a WIP because I wasn't able to avoid all uses of the binding modes as
not all call sites are close enough to the typeck tables. I added marker
comments to any line matching `BindByRef|BindByValue` so that reviewers
are aware of all of them.

I will look into changing the HIR (as suggested in [2]) to not carry a
`BindingMode` unless one was explicitly specified, but this PR is good for
a first round of comments.

The actual changes are quite small and CI will fail due to overlong lines
caused by the marker comments.

See #42640.

cc @nikomatsakis

[1]: rust-lang/rfcs#2005
[2]: #42640 (comment)
@leoyvens
Copy link
Contributor

leoyvens commented May 17, 2018

I see nothing wrong with the current rules, and certainly see no motive to rollback stabilized functionality.

In fact I like the extension that makes strictly more things compile, it's a DWIM approach. The interpretation is that a & in a pattern always means by-move. If it's matching a &T, then move T. If it's matching T in ref binding mode, then move T.

@eddyb
Copy link
Member

eddyb commented May 17, 2018

@leodasvacas I agree with that sentiment, now that I've seen examples with nested references.
My idea was "interesting" in the very strict sense that it had a "bijective" sugar, but nothing more.

@rpjohnst
Copy link
Contributor

rpjohnst commented May 17, 2018

Ah, I hadn't quite realized the difference between @Boscop's and @eddyb's proposals. I agree we probably want to keep the ability to match multiple non-reference patterns without "accumulating" references. (Those extra "layers" of pointers do exist in the match-ee, but they can't be reused in the &&T, so we'd be inventing them out of thin air anyway.)

I do still think that a) we definitely want something like @Boscop's extension (which should be backwards-compatible) and that b) it interacts weirdly with the change from #46688. That is, I would like a single way to reset the binding mode that applies anywhere; not just at pre-existing references.

The confusion in #46688 comes from the fact that the RFC "hasn't quite made up its mind" on whether &s are part of the "structure" of a type- they become optional rather than invisible, or in other words the new match behavior is more like a coercion than a mode.

So here's an idea to solve both of these problems: once we're in a ref binding mode, always remove &s from the "visible structure" of a type, and let all &s in the pattern work like @Boscop's extension. That is, bindings are now "non-reference patterns" just like constructors, and & is no longer a "reference pattern" that can match against &T values because those values are no longer visible.

This gives the same result as the current implementation for #46688- matching (_, b) against &(1, &2) desguars to &(_, &ref b) rather than &(_, b), giving b: &i32; matching (_, &b) desguars to &(_, &<ref cancelled out by &>b), giving b: i32. But it also gives us the ability to handle @Boscop's #50008- matching (i, foo) against &(usize, Foo) still desguars to &(ref i, ref foo), giving i: &usize; but matching (&i, foo) desguars to &(<ref cancelled out by &>i, ref foo), giving i: usize as desired.

Or in other words, you can no longer match against actual &s, but only against the artificial &s generated by being in ref binding mode.

Edit: Actually the more I think about this version, the more it feels like 50% pedagogical change and 50% just adding @Boscop's extension. Which is great, because that means it's more likely to be compatible and also more likely to offer better error messages!

@Boscop
Copy link

Boscop commented May 17, 2018

@rpjohnst Just to clarify: You propose that all &s of the constituent's type will be undone, and then the implicit ref will add one level of &, right?
So matching (a, b) against &(1, &&2) will desugar to &(ref a, &&ref b) (which will result in a: &i32 and b: &i32), correct? (Similar with more levels of &)
So one could remove the & introduced by the implicit ref, too, by writing (a, &b).
The rule that all constituents will only have one level of & is easy to remember.

It works well for read-only (&) references because for those it makes no difference how many levels of references the type has.
But for mutable references it makes a difference, e.g.:
Matching (a, b) against &mut (1, &mut &mut 2) will desugar to &mut (ref mut a, &mut &mut ref mut b), resulting in b: &mut i32 (with that rule).
But you can do less with a &mut T than with a &mut &mut T, so in some cases, users might want to keep multiple levels of &mut.

So, if we want to preserve the ability to keep all the &mut-ref levels of the original type, we have to do it for read-only refs, too (if we want to preserve consistency between read-only refs and mutable refs).

So then we can't auto-remove all the & levels of the original type, but we can keep the current rule of match_default_bindings that's equally easy to remember ("each constituent's type will get one additional ref") but allow the user to undo each level by using & in the pattern. E.g. matching (_, &&&b) against &(1, &&2) will result in b: i32 (all & levels undone, including the implicit ref), and matching (_, &mut &mut &mut b) against &mut (1, &mut &mut 2) will also result in b: i32.

(But I think it makes sense to allow undoing a &mut by writing &. E.g. matching (_, &&&b) against &mut (1, &mut &mut 2) to get b: i32.)

@rpjohnst
Copy link
Contributor

rpjohnst commented May 17, 2018

So matching (a, b) against &(1, &&2) will desugar to &(ref a, &&ref b) (which will result in a: &i32 and b: &i32), correct? (Similar with more levels of &)

Yes, exactly. Kind of doubling down on the idea that &s aren't part of the type's structure while in ref mode. This is one place that the behavior changes, though, so we may want to consider it for a point-release.

So, if we want to preserve the ability to keep all the &mut-ref levels of the original type, we have to do it for read-only refs, too (if we want to preserve consistency between read-only refs and mutable refs).

IMO we shouldn't do this at all- the move binding mode is sufficient to control the levels of references. If you need that control, just switch back to that mode with a & before you reach the binding. Straightforwardness and consistency for & vs &mut for the ref binding modes, and full control for the move binding mode.

The idea of accumulating extra reference levels, as you propose, is basically the thing that we (mis)interpreted @eddyb's version as, and which has the problem of introducing dummy temporaries, and which is backwards-incompatible with the #46688 fix.

@Boscop
Copy link

Boscop commented May 17, 2018

@rpjohnst

The idea of accumulating extra reference levels, as you propose, is basically the thing that we (mis)interpreted @eddyb's version as, and which has the problem of introducing dummy temporaries, and which is backwards-incompatible with the #46688 fix.

Hm, which dummy temporaries would it introduce?

Wouldn't your proposal also be backwards incompatible (because currently, matching (_, b) against &(1, &2) results in b: &&i32 but afterwards, it would result in b: &i32)?

@rpjohnst
Copy link
Contributor

rpjohnst commented May 17, 2018

It introduces dummy temporaries when you have some structure in between the layers of references. If you're just matching against an &&&i32 or whatever then that's unnecessary.

And yes, my proposal is backwards-incompatible in cases like matching (_, b) against &(1, &2) or .. x .. against .. &&1 .., which is why I mentioned a point release. I'm not sure how else to simultaneously get your proposal working, fix #46688, and not give & multiple meanings.

@Centril Centril added disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels May 24, 2018
@Boscop
Copy link

Boscop commented Jul 19, 2018

Any update on this issue? :)

@rpjohnst
Copy link
Contributor

So I think the place we're at now is @Boscop's original proposal, summarized by @nikomatsakis in #42640 (comment) as "make &P patterns, when matching a non-reference type and in a "default binding mode" of ref, go back to a default binding mode of move."

While in a sense this gives & two meanings, as I expressed in #42640 (comment), AFAICT it's the only thing that's backwards-compatible, unlike @eddyb's and my later proposals. So it's probably the best we can do if we want to solve the problem in #50008.

I'd like to solve this---it came up again on Reddit and it's kind of a pain. Is the next step just to write an RFC for it?

@withoutboats
Copy link
Contributor

withoutboats commented Jul 26, 2018

@rpjohnst The blocker I think is that there are several possible solutions to this problem and considerable dissensus among the lang team (and certainly the community also) about what would be best:

  1. As you said, using the & sigil is one option.
  2. Another option would be to introduce a new pattern syntax, like (a, move b), instead of (a, &b).
  3. Finally, it's been argued that another, currently postponed proposal to insert dereferences to convert &T to T where T: Copy would resolve this problem in practice.

Personally, I lean toward one of the latter two options because I think the use of & with inverted meaning in patterns is, despite its conceptual elegance, very confusing for many users, who don't seem to think of the reference operator as a type constructor/destructurer per se.

@rpjohnst
Copy link
Contributor

@withoutboats Ah, I didn't realize this would be in conflict with the second two options, especially in light of #48448.

So would the next step be to get consensus on which approach to take? Do we need to stick with just one, or could we have both to round out the existing behavior?

@withoutboats
Copy link
Contributor

@rpjohnst yea, we need consensus building to move forward here. as far as i know, we don't even have consensus that we shouldn't do more than one of them.

@Boscop
Copy link

Boscop commented Jul 27, 2018

@withoutboats

  1. Finally, it's been argued that another, currently postponed proposal to insert dereferences to convert &T to T where T: Copy would resolve this problem in practice.

Always, unconditionally? But &mut T couldn't be unconditionally dereferenced/moved, because the code in the body might want to mutate the original.

(Also, wouldn't this solution break existing code (on stable) that uses match_default_bindings, where code in the body is now manually dereferencing the reference?)

  1. Another option would be to introduce a new pattern syntax, like (a, move b), instead of (a, &b).

Please don't forget, it should be possible to mark the moved-out copy as mut locally, like &(mut x) to copy the &T as a T and make it locally mutable. (Using the existing &(mut x) and &mut (mut x) syntax.)

  1. As you said, using the & sigil is one option.

I really think the best option is to use the & sigil because that's the most consistent/familiar way and what people would expect.
And people would expect the &(mut x) syntax to work with the & sigil in this case, too!
(If we don't use the & sigil, we'd also have to special-case another syntax to make the move-copied value locally mut!)

Btw, what I would like (in addition to going with the & syntax option), is being able to use & also for move-copying a &mut T (instead of having to write &mut x) (this would also allow using &(mut x) to move-copy a &mut T and make it mut in the local scope).


Personally, I lean toward one of the latter two options because I think the use of & with inverted meaning in patterns is, despite its conceptual elegance, very confusing for many users, who don't seem to think of the reference operator as a type constructor/destructurer per se.

I've talked to several people who expected it to already work like this because it also works like this when the reference wasn't created implicitly by match_default_bindings but inherent in the type before.
The reasoning difficulty for this is not the & syntax but recognizing that match_default_bindings implicitly turned the constituents into references. After that, the reader's reasoning can follow the normal rules that apply when matching a &T with a &x pattern (including the &(mut x) syntax), so if we settle on the & syntax, it only requires the reader to recognize that match_default_bindings is active here, but uses familiar and consistent syntax (unlike the other 2 proposed solutions).
(Now you may say that seeing a move keyword there would make it clearer to the reader what's going on, BUT: In many instances where match_default_bindings is used, all constituents will be left as a reference, so Rust coders have to be able to spot instances where match_default_bindings is active anyway! We always have to keep a type checker running in our head while reading/writing Rust code anyway.)

@rpjohnst
Copy link
Contributor

Having thought through this some more, this is one place in the language where I can easily see supporting both & and move like I mentioned above. We already have two "modes" and two prominent corresponding mental models- old-style patterns with "&T is an address," and ergonomics-style patterns with "&T is a less-permissive T."

So people (and, more importantly IMO, particular pieces of programs given their context) can use the one that expresses their intent more clearly. When & (and later, Box/Rc/etc) matter as part of the structure of your type, use &; when they don't and you just want to talk about permissions, use ref/move.

Does this make sense to anyone else? Or would people rather stick to just one solution? Or do people like the "coerce &T to T where T: Copy" idea better? In the last case, would such a coercion be more than something like autoderef-on-operators (e.g. in argument position), and does that really seem like something we could get consensus on (I would be really surprised)?

@nikomatsakis
Copy link
Contributor Author

This is implemented and stabilizing, but we never closed it! The only real remaining question is #44849 and we have a separate issue for that.

@Boscop
Copy link

Boscop commented Oct 23, 2019

@withoutboats @rpjohnst Any update on this?
I think if we support move for this case, we'd also need to allow it in non-match_default_bindings situations for consistency, e.g. let x = Some(&1); match x { Some(move a) => a, None => 0 } and v.iter().map(|move item| ..) and I wouldn't like that, because then we have 2 competing equivalent ways to express the same thing (which wouldn't be orthogonal)!
I also don't like the auto-move of T:Copy because some Copy types are very large, and apart from this, I think the programmer should have a choice, to leave it as a ref or to explicitly move it out.
I think the &a syntax is our best option and I don't think it would be confusing, once people understand that match_default_bindings gives references. The &a syntax behaves here just like in let x = Some(&1); match x { Some(&a) => a, None => 0 } and v.iter().map(|&item| ..).
I really don't think it'll be a problem, especially because people wouldn't accidentally use the &a syntax unless they want to move it out.

@tamuhey
Copy link
Member

tamuhey commented Oct 1, 2020

One of the examples in the RFC cannot be built:

fn main(){
    let x = &Some((3, 3));
    match x {
      // Here, each of the patterns are treated independently
      Some((x, 3)) | &Some((ref x, 5)) => {}
      _ => {}
    }
}

playground

Why?

Related: #77358

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests