Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion]: match construct for pattern matching #5016

Closed
HaloFour opened this issue Sep 4, 2015 · 28 comments
Closed

[Discussion]: match construct for pattern matching #5016

HaloFour opened this issue Sep 4, 2015 · 28 comments
Assignees
Labels
Area-Language Design Discussion Resolution-Duplicate The described behavior is tracked in another issue

Comments

@HaloFour
Copy link

HaloFour commented Sep 4, 2015

It's been mentioned several times in the different pattern matching threads (#206, #2136, #4781, #4944) so I thought I'd at least get the ball rolling with a proposal to introduce a match construct which would be specifically suited for pattern matching. I more or less yoinked this syntax out of a hat and I am not married to it so feel free to suggest alternatives. I chose to follow the lambda syntax fairly closely as I think that most C# developers should be fairly familiar with the rules of writing lambda expressions and anonymous function bodies so the behavior of encountering a return within the body should not be confusing.

Syntax:

The syntax would be as follows:

match-expression:
    match ( expression ) match-block

match-block:
    match-sectionsopt

match-sections:
    match-section
    match-sections, match-section

match-section:
    pattern guard-blockopt => anonymous-function-body

guard-block:
    when ( expr )

I've attempted to adopt this grammar from the switch syntax in §8.7.2 of the C# 5.0 specification, feel free to correct.

The match statement can be used as both a statement and an expression. In the case of being used as an expression the inferred result type of each of the anonymous-function-bodys within each match-section must be inferred to the same type which is the result type of the match expression itself.

The match statement must be complete meaning that all possible patterns must be checked otherwise it will result in a compile-time error.

Differences between switch and match in statement form:

  • Identifier scope is limited to within each match-section. This applies to both identifiers use with var patterns as well as variables defined within the anonymous-function-body.
  • Each match-section is self-contained and there is no automatic fall-through. In short, it is not necessary to break out of a section.
  • It is not possible to return out of the method enclosing the match statement. The return statement instead acts as a way to break out of the current match-section.

Examples:

match as an expression:

When using match in an expression the result type of each of the expressions or anonymous function bodies must be inferred to the same type. That type cannot be void.

var area = match (shape) {
    Point(*, *) => 0,
    Line(Point(*, *), Point(*, *)) => 0,
    Square(var width) => Rectangle(width, width),
    Rectangle(var width, var height) => width * height,
    Circle(var radius) => {
        double pi = Math.PI;
        return (radius * radius * pi);
    },
    * => throw new InvalidOperationException("Unrecognized shape!")
};
match as a statement:

When using match in a statement the result type of each of the expressions or anonymous function bodies must be inferred to return void.

match (shape) {
    Point(var x, var y) => Console.WriteLine($"Shape is a point at ({x}, {y})."),
    Line(Point(var x, var y), Point(var x2, var y2)) =>
        Console.WriteLine($"Shape is a line from ({x}, {y}) to ({x2}, {y2})."),
    Square(var width) =>
        Console.WriteLine($"Shape is a square where each side is {width}."),
    Rectangle(var width, var height) =>
        Console.WriteLine($"Shape is a rectangle of {width} by {height}."),
    Circle(var radius) => {
        if (radius <= 0) {
            return; // breaks out of the match expression
        }
        Console.WriteLine($"Shape is a circle with a radius of {radius}.")
    },
    Hexagon(*) => throw new InvalidOperationException("I don't like hexagons."),
    * => Console.WriteLine("I don't know what kind of shape this is!")
};

In the case of a simple expression the inferred result type can be anything but will be discarded, similarly to how you can assign a lambda that calls a function that returns a value to an Action delegate.

StringBuilder builder = ...;

 // legal, the return type of AppendLine is discarded
Action action = () => builder.AppendLine("foo");

match (shape) {
    // legal, the return type of AppendLine is discarded 
    Point(var x, var y) => builder.AppendLine($"Point ({x}, {y})"),

    // not legal, the body cannot explicitly return a value
    Line(Point(var x, var y), Point(var x2, var y2)) => {
        return builder.AppendLine("");
    },
    * => { } // do nothing, for completeness
};

This is the equivalent switch statement:

switch (shape) {
    case Point(var x, var y):
        Console.WriteLine($"Shape is a point at ({x}, {y}).");
        break; // must explicitly deal with fall-through
    case Line(Point(var x1, var y1), Point(var x2, var y2)): // cannot reuse identifiers
        Console.WriteLine($"Shape is a line from ({x}, {y}) to ({x2}, {y2}).");
        break;
    case Square(var width):
        Console.WriteLine($"Shape is a square where each side is {width}.");
        break;
    case Rectangle(var width2, var height2):
        Console.WriteLine($"Shape is a rectangle of {width} by {height}.");
        break;
    case Circle(var radius):
        if (radius <= 0) {
            break;
        }
        Console.WriteLine($"Shape is a circle with a radius of {radius}.");
        break;
    case Hexagon(*):
        throw new InvalidOperationException("I don't like hexagons.");
    default:
        Console.WriteLine("I don't know what kind of shape this is!");
        break;
};
@vladd
Copy link

vladd commented Sep 4, 2015

I would propose to replace return; // breaks out of the match expression with break; // breaks out of the match expression. This would match the behaviour of while/for constructs. With this change, return could still be used in order to leave the enclosing function, as usually.

@MgSam
Copy link

MgSam commented Sep 5, 2015

Agree with most of this. I think this is a lot better than trying to re-appropriate the switch statement. Although it's also not clear to me on why you would disallow return within a match statement. This is a common use case for switch currently:

public DogEnum StringToDog(string s) 
{
    switch(s)
    {
        case "Lab": return Dog.Labrador;
        case "Collie": return Dog.Collie;
    }
}

Though an expression form would work just as well here, it seems worthwhile to support both use cases.

@HaloFour
Copy link
Author

HaloFour commented Sep 5, 2015

@MgSam

It's mostly to keep consistency with the expression form of match and due to the reappropriation of the lambda/anonymous-function syntax. In my opinion if the match-sections smell like lambdas then it makes sense that return is how a value is returned from that section, which leaves a gaping hole as to how you can bail out of the enclosing method. As @vladd mentioned you could bring break into the mix but then you're making a departure from the existing anonymous-function-body syntax and a lot of additional work would need to go into the specification. In my opinion, assuming that switch is also on the table for pattern matching (which is probably more likely than anything in this proposal) then you already would have the means to bail out of the enclosing method. Also, in my brief poking around at pattern matching in other languages I've not yet found a language that permits bailing from the enclosing method, but they seem to always be expressions.

@AdamSpeight2008
Copy link
Contributor

@HaloFour, I hope you don't mind if propose a different syntax for the match expression.

match-expression   ::=  match-keyword match-params
                          match-block-opening
                          match-section*
                          match-default
                        match-block-closing
match-keyword       ::= "match"
match-params        ::=   '(' match_param? ')'
                        | '(' match_param (',' match_param )+ ')'
match-param         ::= 
match-block-opening ::= '{'
match-block-closing ::= '}'
match-section       ::= match-separator match-pattern match-code-block
match-separator     ::= "|:"
match-code-block    ::= "=>" match-simple | match-complex
match-simple        ::= LineOfCode ';'
match-complex       ::= '{' LineOfCode+  LineOfCode '}'
match-pattern       ::= '(' pattern ')' match-guard?
match-patterns      ::= match-pattern (',' match-pattern )*
match-guard         ::=  "when" match-predicate
match-default       ::=  match-separator match-code-block
pattern             ::=```

`return` should be required it the `match-code-block` has multiple statements. In which case is should be enclosing in a block ` => { statement; return foo; }`
If it is just a single statement / expression then `=> foo;` the `return` will be implied.

```c#
int y = 0;
while ( predicate )
{
 var x = match( ... )
         {
           |: ... => return 42; // exits the match-expression with the return value 42.
           |: ... => break; // this escapes the surrounding while, not the match expression
           |: ... => throw new exception();
           |: ... => { 
                       // x is in scope but unitialised.
                       // ( if x is already defined it has the existing initialisation state and value.)
                       // y is in scope.
                     }
           |: ( ... as T t)
                 when ( t != null ) =>
                     {
                       // Any "introduced-as variables", t in this match-section.
                       // Are restricted in scope to this match-section.
                       // To pass a it value out you either have to return it,
                       // or pass to an existing variable in the outer scope.
                       return 0
                     };
           |: ( ... ),  ( ... ) => 1; // Different match-patterns but they implement the same match-code-block.
           |: ( ... ) when ( ... ),
              ( ... ) when ( ... ) => {
                                    // Any "introduced-as variable" in the different match-pattern must agree.
                                    // Each guard-predicate is only applicable to the match-pattern it is in.
                                    // Different patterns but the implement the same match-code-block
                                  }
           |: ( ( ... ), (...) ) when ( ... ) ==> ; // permitted 
                                                // Any "introduced-as variable" in the different match-pattern must agree.
                                                // 1. Does the pattern match?
                                                // 2. Does satisfy the guard-predicate?
                                                // 3. repeat for other patterns.
           |: => 0 // explicitly required default pattern
        }
}

Another solution to the break problem could be to borrow VB's exit ... and continue ... syntax.

select case ( ... )
{
  case ... : 
    while ( ... )
    {
      foo = match ( ... )
            {
              |: ... => exit while; // Executes code from point after the enclosing while block.
              |: ... => exit select; // Executes code from point after the enclosing select block.
              |: ... => continue while; // Execution continues at the enclosing while.
              |: ... => continue select; // Execution continues at the enclosing select.
              |: => null;
            }
    }
}

for match statement (see #5037)

@AdamSpeight2008
Copy link
Contributor

With a match-separator and match-default we could also remove some of the bracing.

Expr Simplify( Expr e )
{
return match ( e )
       |: Add(l,r) => match(l,r)
                      |: (Const(0),*) => Const(r);
                      |: (*,Const(0)) => Const(l);
                      |: (Const(a),Const(b) => Const(a+b); 
                      |: => Add(Simplify(l),Simplify(r));     
       |: Mul(l,r) => match(l,r) 
                      |: (Const(0),*), 
                         (*,Const(0)) => Const(0);
                      |: (Const(1),*) => Const(r);
                      |: (*,Const(1)) => Const(l);
                      |: (Const(a),Const(b)) => Const(a*b)
                      |: => Mul(Simplify(l),Simplify(r)); ;
       |: Sub(l,r) => match(l,r)
                      |: (Const(0),*)        => Const(-l);
                      |: (Const(a),Const(b)) => Const(a-b);
                      |: => Sub(Simplify(l),Simplify(r)) ;
       |: Div(l,r) => match(l,r)
                      |: (*,Const(0))        => ERR_DivideByZero;
                      |: (*,Const(1))        => l;
                      |: (Const(a),Const(b)) => Const(a/b);
                      |: => Div(Simplify(l),Simplify(r));
       |: => e ;
}

@HaloFour
Copy link
Author

HaloFour commented Sep 5, 2015

@AdamSpeight2008 Ooo, I forgot guard blocks.

I'm honestly not a fan of the |: syntax. It just feels out of place to me in C# and VB.NET (not that I've approached how that syntax would look). Feels like something out of Scala but even that language keeps its pattern matching syntax pretty basic by just using case. Your inspiration is Nemerle, right? I'm not very familiar with that language but they seem to treat | in pattern matching as an "or" operation, as they do with enums. I see they also terminate sections with semi-colon similarly to your examples.

@AdamSpeight2008
Copy link
Contributor

@HaloFour

  • , visually make it look like that they are part of the same match-pattern
    Just like comma is use so separate elements in an array.
  • ; would makes more sense as it marks the end of statement or expression.
  • case makes it look like it part of a switch block., and also a little "wordy" for the c-style.
    • |: makes the syntactic structure of the match expression less visually intrusive.
    • Not having a match-section delimiter (marker/separator) (to me) looks messy and cluttered

VB.net's verbose verbiage and more powerful Select Case (as compared to c# inherited c-style switch) is better suited to being extend to support pattern-matching.

Function Simplify( e As Expr ) As Expr
  Return _ 
    Select( e )
      Case Add(l,r):
        Return Select Case (l,r)
                 Case (Const(0),*): Return Const(r)
                 Case (*,Const(0)): Return Const(l)
                 Case (Const(a),Const(b)): Return Const(a+b) 
                 Case Else: Return Add(Simplify(l),Simplify(r))
               End Select
       Case Mul(l,r):
         Return Select Case (l,r) 
                  Case (Const(0),*), (*,Const(0)) 
                    Return Const(0)
                  Case (Const(1),*): Return Const(r)
                  Case (*,Const(1)): Return Const(l)
                  Case (Const(a),Const(b)): Return Const(a*b)
                  Case Else: Return Mul(Simplify(l),Simplify(r))
               End Select
       Case Sub(l,r):
         Return Select Case (l,r)
                  Case (Const(0),*): Return Const(-l)
                  Case (Const(a),Const(b)): Return Const(a-b)
                  Case Else: Return Sub(Simplify(l),Simplify(r))
                End Select
       Case Div(l,r):
         Return Select Case (l,r)
                  Case (*,Const(0)): Return ERR_DivideByZero
                  Case (*,Const(1)): Return Simplify(l)
                  Case (Const(a),Const(b)): Return Const(a/b)
                  Case Else: Return Div(Simplify(l),Simplify(r));
                End Select
       Case Else
         Return e
       End Select
  End Select
End Function

@gafter
Copy link
Member

gafter commented Sep 6, 2015

This proposal conflates statements and expressions. It gives a grammar for a statement form and then proceeds to give an example of its use as an expression. I cannot really comment until it pulls itself together.

@AdamSpeight2008
Copy link
Contributor

@gafter , which grammar?

You can always ignore the return value of a function call, so in effect makes the match-expression a match-statament. I think they are examples of usage cases for the match-section and how they could be used.
match-code-block look lambda-like so it make sense to follow the lambda code style when using them.

@gafter
Copy link
Member

gafter commented Sep 6, 2015

@AdamSpeight2008 the proposed grammar follows the section entitled "syntax" in the original post of this issue. The grammar provides support for a statement form only. Your grammar does not really make sense to me. I do not know, for example, what a LineOfCode is, and your grammar does not allow anything between any of the parentheses.

@AdamSpeight2008
Copy link
Contributor

@gafter updated the grammar.
... is just a visual place-holder for code, which if we to include would just conflate things.
Empty definitions are to show it needs to be defined.
LineOfCode is just to indicate where you put the code to execute for that match-section.
How show I express the intent? They also don't include where whitespace and newlines are allowed, so not to hide the basic syntax and grammar on this construct. It's about the overall general style and feel of it.

@HaloFour
Copy link
Author

HaloFour commented Sep 6, 2015

@gafter I'm not particularly versed in writing grammars so please excuse errors, inconsistencies or shortcomings (for the time being). I proposed my grammar basically as a starting point since there hasn't yet been any proposals specifically for an expression form of pattern matching. I am also more than willing/happy to look at the expression form and statement form separately. I think we'd be in agreement that the former is the more useful of the two anyway given that switch will likely suffice for the statement form.

I based the syntax in the original comment on how it might feel to call one of the two following methods passing a list of lambdas replacing the empty argument list with the pattern to be matched:

// expression form
public static T match<T>(object expression, params Func<T> matches);
// statement form
public static void match(object expression, params Action matches);

e.g.

string result = match(expression, 
    () => "foo",
    () => {
        return "bar";
    },
    () => throw new InvalidOperationException()
);

Good, bad or indifferent I thought it would provide a decent starting point and it should somewhat be syntax that C# developers are familiar with.

@gafter
Copy link
Member

gafter commented Sep 6, 2015

@HaloFour I believe any new compound statement form for pattern matching that does not support returning from the enclosing method is a non-starter. Similarly I do not think we would want a new expression form in which subexpressions could include control statements (especially those that interact with enclosing statements). If we did want to add support for statements in expressions, pattern matching is a very strange place to introduce them. That is why I believe that pattern matching constructs for expressions and statements must be separate. I am working on switch as the statement form, and I expect a match expression will be specified with a simple expression on each case branch.

@gafter gafter self-assigned this Sep 6, 2015
@HaloFour
Copy link
Author

HaloFour commented Sep 7, 2015

@gafter

Similarly I do not think we would want a new expression form in which subexpressions could include control statements (especially those that interact with enclosing statements).

I agree with this. It doesn't make sense to allow the middle of a match expression to return out of the enclosing method or to break out of an enclosing loop. The one control statement I think that would have value is throw.

If we did want to add support for statements in expressions, pattern matching is a very strange place to introduce them.

I can understand this. While I was borrowing from the anonymous-function-body grammar that allows assigning statements to a delegate it is a stretch to say that such statements amounts to an expression. Although I would argue that this is an implementation detail and to the developer it does very much feel like writing a multiple-statement expression. It was even mentioned in #2930 that there is room for optimization in the implementation of lambdas.

My gut reaction is that not permitting this form of multiple-statement expressions with the syntax permitted by lambdas today is unnecessarily limiting, however with guard blocks it might not be that necessary. The one really nice thing about disallowing it would be that those control statements would no longer come into play so the conversation about what return does would be moot.

@HaloFour
Copy link
Author

HaloFour commented Sep 7, 2015

Here an alternate grammar. I am thinking specifically of the use of match as an expression in this case:

match-expression:
    match ( expression ) { match-sections }

match-sections:
    match-section
    match-sections, match-section

match-section:
    pattern guard-blockopt => match-result

guard-block:
    when ( expression )

match-result:
    expression
    throws expression

An example, although it is nearly identical to the first example above:

var area = match (shape) {
    Point(*, *) => 0,
    Line(Point(*, *), Point(*, *)) => 0,
    Square(var width) => Rectangle(width, width),
    Rectangle(var width, var height) => width * height,
    Circle(var radius) => (radius * radius * Math.PI),
    * => throw new InvalidOperationException("Unrecognized shape!")
};

@orthoxerox
Copy link
Contributor

When I think about a new match statement the most important aspects I can think of are:

  1. Familiarity: it looks like existing constructs and it works like existing constructs.
  2. Brevity: it can be really short if the logic is simple.
  3. Extensibility: it doesn't become unwieldy as it grows.

Familiarity means it has to look and behave like an existing construct: if, while, using or switch. Similarity to LINQ is a possibility as well, but it is a very special case.

To explain familiarity better, none of these constructs is an expression. This means that if we want match to be an expression it has to look like an expression. None of these constructs uses a comma either, so we have to use a semicolon.

Statement form that looks familiar to an if is easy to devise:

match (*expression*)
    with (*pattern*) *statement*
    with (*pattern*) *statement*
    default *statement*
match (p)
    with (Line l)
        DrawLine(l.Start, r.End);
    with (Rectangle r) {
        DrawLine(r.TopLeft, r.TopLeft.AddX(r.Width));
        ...
    }
    default
        throw new ApplicationException();

Some additions that could improve the syntax:

  • wrapping everything is curly braces (match (p) { ... }) would disambiguate the dangling problem with nested matches.
  • making parentheses optional around patterns that do not assign the matched value of the whole expression (with Point(var x, var y) { ... }) would make it a bit easier on the eyes.

What about expressions? As I've said, most constructs that we can base match on are statements, so we have to look wider, towards expression-bodied members and lambdas.

match (*expression*) {
    with (*pattern*) => *expression*;
    with (*pattern*) => *expression*;
    default => *expression*;
}
var area = match (p) {
    with Line => 0;
    with (Rectangle r) => r.Width*r.Height;
    default
        throw new ApplicationException();
}

Have you seen what I've done? I've mixed expressions and statements! To make this safe, we need the following rule:

with clauses can accept expressions, void-statements (statements that would return void when used as a function body) and ct-statements (void-statements you can never reach the end of, you will always encounter a control-transfer statement like return, break or goto). match statements must contain at least one void-statement or ct-statement and zero expressions. match expressions must contain at least one expression, any number of ct-statements and zero void-statements.

I'll try and post a more formal grammar a bit later.

@gafter
Copy link
Member

gafter commented Sep 8, 2015

@orthoxerox I do not think we would want a new expression form in which subexpressions could include control statements (especially those that interact with enclosing statements). If we did want to add support for statements in expressions, pattern matching is a very strange place to introduce them. I am working on extending the existing switch for the statement form, and I expect a match expression will be specified with a simple expression on each case branch.

@orthoxerox
Copy link
Contributor

@gafter but throwing because no pattern matched the given value is a legitimate use case. What would you propose, use a helper method TMustMatchMatchReturnType Throw<TException, TMustMatchMatchReturnType>(string)?

@gafter
Copy link
Member

gafter commented Sep 8, 2015

I propose throw expression be reclassified as an expression (everywhere)

@leppie
Copy link
Contributor

leppie commented Sep 8, 2015

@gafter : How? What is the return type? ;p

@vladd
Copy link

vladd commented Sep 8, 2015

@leppie It doesn't actually need to have a type, the same way as return null has none.

@leppie
Copy link
Contributor

leppie commented Sep 8, 2015

@vladd: throw does not even return. At best you can incorrectly assume void.

@orthoxerox
Copy link
Contributor

@leppie: It doesn't but the compiler can interpret throw as having whatever return type is required, since it knows it won't ever be used (unless .NET gets restarts, which is rather unlikely).

@HaloFour
Copy link
Author

HaloFour commented Sep 8, 2015

@gafter Makes perfect sense. throw would be an expression that returns never (#1226).

@gafter
Copy link
Member

gafter commented Sep 8, 2015

@HaloFour I don't think we can get the never type in the timeframe for this, as it requires CLR changes. Rather I expect it will be an expression "with no type" but that can be converted implicitly to any type.

@orthoxerox
Copy link
Contributor

@gafter: what will happen if I write Foo foo = throw new Exception() or something similarly silly? Will I get an unreachable code warning?

@gafter
Copy link
Member

gafter commented Sep 8, 2015

@orthoxerox it seems like you should get some kind of diagnostic for that.

@gafter gafter changed the title [Proposal]: match construct for pattern matching [Discussion]: match construct for pattern matching Sep 8, 2015
@gafter
Copy link
Member

gafter commented Sep 11, 2015

Discussion moved to #5143

@gafter gafter closed this as completed Sep 11, 2015
@gafter gafter added the Resolution-Duplicate The described behavior is tracked in another issue label Sep 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Language Design Discussion Resolution-Duplicate The described behavior is tracked in another issue
Projects
None yet
Development

No branches or pull requests

7 participants