Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-think function definitions #2189

Closed
lukaszcz opened this issue Jun 7, 2023 · 14 comments · Fixed by #2243
Closed

Re-think function definitions #2189

lukaszcz opened this issue Jun 7, 2023 · 14 comments · Fixed by #2243

Comments

@lukaszcz
Copy link
Collaborator

lukaszcz commented Jun 7, 2023

Our clause definition syntax is somewhat inconsistent. Sometimes we separate clauses with | (in lambdas and cases) and sometimes with ; (at the top-level). Sometimes we use braces (in lambdas) and sometimes we don't (in cases and top-level definitions). Moreover, top-level clause definition syntax is inconsistent with the type definition syntax.

This is the root of the issue:

The lack of braces in case makes writing nested cases inconvenient:

case a
| C x := (case f x
              | D := A
              | E := B)
| F := C;

The nested case needs to be put in parentheses to disambiguate.

Personally, I don't like the Haskell/Agda/Idris approach to function definitions. I don't like repeating the function name in each clause. Also, I wouldn't mind something more concise and closer to most other programming languages, where you can specify argument types next to arguments without bothering to write a separate signature.

Function definitions in Lean 4 come close to a syntax I would consider convenient and immediately understandable to someone not familiar with Haskell/Agda/Idris:

def twice (f : Nat -> Nat) (x : Nat) : Nat :=
  f (f x)

def compose (α β γ : Type) (g : β → γ) (f : α → β) (x : α) : γ :=
  g (f x)

Lean syntax naturally extends to multiple-clause pattern matching:

def length : List α → Nat
  | [] => 0
  | y :: ys => Nat.succ (length ys)

def drop : Nat → List α → List α
  | Nat.zero, xs => xs
  | _, [] => []
  | Nat.succ n, x :: xs => drop n xs

I find it ironic that while we're promoting the Haskell/Agda/Idris approach in Juvix, in our own Haskell code we're actively avoiding it. For example, we're trying to replace things like

f :: List a -> a
f [] = A
f (x : xs) = B

with

f :: List a -> a
f = \case
  [] -> A
  x : xs -> B

Proposals/ideas

  1. Add optional braces to case and mandatory parentheses around the matched expression:
case (f a b) {
| C x := A
| D y := B
}

Then the first | may be optional to make it more uniform with the lambdas and a new syntax for function clause definitions (below).

We need to add parentheses around the matched expression to avoid ambiguity with implicit arguments, but this seems natural. More precisely, we should be matching a single atom, so the following are still fine:

case x
   A := ..
 | B := ..;

case x {
   A := ..
 | B := ..
};
  1. Make braces optional in a lambda:
\ 
| C x := A
| D y := B;

Then the lambda syntax is entirely analogous to the case syntax. Except that we don't require parentheses around top patterns in case because there is only one pattern matched -- this is very convenient but breaks the uniformity slightly.

  1. Change top-level clause syntax to use | instead of repeating the function name and require all clauses for a function to be next to each other. It practically never happens anyway that the clauses of different functions are mixed (I'm not sure if we even allow it).
func
  | x (C y) := A
  | Z (D y) := B;

Like in a lambda and a type definition, the first | is optional, which is handy when there is only one clause:

func x y := A;

Now top-level clause definitions are analogous to cases, lambdas and type definitions. Note that having optional braces in case becomes more important with this change, because we now separate clauses with | and use ; to end the entire definition:

func
  | (C x) := case (f x) { A := .. | B := .. }
  | (D x) := case (g x) { A := .. | B := .. };

We could even allow optional braces for top-level clause definitions to make it completely consistent with case and lambda, but I don't see any practical use-case for it.

func {
  (C x) := A
| (D x) := B
};
  1. Allow to (optionally) merge the type signature with clause definitions.
fact : Nat -> Nat
  | zero := 1
  | n@(suc n') := n * fact n';

With the type signature, the first | cannot be optional. We can still have a special case for a single clause with no arguments:

id : Nat -> Nat := \ x := x;
  1. Allow to move some arguments to the left of : when the signature is specified together with the clauses:
id {A : Type} (x : A) : A := x;

compose {A B C : Type} (f : B -> C) (g : A -> B) (x : A) : C := f (g x);

map {A B : Type} (f : A -> B) : List A -> List B
  | nil := nil
  | (x :: xs) := f x :: map f xs;

With traits and the proposed universal quantification moved to the left:

compose !{A B C} (f : B -> C) (g : A -> B) (x : A) : C := f (g x);

func !{A B} {{Ord A; Enum B}} (f : A -> B) : A -> B
  | ..
  | ..;

listOrd !{A} {{O : Ord A}} : Ord (List A) := MkOrd with {
  compare (lst1 : List A) (lst2 : List A) : Bool := ...;
};

Note that after all these changes signatures could still be specified separately from the clauses, but presumably this would be much less common (used mostly for mutual recursion):

even : Nat -> Bool;
odd : Nat -> Bool;

even
  | zero := true
  | (suc n) := odd n;

odd
  | zero := false
  | (suc n) := even n;
@mariari
Copy link
Member

mariari commented Jun 9, 2023

Just a brief comment on a subset of the proposal. I may leave a more full comment later

Function definitions in Lean 4 come close to a syntax I would consider convenient and immediately understandable to someone not familiar with Haskell/Agda/Idris:

I would say the biggest hurdle for someone not familiar is typically the function being bare at the top level

f x = ...

people who come from other backgrounds typically do not understand what this indicates. This goes away quickly however once they do know. Signatures being separate is a thing people from an ALGOL background tend not to see too often, however other families are used to declarations being in potentially other places.

For

  1. Allow to move some arguments to the left of : when the signature is specified together with the clauses:

I would advise against this. The language has embraced text as the representation. When users want to find some definition you often have to grep through the system. Thus having a fixed syntax f : to send in a project grep is very useful to finding where a definition may live. I have to do this in rust (as rust too embraces text). Thus you should keep an identifier easy to find.

Further using | on the top level

even
  | zero := true
  | (suc n) := odd n;

reminds me a bit of SML syntax

fun map' f nil       = nil
  | map' f (x :: xs) = f x :: map' f xs

What are the plans for guards, when | is taken?

Lastly. It might be good to rethink the mutually recursive functions being done via a signature, it compounds some problems with text search and is an odd way to do it without much signposting

@lukaszcz
Copy link
Collaborator Author

lukaszcz commented Jun 9, 2023

When users want to find some definition you often have to grep through the system. Thus having a fixed syntax f : to send in a project grep is very useful to finding where a definition may live.

I'd say that's more a problem with lack of tooling (language server) than with the syntax. If you want to optimize the syntax for definition search there are better ways than giving up on the conciseness of moving arguments to the left. For example, introducing a definition keyword, e.g.,

def f (x : Nat) : Nat := ..

Then you just search for def f. But if you need to use grep to search for definitions, the language tooling is inadequate in my opinion.

I'm very much in favour of having this small convenience of being able to move things to the left of :. Being always forced to separate things quickly becomes tedious, especially with dependent types, because you find yourself repeating the initial part of a function signature in each clause. You quite often have several arguments that are the same for all clauses and only the last or a few final arguments are matched on. By moving some arguments to the left no information is lost - you can still read off the type.

What are the plans for guards, when | is taken?

I don't know, but one can just use any keyword. OCaml uses when.

@mariari
Copy link
Member

mariari commented Jun 9, 2023

I'd say that's more a problem with lack of tooling (language server) than with the syntax

Are you meaning lacking LSP, or with LSP itself? Either way with how the Juvix system loads, I don't think it's too robust if the system is in an inconsistent state? Rust has LSP support that persists even with inconsistent typing and failure to compile, but even then I do have to grep around for certain values. Further with the approaches that are taken within the project, if the tooling fails you should consider fall back strategies and users trying to get work done will likely try to apply some of these. I do agree that these are due to more sub-optimal tooling

You are correct that def f is better for that

@mariari
Copy link
Member

mariari commented Jun 9, 2023

For guards, when in how ocaml does it is not very good. Though if you put a | spin on it like your if syntax, then it may be OK. Best to not steal the exact strategy OCaml does

@lukaszcz
Copy link
Collaborator Author

lukaszcz commented Jun 9, 2023

Are you meaning lacking LSP, or with LSP itself?

I mean that the proper way to find definitions is to parse the source. Doing a regular expression search is always a hack. So there should be some tool, LSP, "defgrep" or whatever, that just finds definitions in a project - it may even output them with source locations on the command line, why not.

@mariari
Copy link
Member

mariari commented Jun 9, 2023

There often are, rust even has it. However if your tooling is flakey or broken for whatever reason. grep is a fallback tool. I do it for rust when rust decides it doesn't want to work, or in our WASM crates. I don't really do it for lisp as I can just ask the repl things. Though I do wish I had better regex searches on the image to dump more words from the context easily. In OCaml I had to do it under our Tezos stuff as they used custom compiler stuff that made tooling break. I don't see how the current approach to Juvix would somehow make this work nicer, you have an interpreter you expose that can help the situation, but how the system is loaded hinders what you can possibly do on this front.

In image based systems it's easier as the system has a way to talk to it. As the system just reflects what it knows, and can dump information. Haskell has this for what is loaded in the interpreter (it isn't an image), and you can :i things. However since it only knows about the current file and the exact modules it brings in, even there you can't query it about things you may be collaborating on.

(Having a way to talk to the system that isn't on point based is important in collaboration, as you may have to query information for the programmer writing as the person observing)

@lukaszcz
Copy link
Collaborator Author

lukaszcz commented Jun 9, 2023

I don't see how the current approach to Juvix would somehow make this work nicer, you have an interpreter you expose that can help the situation, but how the system is loaded hinders what you can possibly do on this front.

You're right in that it may fail to work for files which are being edited and cannot yet be parsed, but I hope we manage to mitigate this limitation in general (e.g. by remembering the previous state that parsed correctly). Otherwise, it's just a matter of constructing a syntax tree, so if your source is syntactically correct (doesn't need to type-check or scope-check), I don't see why there should be some fundamental problems as far as searching for definitions by identifier name is concerned. After scope-checking, you can additionally identify which definition precisely a given identifier occurrence refers to (still doesn't need to type-check).

@mariari
Copy link
Member

mariari commented Jun 9, 2023

Well I mean the system is a batch compiler, the system throws away lots of information in producing a binary, the interpreter likely keeps a lot more alive, and can be a valuable tool in this, but it depends on what you are cutting to get it, and how you can query it. (The context of the language isn't in scope for the language itself, meaning that the barrier for users writing tooling on it, or even just has functions to query it, is much higher). Either way I think the approach taken by this project is similar to Rust, Haskell and OCaml, which in practice you end up using grep. There are systems where this isn't the case, but they tend to work fundamentally differently allowing better tooling integration written by the user

@lukaszcz
Copy link
Collaborator Author

lukaszcz commented Jun 9, 2023

Either way I think the approach taken by this project is similar to Rust, Haskell and OCaml, which in practice you end up using grep.

I really don't see why this needs to be so as far as searching for definitions is concerned. Maybe it is so because grep was "good enough" and nobody bothered. This is just a matter of parsing the syntax, no need for the interpreter, any information in the binary, or the binary at all. It's a syntactic operation on top of the source files analogous to grep, but instead of using ad-hoc regular expressions you just use the proper grammar of the language.

@mariari
Copy link
Member

mariari commented Jun 9, 2023

I mention it because in practice, the tooling for some reason or another the tooling breaks. Since in a batch oriented language the tooling is the only way to talk to the language system. I.E. there isn't a human interface to query the state of the codebase or what it understands, as the tooling/LSP support died somewhere and is the only model for interaction with the user. In a non batch oriented language or even in a language that can give an interpreter, you can query information from here. Smalltalk lacks a REPL, but since you can evaluate an expression anywhere, it effectively has this capability. Haskell as broken as the tooling often is, the ghci mode gave me at the very least :i for some subset of my project which is in scope. As more information about things live the more you can interact with it. For example, in Factor I can click on any word in the image documentation and open it precisely and start hacking it. Hell I can even hack it from the repl itself without needing to permanently change the source since that is useful when debugging. This tooling will never break, because having a functioning system at all means you have this. However for a language like Haskell, you may have your emacs integration broken, maybe stack decides it doesn't like LSP for this file anymore, in these scenarios your tooling can be turned offline, even if the project compiles and runs. This is because the tooling isn't just apart of the system you're loading things from.

TL;DR inherent methods are more robust, relying on connection with a foreign editor may fail, and users will resort to the"inadequate" tooling, and I don't see how the Juvix model will allow for robust tooling like the example above

@lukaszcz
Copy link
Collaborator Author

lukaszcz commented Jun 9, 2023 via email

@lukaszcz
Copy link
Collaborator Author

lukaszcz commented Jun 29, 2023

With traits and the proposed universal quantification moved to the left:

compose !{A B C} (f : B -> C) (g : A -> B) (x : A) : C := f (g x);

I think we came to the conclusion of not having ! on the left, and using forall (and/or its unicode version) only in types. Instead, a bit less uniform but practical solution is to allow omitting : Type on the left for implicit arguments only.

So one can write

compose {A B C} (f : B -> C) (g : A -> B) (x : A) : C := f (g x);

and this is equivalent to

compose {A B C : _} (f : B -> C) (g : A -> B) (x : A) : C := f (g x);

To the above list of ideas I would also add:

  1. Allow grouping of arguments of the same type on the left, e.g.,
f {A} (x y z : A) : A := ...;

We allow grouping in types. Do we allow it only for implicits or in general? I don't remember.

@janmasrovira janmasrovira self-assigned this Jul 3, 2023
@janmasrovira
Copy link
Collaborator

Summary of what we agreed

Function definitions

Type signatures follow this pattern:

f (a1 : Expr) .. (an : Expr) : Expr

where each ai is a non-empty list of symbols. Braces are used instead of parentheses when the argument is implicit.

Then, we have these variants:

  1. Simple body. After the signature we have := Expr;.
  2. Clauses. The function signature is followed by a non-empty sequence of clauses. Each clause has the form:
| atomPat .. atomPat := Expr

Some examples:

plus (n : Nat) : Nat -> Nat
  | zero := n
  | (suc m) := plus n m;

swap {A B : Type} : Pair A B -> Pair B A
  | (a, b) := b, a;

compose {A B C : Type} (f : B -> C) (g : A -> B) (x : A) : C := f (g x);

compose' {A B C : Type} (f : B -> C) (g : A -> B) : A -> C
  | x := f (g x);

syntax reserve even;

odd : Nat -> Nat:
  | zero := false
  | (suc n) := even n;
even : Nat -> Nat
  | zero := true
  | (suc n) := odd n; 

We did not specifically agree on a good syntax for mutually recursive functions, so the syntax reserve statement is only a placeholder.

Case expressions

In order to avoid ambiguity of pipes in nested case expressions, we agreed on the following syntax:

case atomExpr {
 | pat .. pat := expr -- (the first | is optional)
 ..
 | pat .. pat := expr
}

@lukaszcz
Copy link
Collaborator Author

lukaszcz commented Jul 4, 2023

Do we still want the option to write signatures separately? The proposed change of syntax doesn't preclude this. Then mutually recursive functions would be defined as in my first comment:

even : Nat -> Bool;
odd : Nat -> Bool;

even
  | zero := true
  | (suc n) := odd n;

odd
  | zero := false
  | (suc n) := even n;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants