Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GADT style matching on numeric types #701

Open
yav opened this issue Apr 15, 2020 · 19 comments
Open

GADT style matching on numeric types #701

yav opened this issue Apr 15, 2020 · 19 comments
Labels
language Changes or extensions to the language
Milestone

Comments

@yav
Copy link
Member

yav commented Apr 15, 2020

Some algorithms behave differently depending on the size of the input. Current Crytpol does not support this nicely, because while we can examine the values of numeric types, we have no way to learn type information from that match.

It would be nice to add some GADT-like matching construct on types, where additional type infromation is available in the nested match scope. It would probably be convenient to support a few different "views" of the numbers:

  • unary: check for zero not-zero
  • some kind of binary view: useful for algorithms that split the input in half

This ticket is here to discuss the language design considerations of such a feature.

@yav yav added the language Changes or extensions to the language label Apr 15, 2020
@linesthatinterlace
Copy link

Regarding some kind of binary view - one thing I have encountered is having to add the explicit hint of 2 * 2 ^^ k == 2 ^^ (1 + k) into algorithms that work on types of size [2^^k] by splitting; it would be good if the typechecker could deal with powers of 2, at least, in this manner automatically: I don't know if this needs its own issue or might naturally be solved as part of this?

@robdockins
Copy link
Contributor

For an example of the kinds of workarounds that are currently required, see:
https://github.com/GaloisInc/cryptol/blob/master/examples/Karatsuba.cry

@robdockins
Copy link
Contributor

I think a <= test would satisfy all the use cases. E.g. typecase (x <= y) then a else b. Here a gets to typecheck with x <= y and b gets to typecheck with x > y.

@brianhuffman
Copy link
Contributor

@robdockins I think you're right. I wasn't sure the following would work until I tried them, but they do:

empty : {n, a} (n <= 0) => [n]a
empty = []

single : {n, a} (1 <= n, n <= 1) => a -> [n]a
single x = [x]

double : {n, a} (2 <= n, n <= 2) => a -> a -> [n]a
double x y = [x, y]

So I think having <= constraints would let me write all the base cases I would want.

@yav
Copy link
Member Author

yav commented Apr 15, 2020

GADTs also require a type signature as in general there isn't a most general type to infer, when local constraints are in scope. We probably need a similar restriction. I wonder if this construct should be at the declaration level instead of the expression level. For example we could add a "type guard" to function RHSs, perhaps something like this:

count {n} => [n] -> Integer
count xs
  | n == 0      = 0
  | otherwise = 1 + count ys
                           where [_] # ys = xs 

So here n == 0 would by a local constraint, and otherwise would be a special thing indicating
that none of the above constraints hold (so we'd have to do some negation, but I think we already
do some of that). The fact that not (n == 0) is what would enable the pattern binding in the where to typecheck.

@brianhuffman
Copy link
Contributor

We could just introduce a Coq-style match m as x in p return Q with ... end expression. Simple! :)

@brianhuffman
Copy link
Contributor

The old ticket #325 (closed) was basically asking for the same kind of thing.

@linesthatinterlace
Copy link

A thing I would like to be able to do that I think is a manifestation of a similar sort of problem:
Define type UpperTriangle n a = [n*(n+1)/2]a - this is thought of as a list of coefficients of an upper-triangular matrix of elements of type a (row first, if you like). It would be good to be able to move between data in this form and the actual matrix form of type [n][n]a.

(There are some other issues with this, I think, but essentially the natural ways round them don't work because you hit this kind of issue. But there might be a way round, I'm not sure.)

@robdockins
Copy link
Contributor

I like the syntax of the guard-style, but I worry that it does add some additional semantic complication with negative information and disjunctions (if you allow conjunctions of guards).

You can also get into a weird situation where you're in a context you can't write down. Although: maybe we should just allow full boolean combinations of numeric constraints (as opposed to class constraints), since that's basically what we send to solvers anyway. Maybe that makes inference harder though? I'm not sure.

@robdockins
Copy link
Contributor

As to the triangular matrix question, I don't think you need a typecase to do that, just some clever reindexing. I'm curious what solution you have in mind, though because it isn't obvious to me how to decompose it that way.

Here's how I'd do it:

type UpperTriangle n a = [n*(n+1)/2]a

triangleToMatrix : {n, a} (fin n, n >= 1, Zero a) => UpperTriangle n a -> [n][n]a
triangleToMatrix t = m
  where
  type N = n*(n+1)/2
  type W = width N + 1

  m@i@j = if i <= j then t@(k i + (j - i)) else zero

  k (i:[W]) = `N - (`n - i + 1)*(`n - i)/2

@atomb atomb added this to the 3.0.0 milestone May 5, 2020
@brianhuffman
Copy link
Contributor

There are a couple of things about the guard-like syntax that strike me as awkward. The main thing is that the function declaration binds the argument variables before branching on the type constraint. So if we want specialized argument patterns in each branch, we have to put those inside local where clauses.

A related point is that the guard syntax won't match the structure of the desugared AST: The AST for one of these function declarations would have the type-level lambdas on the outside, then the type-level conditional, and then value-level lambdas inside each branch. Having a syntax that matches that structure would (I think) make it easier for programmers to understand the cryptol programs that they write.

So here's my proposal for a syntax: Multiple defining equations, each tagged with a type constraint. Each equation gets its own separate set of argument patterns.

foo : {n} (fin n) => [n] -> [n] -> [n]
(n == 0) => foo [] [] = []
(n >= 1) => foo ([x] # xs) ([y] # ys) = ...

@robdockins
Copy link
Contributor

The coerceSize primitive we have been discussing in #704 might be a good stopgap in the current absence of this feature.

@msaaltink
Copy link
Contributor

In something I wrote recently, coerceSize is helpful, but it still leaves some shenanigans for dealing with needing to know a size is nonzero. @brianhuffman 's proposal of May 5 2020 would make the Cryptol look much. much nicer and it would have been a lot easier to write.

@brianhuffman
Copy link
Contributor

I gather that the real challenge is not implementing the syntax, but in making the type inference algorithm work. Whether we could get it to work without completely rewriting the type checker is an open question.

@robdockins
Copy link
Contributor

I wonder if attaching the constraint to an entire declaration clause like that might actually make things easier. The usual difficulty with GADT pattern matching is making sure variables don't escape their scope, and that deductions made about type variables don't leak; with the choice being made a top-level there isn't really anyplace for the variable to escape to.

You'd still need to make sure the constraint clauses are total (and disjoint?), but I suppose we can just ask the solver about that.

@rybla
Copy link
Collaborator

rybla commented Jun 15, 2022

There are a couple of things about the guard-like syntax that strike me as awkward. The main thing is that the function declaration binds the argument variables before branching on the type constraint. So if we want specialized argument patterns in each branch, we have to put those inside local where clauses.

A related point is that the guard syntax won't match the structure of the desugared AST: The AST for one of these function declarations would have the type-level lambdas on the outside, then the type-level conditional, and then value-level lambdas inside each branch. Having a syntax that matches that structure would (I think) make it easier for programmers to understand the cryptol programs that they write.

So here's my proposal for a syntax: Multiple defining equations, each tagged with a type constraint. Each equation gets its own separate set of argument patterns.

foo : {n} (fin n) => [n] -> [n] -> [n]
(n == 0) => foo [] [] = []
(n >= 1) => foo ([x] # xs) ([y] # ys) = ...

I agree that this is a principled syntax choice, since the constraint-case branches may affect how the user wants to pattern match on their arguments. However, having foo appear to the right of the constraint in the constraint-cases is a little unintuitive. Alternatively, you could have something like

foo : {n} (fin n) => [n] -> [n] -> [n]
foo `(n == 0) [] [] = []
foo `(n >= 1) ([x] # xs) ([y] # ys) = ...

But that might be overloading backtick too much.

@rybla
Copy link
Collaborator

rybla commented Jun 15, 2022

I wonder if attaching the constraint to an entire declaration clause like that might actually make things easier. The usual difficulty with GADT pattern matching is making sure variables don't escape their scope, and that deductions made about type variables don't leak; with the choice being made a top-level there isn't really anyplace for the variable to escape to.

You'd still need to make sure the constraint clauses are total (and disjoint?), but I suppose we can just ask the solver about that.

A straightforward generalization of these constraint-cases is to introduce constraint disjunctions. In this way, the constraint-cases can be specified at the top level, reflected in the type of the function.

foo : {n} (fin n, n == 0 \/ n >= 1) => [n] -> [n] -> [n]
foo `(n == 0) [] [] = []
foo `(n >= 1) ([x] # xs) ([y] # ys) = ...

Additionally, this allows for some more advanced use cases, for example non-exhaustive disjunctions.

bar : {n} {fin n, n == 1 \/ n % 2 == 0) => [n] -> [n]
bar `(n == 1) [x] = ...
bar `(n % 2 == 0) xs = ...

I think the SMT solver can handle disjunctions like this already, so is it feasible? It's much more like GADT style pattern matching than just exhaustive cases.

@rybla
Copy link
Collaborator

rybla commented Jun 15, 2022

As I've discussed with @yav before, the typechecker can check each constraint-case by including the appropriate new constraint introduced and then typechecking as normal. Overlapping cases are not an issue for typechecking in this way, but of course execution will always just pick the first matching case. An exclusivity checker that calls the SMT solver to see if any pair of cases can be satisfied could be used to provide warnings, such as is done in Haskell.

@rybla
Copy link
Collaborator

rybla commented Jun 21, 2022

Actually, this is too much more general than the topic of this issue, so I'll make a new issue for it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language Changes or extensions to the language
Projects
None yet
Development

No branches or pull requests

7 participants