Better error message when contract makes certain rows illegal #1323

Radvendii · 2023-05-25T19:06:40Z

addresses #950

worked on with @jneem @matthew-healy

After we finished working on it, I changed two things

We need to recurse into the type declarations inside a record e.g. forall r. { x : { y : Number ; r } ; r } makes y in r illegal.
I rewrote it to be imperative. This has three advantages I see:
1. Stack size can't explode
2. We only ever need two HashSets
3. Don't need to pass arguments around when they're right there and never change
One disadvantage is that we always need two HashSets, as well as to_be_checked: Vec. But maybe we could get rid of any allocations by moving things around a bit.
The original recursive implementation is there if you just look at the first commit.

TODO for this PR:

decide between recursive and imperative implementation (or a secret third thing?)
write a test?
make sure the error message is what we want
update function documentation where needed
decide where this code should live. It feels like this makes the subcontract() function rather long. Maybe we should move it to its own function. Or maybe not! I don't know what the norms are for this code base.

vkleen · 2023-05-26T07:51:57Z

stdlib/internals.ncl

+      let conflicts = std.array.filter
+        (fun field => std.array.elem field constr)
+        (%fields% value)


Using standard library functions in the internals module makes me a bit scared, because this can relatively easily lead to infinite recursions. Because most standard library functions carry type annotations, these types get converted to contract checks when they get called, which causes subcontract in the interpreter to get called. And if the constellation of types is slightly unlucky that ends up producing a recursion. I think in this case it would be fine, but internals.ncl is special in this way.

Even if it's ugly, we should inline the definitions of filter and elem here.

This is a good point - I hadn't really considered this. There are 9 references to std. in this file. Do you think it's worth rewriting all of them?

Except for the reference to std.contract.Equal, I think that would be a good idea.

As discussed on Matrix (thread), this is extra painful because internals.ncl must currently be a syntactic record - i.e., it's not enough that it just evaluates to a record.

I proposed an alternative solution whereby we split standard library function implementations out into a module std.unsafe, and then re-export them in the existing modules with type & contract annotations provided on top. This means we can (safely) call std.unsafe functions from within internals without worrying about infinite recursion.

I spiked the std.unsafe idea, and it's gonna be a bit more complex than I'd hoped (because, e.g., std.unsafe.array.length gets inferred as Dyn and then we can't unify that with std.array.length: forall a. Array a -> Number). It might be possible to get the sort of API we want using codegen to avoid duplicating definitions, but that's probably not something we should rush into.

As such, we have two options:

Leave these calls as-is, and tackle every call to std in this file at the same time, or

Fix these two callsites, and come back for the rest of the file once we've worked out a better solution.

I'm maybe leaning towards the first option, but I don't think I have strong feelings either way.

Another solution to this problem could be to have a special type annotation that doesn't generate a corresponding contract at runtime. I guess those would need to be forbidden outside of the stdlib (enforced by the parser), but that would allow to:

write a single implementation that is statically typechecked

export it either without any contract nor type (std.unsafe) or wrapped inside a type annotation (the current std interface)

I don't know if that would work so well with mutually recursive functions, but at least self-contained ones such as length could be handled that way.

I made this issue for discussing this issue further. Can we resolve the thread on this PR and leave in std. for now?

Since in this case we're not actually producing infinite recursions, I agree we should keep this as is in the interest of moving this PR forward 👍

By the way, it's not only about infinite recursion, but for performance as well. Contract do have a non-trivial impact on performance, and this is also why we try to refrain from firing contracts from within contracts in a non obvious way. For example, I believe we're also using %length% instead of array.length although there's no risk of recursion. This can be reworked later, so let's not block this PR on this, but just to give a bit more context. internals are really low-level functions that are called implicitly, potentially quite a lot, by the interpreter, so we've been trying to keep them low level as well.

yannham

We need to compute those constraints as well during typechecking. This is probably not super costly, but it's a bit sad that we now duplicate the work in both contract() and type_check(). See typecheck::ConstrainFreshRRowsVar and its implementation.

Those so-called "lack predicate" constraints are purely syntactical. If we use them in many places, I wonder if we shouldn't compute them earlier. Parsing time is a good candidate (so that we don't have to first produce an incomplete AST with Options that is then latter filled, but rather generate right away a complete structure). A forall variable would have a name and a kind as before, together with an additional constraint field or better named (must_not_contain or must_lack or lacks or whatever) field which is a HashSet or simply a Vec, as you suggest, because I expect the most common lengths for those constraints to be between 1 and 5.

yannham · 2023-05-27T05:54:37Z

src/types.rs

+                let mut maybe_constr: HashSet<Ident> = HashSet::new();
+                let mut to_be_checked: Vec<&Types> = vec![body];
+                while let Some(tys) = to_be_checked.pop() {
+                    match &tys.types {


This looks like a visitor pattern. Rather than burying this code in this specific function, I wonder if we could separate the tree-walking part into a generic method of Types (it looks like a forM or something, at least from a distance) and simply keep the pattern matching logic here (but maybe it doesn't fit such a pattern, because e.g. of shadowing?)

This is a good thought. Hmm... I'm not sure if you meant something else by visitor pattern, but I tried implementing it as an iterator, and in order to allow the shadowed variable check, it had to look something like this:

pub struct TypesIterator<'a> { pub(crate) todo: Vec<&'a Types>, pub(crate) should_recurse: Option<Box<dyn Fn(&'a Types) -> bool>>, } impl<'a> Iterator for TypesIterator<'a> { type Item = &'a Types; fn next(&mut self) -> Option<Self::Item> { self.todo.pop().map(|tys| { let should_recurse = match &self.should_recurse { None => true, Some(f) => f(tys), }; if should_recurse { match &tys.types { TypeF::Arrow(s, t) => { self.todo.push(s); self.todo.push(t); } TypeF::Forall { body, .. } => { self.todo.push(body); } TypeF::Record(rrows) => { for ritem in rrows.iter() { if let RecordRowsIteratorItem::Row(row) = ritem { self.todo.push(row.types); } } } TypeF::Dict { type_fields, flavour, } => { // XXX: is this the right semantics? I'm not sure whether we should recurse // into contracts or not if flavour == &DictTypeFlavour::Type { self.todo.push(type_fields); } } TypeF::Array(t) => { self.todo.push(t); } _ => (), } } tys }) } } impl Types { pub fn iter(&self, should_recurse: Option<impl Fn(&Types) -> bool + 'static>) -> TypesIterator { TypesIterator { todo: vec![self], should_recurse: should_recurse.map(|x| Box::new(x) as Box<dyn Fn(&Types) -> bool>), } } }

And then you call it like

let var_ = var.clone(); let ignore_shadowed = move |tys: &Types| match &tys.types { TypeF::Forall { var, .. } if var == &var_ => false, _ => true, }; for tys in body.iter(Some(ignore_shadowed)) { ... }

Advantages:

clear separation of concerns

Problems:

It's not clear that this is actually general enough. What if someone wants to only recurse into the domain of a function type for some reason?

Dealing with these closure types seems really thorny. It took me a while to get the types right, and even then I'm not sure how to get it to work if you pass in None, because it needs a concrete type there.

One alternative is a much more specialized function, like iter_forall(body: &Types, var: Ident) which takes into account shadowing. But that feels really specific...

Yup, I was thinking of something like this. Even if it's not reused, I feel it separates concerns and makes the code more readable (the call site to this method is only concerned with the logic of the processing; if you want to see how the iteration work, you can then go to the definition of iter()).

It's not clear that this is actually general enough. What if someone wants to only recurse into the domain of a function type for some reason?

This is a good question, which is hard to answer now. But, what we can do is to give a specific name to this function, make the documentation clear enough. If one day we need something more general, it shouldn't be too hard to do, and to replace the function you wrote by a specialized call to the general one.

Dealing with these closure types seems really thorny. It took me a while to get the types right, and even then I'm not sure how to get it to work if you pass in None, because it needs a concrete type there.

I'm not sure to see why we need 'static here, by the way. And we can probably do with FnMut instead of Fn. But, as far as the Option is concerned, I would just get rid of it. None is semantically equivalent to |_| true, which is really not that longer to write for the caller, while as you mentioned, it is annoying with generics (and require an additional pattern matching). I'm relatively confident that the optimized code will be as efficient with |_| true as with None, thanks to monorphization.

This is kind of moot now, because I moved the code into the parsing stage, and built it into the recursion that's already present there for determining var_kind

yannham · 2023-05-27T06:00:14Z

stdlib/internals.ncl

+      let conflicts = std.array.filter
+        (fun field => std.array.elem field constr)
+        (%fields% value)


Another solution to this problem could be to have a special type annotation that doesn't generate a corresponding contract at runtime. I guess those would need to be forbidden outside of the stdlib (enforced by the parser), but that would allow to:

write a single implementation that is statically typechecked

export it either without any contract nor type (std.unsafe) or wrapped inside a type annotation (the current std interface)

I don't know if that would work so well with mutually recursive functions, but at least self-contained ones such as length could be handled that way.

src/types.rs

Radvendii · 2023-05-30T21:13:43Z

We need to compute those constraints as well during typechecking. This is probably not super costly, but it's a bit sad that we now duplicate the work in both contract() and type_check(). See typecheck::ConstrainFreshRRowsVar and its implementation.
Those so-called "lack predicate" constraints are purely syntactical. If we use them in many places, I wonder if we shouldn't compute them earlier.

Yeah, we based this implementation on that one. It would be nice if they could be unified.

Parsing time is a good candidate (so that we don't have to first produce an incomplete AST with Options that is then latter filled, but rather generate right away a complete structure).

I think we'll have to wait until fix_type_vars phase, since we need to know the var_kind before this code runs. Maybe we can roll it into that phase though? It seems like a related action.

A forall variable would have a name and a kind as before, together with an additional constraint field or better named (must_not_contain or must_lack or lacks or whatever) field

i.e. TypeF::Forall { var, var_kind, body, lacks }?

yannham · 2023-05-31T01:27:26Z

I think we'll have to wait until fix_type_vars phase, since we need to know the var_kind before this code runs. Maybe we can roll it into that phase though? It seems like a related action.

Yes, fix_type_vars is interleaved with parsing so I didn't think of it as a separate phase, but I was precisely thinking about this one.

i.e. TypeF::Forall { var, var_kind, body, lacks }?

Either that, or putting the lacks predicate inside var_kind, because I believe lacks doesn't make sense for a normal type variable, only for record and enum rows. Maybe even only for record rows.

Radvendii · 2023-06-01T15:29:31Z

Maybe even only for record rows.

Oh, you're right. We implemented it for enums but that doesn't really apply. Type checking verifies that

let f : forall e. [| ; e |] -> [| 'x ; e |] = fun a => 'x in f 'x

works fine.

As an aside, I was poking around while testing that and found

let f : forall e. Bool -> [| ; e |] -> [| 'x; e |] = fun b a => if b then a else 'x in f 'x

yannham · 2023-06-05T09:43:06Z

Fails to type check, but will work as a contract (s/:/|/). Is that intended behaviour? It seems like[| 'x |] won't unify either with [| 'x ; e |] or with [| ; e |], but it seems to me that both should work? Maybe this has to do with the polymorphic tail sealing?

In general, when you introduce a polymorphic type variable (be it a normal type variable, or a row type variable), the variable can't unify with most of types. For example:

let f : forall a . Number -> a = fun x => x

will rightly fail, because the forall a. means "as a caller of f, I can pick absolutely any a, so you can't assume anything. But you're assuming a = Number". Technically, we introduce a so-called rigid type variable (or skolem variable, or type constant, etc.), which can only unifies with itself or a free unification variable.

Under this interpretation, it makes sense I think that [| 'x |] doesn't unify with [| 'x ; e |], because you would arbitrarily choose that e is the empty row, but e is a universally quantified variable. That being said, there's this strange things with enums that because this is a sum type (coproduct), the tail has a different meaning than for records. Records are products, so {foo : T, bar : U; r} means a value that must have field foo : T AND bar : U AND the tail r. An value of type [| 'x, 'y; e|] is a enum tag which is either 'x OR 'y OR something in e. This means that morally, [| 'x |] is indeed a subtype of [|'x|], and more generally [| 'x; e |] for any tail e. Which is, I think, your intuition.

Subtyping is a whole can of worm, and the point of row polymorphism is to get the same kind of flexibility while staying in a purely classical polymorphic approach. We really don't want to introduce subtyping (at least not for row types).

I believe there is an obscure primop which can widen a row type in Nickel already, embed_id (written %embed% id), with type forall rows. [| ; rows |] -> [| 'id ; rows |]. Unfortunately I think widening is not sufficient to type your example, because we would need to widen the type of [| 'x |] with a full unknown row e, and not just one 'id.

I wonder if we can't simply infer open types for enum tags. That is, infer 'x to have type forall e. [| 'x; e|] (in practice, [| 'x; _a |] with _a a unification variable). That might be sufficient to handle those cases.

vkleen · 2023-06-05T11:40:41Z

I wonder if we can't simply infer open types for enum tags. That is, infer 'x to have type forall e. [| 'x; e|] (in practice, [| 'x; _a |] with _a a unification variable). That might be sufficient to handle those cases.

When typechecking, I think we're already doing that. In src/typecheck/mod.rs:1401 in the check function we have:

        Term::Enum(id) => {
            let row = state.table.fresh_erows_uvar();
            unify(state, &ctxt, ty, mk_uty_enum!(*id; row))
                .map_err(|err| err.into_typecheck_err(state, rt.pos))
        }

yannham · 2023-06-05T13:43:53Z

So @vkleen is right, my bad. We already do this. We looked at the example and deduced that the issue was actually the converse : the type of a is not general enough, because it doesn't include 'x. The typechecker enforces that both branches of the if-then-else have the same type, one is [| ; e |] (a) and the other is [| 'x; _t0 |] for a free unification variable _t0. The unifier complains that [| ; e |] doesn't contain the row 'x.

We can thus actually fix the example with the widening operator (this is an undocumented, prototype legacy - the syntax in particular is strange, as it takes an identifier as a parameter, but interpret it as a symbol, a bit like quoting in Lisp):

> let f : forall e. Bool -> [| ; e |] -> [| 'x; e |] = fun b a => if b then (%embed% x a) else 'x in f true 'y
'y

%embed% x is the identity, but has type forall a. [| ; a|] -> [| 'x ; a |]. I honestly don't know if we can do much better currently. Starting to try to automatically apply widening sounds much like having subtyping. It's just a first impression though, we might want to take a second look at the literature on row types.

One possibility would be to fix the odd syntax (and name, probably) of %embed% and wrap it in the stdlib as a proper function, and document it. It's not ideal but would at least permit to write some functions that are actually well-typed but currently fails typechecking, like your example.

Radvendii · 2023-06-06T14:44:18Z

There's something I don't understand. We seem to have concluded that enums don't have the same constraint / excluded behaviour that records do, but all of the machinery is there in the type checking (see ConstrainFreshERowsVar, for instance), and as far as I can tell it's being used. What am I missing?

Radvendii · 2023-06-13T17:06:38Z

In rebasing, I had to contend with f8c8a02 which actually relies on VarKind just being a simple enum with no fields.

At first I was going to use Discriminant<VarKind> to get that with the new VarKind but there's no way to match on values of that type, so I made a new type VarKindDiscriminant. Not the prettiest code, but it works.

I think ultimately we may want to store the excluded information in a separate HashSet like we do at typechecking time, but create that earlier at parse time. That way (a) we don't need to store the HashSet pointer for all the other VarKinds, and (b) we don't have to create the HashSet later in typechecking phase.

yannham

Left a bunch of comments, but LGTM otherwise.

src/error.rs

src/parser/uniterm.rs

yannham · 2023-06-19T08:52:31Z

src/parser/uniterm.rs

+    /// Return the current var_kind. Default to `VarKind::Type` if it's unused as in
+    /// `forall a. Number`
+    // TODO: optimization: when var_kind is called, there are actually no other references to this
+    // VarKind. We should be able to architect this so there's no clone here.


You can consume the value inside an option by using take, but that is probably requiring a &mut ref. Something like:

pub fn take_var_kind(&mut self) -> VarKind { self.0.borrow_mut().take().unwrap_or(VarKind::Type) }

Hmm... So in fact this does not require &mut self, because we're borrowing the inside mutably. But what happens is that we take it out of the Option, and leave Nothing behind. Not quite what I was imagining (taking the whole key-value pair out of the Environment), and if this ends up causing trouble it will likely produce worse error messages, but it does work.

src/parser/uniterm.rs

src/typecheck/mod.rs

fixes #950 by calculating the fields disallowed in a parameterized tail e.g. in `forall r. { x : Foo ; r }` `x` cannot be present in `r`

we check for constraints in contract checking and type checking. by moving it to the parser we only have to do it in one place. also, removed constraint checking for enum rows, which doesn't seem to have been doing anything and also we don't want it. `[| 'x ; e |]` does not preclude `'x` showing up in `e` added in VarKindDiscriminant for those places where we do actually just want the enum type. This is mostly for error reporting.

Radvendii requested review from jneem and matthew-healy May 25, 2023 19:06

github-actions bot temporarily deployed to pull request May 25, 2023 19:11 Inactive

github-actions bot temporarily deployed to pull request May 25, 2023 19:17 Inactive

github-actions bot temporarily deployed to pull request May 25, 2023 20:57 Inactive

vkleen reviewed May 26, 2023

View reviewed changes

yannham reviewed May 27, 2023

View reviewed changes

Radvendii mentioned this pull request May 30, 2023

Deal with potential infinite recursion stdlib <-> internals.ncl #1334

Open

github-actions bot temporarily deployed to pull request May 30, 2023 21:18 Inactive

github-actions bot temporarily deployed to pull request May 30, 2023 21:27 Inactive

Radvendii force-pushed the 950-excluded-fields branch from d16f5a9 to 5d1b89c Compare June 13, 2023 15:53

github-actions bot temporarily deployed to pull request June 13, 2023 15:58 Inactive

github-actions bot temporarily deployed to pull request June 14, 2023 18:03 Inactive

yannham mentioned this pull request Jun 15, 2023

Improve efficiency of constraints generation and type substitution #202

Open

Radvendii requested review from yannham, vkleen and matthew-healy and removed request for matthew-healy June 16, 2023 15:18

yannham approved these changes Jun 19, 2023

View reviewed changes

Radvendii added 3 commits June 21, 2023 12:44

blame caller when tail contains illegal fields

114e201

fixes #950 by calculating the fields disallowed in a parameterized tail e.g. in `forall r. { x : Foo ; r }` `x` cannot be present in `r`

make get_constraints() imperative and inline

ef7e960

reorder match fields to reflect order they will be executed

e53a27b

Radvendii added 10 commits June 21, 2023 12:44

appease clippy

d92b48e

recurse into record sub-types even if we are looking for enum tails

35733ab

NickelString has a From<Label> instance

f6407b8

separate recursion from checking for constraints

123da05

VarKindCellData -> Option<VarKind>

0f6de08

update documentation

77aeda8

fixup

d227749

add tests

5293e78

address review comments

ebee04a

Radvendii force-pushed the 950-excluded-fields branch from b8e8aa4 to ebee04a Compare June 21, 2023 17:04

github-actions bot temporarily deployed to pull request June 21, 2023 17:07 Inactive

Radvendii added this pull request to the merge queue Jun 22, 2023

Radvendii removed this pull request from the merge queue due to a manual request Jun 22, 2023

Radvendii added this pull request to the merge queue Jun 22, 2023

Merged via the queue into master with commit f5efb90 Jun 22, 2023

Radvendii deleted the 950-excluded-fields branch June 22, 2023 01:23

This was referenced Jun 22, 2023

Type error depends on name of irrelevant record field #1393

Closed

Polymorphic record contracts should know which fields their row variables cannot contain #950

Closed

chenrui333 mentioned this pull request Jul 10, 2023

nickel 1.1.1 Homebrew/homebrew-core#136239

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better error message when contract makes certain rows illegal #1323

Better error message when contract makes certain rows illegal #1323

Radvendii commented May 25, 2023 •

edited

Loading

vkleen May 26, 2023

matthew-healy May 26, 2023

vkleen May 26, 2023

matthew-healy May 26, 2023

matthew-healy May 26, 2023

yannham May 27, 2023

Radvendii May 30, 2023

vkleen May 30, 2023

yannham May 31, 2023 •

edited

Loading

yannham left a comment

yannham May 27, 2023

Radvendii May 30, 2023

yannham Jun 5, 2023

Radvendii Jun 7, 2023

yannham May 27, 2023

Radvendii commented May 30, 2023

yannham commented May 31, 2023 •

edited

Loading

Radvendii commented Jun 1, 2023

yannham commented Jun 5, 2023 •

edited

Loading

vkleen commented Jun 5, 2023

yannham commented Jun 5, 2023

Radvendii commented Jun 6, 2023

Radvendii commented Jun 13, 2023

yannham left a comment

yannham Jun 19, 2023

Radvendii Jun 20, 2023

Better error message when contract makes certain rows illegal #1323

Better error message when contract makes certain rows illegal #1323

Conversation

Radvendii commented May 25, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yannham May 31, 2023 • edited Loading

Choose a reason for hiding this comment

yannham left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Radvendii commented May 30, 2023

yannham commented May 31, 2023 • edited Loading

Radvendii commented Jun 1, 2023

yannham commented Jun 5, 2023 • edited Loading

vkleen commented Jun 5, 2023

yannham commented Jun 5, 2023

Radvendii commented Jun 6, 2023

Radvendii commented Jun 13, 2023

yannham left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Radvendii commented May 25, 2023 •

edited

Loading

yannham May 31, 2023 •

edited

Loading

yannham commented May 31, 2023 •

edited

Loading

yannham commented Jun 5, 2023 •

edited

Loading