Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error message when contract makes certain rows illegal #1323

Merged
merged 13 commits into from
Jun 22, 2023

Conversation

Radvendii
Copy link
Member

@Radvendii Radvendii commented May 25, 2023

addresses #950

worked on with @jneem @matthew-healy

After we finished working on it, I changed two things

  1. We need to recurse into the type declarations inside a record e.g. forall r. { x : { y : Number ; r } ; r } makes y in r illegal.

  2. I rewrote it to be imperative. This has three advantages I see:

    1. Stack size can't explode
    2. We only ever need two HashSets
    3. Don't need to pass arguments around when they're right there and never change

    One disadvantage is that we always need two HashSets, as well as to_be_checked: Vec. But maybe we could get rid of any allocations by moving things around a bit.
    The original recursive implementation is there if you just look at the first commit.

TODO for this PR:

  • decide between recursive and imperative implementation (or a secret third thing?)
  • write a test?
  • make sure the error message is what we want
  • update function documentation where needed
  • decide where this code should live. It feels like this makes the subcontract() function rather long. Maybe we should move it to its own function. Or maybe not! I don't know what the norms are for this code base.

@github-actions github-actions bot temporarily deployed to pull request May 25, 2023 19:11 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 25, 2023 19:17 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 25, 2023 20:57 Inactive
Comment on lines 144 to 146
let conflicts = std.array.filter
(fun field => std.array.elem field constr)
(%fields% value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using standard library functions in the internals module makes me a bit scared, because this can relatively easily lead to infinite recursions. Because most standard library functions carry type annotations, these types get converted to contract checks when they get called, which causes subcontract in the interpreter to get called. And if the constellation of types is slightly unlucky that ends up producing a recursion. I think in this case it would be fine, but internals.ncl is special in this way.

Even if it's ugly, we should inline the definitions of filter and elem here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point - I hadn't really considered this. There are 9 references to std. in this file. Do you think it's worth rewriting all of them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the reference to std.contract.Equal, I think that would be a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on Matrix (thread), this is extra painful because internals.ncl must currently be a syntactic record - i.e., it's not enough that it just evaluates to a record.

I proposed an alternative solution whereby we split standard library function implementations out into a module std.unsafe, and then re-export them in the existing modules with type & contract annotations provided on top. This means we can (safely) call std.unsafe functions from within internals without worrying about infinite recursion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spiked the std.unsafe idea, and it's gonna be a bit more complex than I'd hoped (because, e.g., std.unsafe.array.length gets inferred as Dyn and then we can't unify that with std.array.length: forall a. Array a -> Number). It might be possible to get the sort of API we want using codegen to avoid duplicating definitions, but that's probably not something we should rush into.

As such, we have two options:

  • Leave these calls as-is, and tackle every call to std in this file at the same time, or
  • Fix these two callsites, and come back for the rest of the file once we've worked out a better solution.

I'm maybe leaning towards the first option, but I don't think I have strong feelings either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another solution to this problem could be to have a special type annotation that doesn't generate a corresponding contract at runtime. I guess those would need to be forbidden outside of the stdlib (enforced by the parser), but that would allow to:

  • write a single implementation that is statically typechecked
  • export it either without any contract nor type (std.unsafe) or wrapped inside a type annotation (the current std interface)

I don't know if that would work so well with mutually recursive functions, but at least self-contained ones such as length could be handled that way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this issue for discussing this issue further. Can we resolve the thread on this PR and leave in std. for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since in this case we're not actually producing infinite recursions, I agree we should keep this as is in the interest of moving this PR forward 👍

Copy link
Member

@yannham yannham May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, it's not only about infinite recursion, but for performance as well. Contract do have a non-trivial impact on performance, and this is also why we try to refrain from firing contracts from within contracts in a non obvious way. For example, I believe we're also using %length% instead of array.length although there's no risk of recursion. This can be reworked later, so let's not block this PR on this, but just to give a bit more context. internals are really low-level functions that are called implicitly, potentially quite a lot, by the interpreter, so we've been trying to keep them low level as well.

Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to compute those constraints as well during typechecking. This is probably not super costly, but it's a bit sad that we now duplicate the work in both contract() and type_check(). See typecheck::ConstrainFreshRRowsVar and its implementation.

Those so-called "lack predicate" constraints are purely syntactical. If we use them in many places, I wonder if we shouldn't compute them earlier. Parsing time is a good candidate (so that we don't have to first produce an incomplete AST with Options that is then latter filled, but rather generate right away a complete structure). A forall variable would have a name and a kind as before, together with an additional constraint field or better named (must_not_contain or must_lack or lacks or whatever) field which is a HashSet or simply a Vec, as you suggest, because I expect the most common lengths for those constraints to be between 1 and 5.

src/types.rs Outdated
let mut maybe_constr: HashSet<Ident> = HashSet::new();
let mut to_be_checked: Vec<&Types> = vec![body];
while let Some(tys) = to_be_checked.pop() {
match &tys.types {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a visitor pattern. Rather than burying this code in this specific function, I wonder if we could separate the tree-walking part into a generic method of Types (it looks like a forM or something, at least from a distance) and simply keep the pattern matching logic here (but maybe it doesn't fit such a pattern, because e.g. of shadowing?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good thought. Hmm... I'm not sure if you meant something else by visitor pattern, but I tried implementing it as an iterator, and in order to allow the shadowed variable check, it had to look something like this:

pub struct TypesIterator<'a> {
    pub(crate) todo: Vec<&'a Types>,
    pub(crate) should_recurse: Option<Box<dyn Fn(&'a Types) -> bool>>,
}

impl<'a> Iterator for TypesIterator<'a> {
    type Item = &'a Types;

    fn next(&mut self) -> Option<Self::Item> {
        self.todo.pop().map(|tys| {
            let should_recurse = match &self.should_recurse {
                None => true,
                Some(f) => f(tys),
            };
            if should_recurse {
                match &tys.types {
                    TypeF::Arrow(s, t) => {
                        self.todo.push(s);
                        self.todo.push(t);
                    }
                    TypeF::Forall { body, .. } => {
                        self.todo.push(body);
                    }
                    TypeF::Record(rrows) => {
                        for ritem in rrows.iter() {
                            if let RecordRowsIteratorItem::Row(row) = ritem {
                                self.todo.push(row.types);
                            }
                        }
                    }
                    TypeF::Dict {
                        type_fields,
                        flavour,
                    } => {
                        // XXX: is this the right semantics? I'm not sure whether we should recurse
                        // into contracts or not
                        if flavour == &DictTypeFlavour::Type {
                            self.todo.push(type_fields);
                        }
                    }
                    TypeF::Array(t) => {
                        self.todo.push(t);
                    }
                    _ => (),
                }
            }
            tys
        })
    }
}

impl Types {
    pub fn iter(&self, should_recurse: Option<impl Fn(&Types) -> bool + 'static>) -> TypesIterator {
        TypesIterator {
            todo: vec![self],
            should_recurse: should_recurse.map(|x| Box::new(x) as Box<dyn Fn(&Types) -> bool>),
        }
    }
}

And then you call it like

                let var_ = var.clone();
                let ignore_shadowed = move |tys: &Types| match &tys.types {
                    TypeF::Forall { var, .. } if var == &var_ => false,
                    _ => true,
                };
                for tys in body.iter(Some(ignore_shadowed)) {
                    ...
                }

Advantages:

  • clear separation of concerns

Problems:

  • It's not clear that this is actually general enough. What if someone wants to only recurse into the domain of a function type for some reason?
  • Dealing with these closure types seems really thorny. It took me a while to get the types right, and even then I'm not sure how to get it to work if you pass in None, because it needs a concrete type there.

One alternative is a much more specialized function, like iter_forall(body: &Types, var: Ident) which takes into account shadowing. But that feels really specific...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I was thinking of something like this. Even if it's not reused, I feel it separates concerns and makes the code more readable (the call site to this method is only concerned with the logic of the processing; if you want to see how the iteration work, you can then go to the definition of iter()).

It's not clear that this is actually general enough. What if someone wants to only recurse into the domain of a function type for some reason?

This is a good question, which is hard to answer now. But, what we can do is to give a specific name to this function, make the documentation clear enough. If one day we need something more general, it shouldn't be too hard to do, and to replace the function you wrote by a specialized call to the general one.

Dealing with these closure types seems really thorny. It took me a while to get the types right, and even then I'm not sure how to get it to work if you pass in None, because it needs a concrete type there.

I'm not sure to see why we need 'static here, by the way. And we can probably do with FnMut instead of Fn. But, as far as the Option is concerned, I would just get rid of it. None is semantically equivalent to |_| true, which is really not that longer to write for the caller, while as you mentioned, it is annoying with generics (and require an additional pattern matching). I'm relatively confident that the optimized code will be as efficient with |_| true as with None, thanks to monorphization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of moot now, because I moved the code into the parsing stage, and built it into the recursion that's already present there for determining var_kind

Comment on lines 144 to 146
let conflicts = std.array.filter
(fun field => std.array.elem field constr)
(%fields% value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another solution to this problem could be to have a special type annotation that doesn't generate a corresponding contract at runtime. I guess those would need to be forbidden outside of the stdlib (enforced by the parser), but that would allow to:

  • write a single implementation that is statically typechecked
  • export it either without any contract nor type (std.unsafe) or wrapped inside a type annotation (the current std interface)

I don't know if that would work so well with mutually recursive functions, but at least self-contained ones such as length could be handled that way.

src/types.rs Outdated Show resolved Hide resolved
@Radvendii
Copy link
Member Author

We need to compute those constraints as well during typechecking. This is probably not super costly, but it's a bit sad that we now duplicate the work in both contract() and type_check(). See typecheck::ConstrainFreshRRowsVar and its implementation.
Those so-called "lack predicate" constraints are purely syntactical. If we use them in many places, I wonder if we shouldn't compute them earlier.

Yeah, we based this implementation on that one. It would be nice if they could be unified.

Parsing time is a good candidate (so that we don't have to first produce an incomplete AST with Options that is then latter filled, but rather generate right away a complete structure).

I think we'll have to wait until fix_type_vars phase, since we need to know the var_kind before this code runs. Maybe we can roll it into that phase though? It seems like a related action.

A forall variable would have a name and a kind as before, together with an additional constraint field or better named (must_not_contain or must_lack or lacks or whatever) field

i.e. TypeF::Forall { var, var_kind, body, lacks }?

@github-actions github-actions bot temporarily deployed to pull request May 30, 2023 21:18 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 30, 2023 21:27 Inactive
@yannham
Copy link
Member

yannham commented May 31, 2023

I think we'll have to wait until fix_type_vars phase, since we need to know the var_kind before this code runs. Maybe we can roll it into that phase though? It seems like a related action.

Yes, fix_type_vars is interleaved with parsing so I didn't think of it as a separate phase, but I was precisely thinking about this one.

i.e. TypeF::Forall { var, var_kind, body, lacks }?

Either that, or putting the lacks predicate inside var_kind, because I believe lacks doesn't make sense for a normal type variable, only for record and enum rows. Maybe even only for record rows.

@Radvendii
Copy link
Member Author

Maybe even only for record rows.

Oh, you're right. We implemented it for enums but that doesn't really apply. Type checking verifies that

let f : forall e. [| ; e |] -> [| 'x ; e |] = fun a => 'x in f 'x

works fine.


As an aside, I was poking around while testing that and found

let f : forall e. Bool -> [| ; e |] -> [| 'x; e |] = fun b a => if b then a else 'x in f 'x

Fails to type check, but will work as a contract (s/:/|/). Is that intended behaviour? It seems like[| 'x |] won't unify either with [| 'x ; e |] or with [| ; e |], but it seems to me that both should work? Maybe this has to do with the polymorphic tail sealing?

@yannham
Copy link
Member

yannham commented Jun 5, 2023

Fails to type check, but will work as a contract (s/:/|/). Is that intended behaviour? It seems like[| 'x |] won't unify either with [| 'x ; e |] or with [| ; e |], but it seems to me that both should work? Maybe this has to do with the polymorphic tail sealing?

In general, when you introduce a polymorphic type variable (be it a normal type variable, or a row type variable), the variable can't unify with most of types. For example:

let f : forall a . Number -> a = fun x => x

will rightly fail, because the forall a. means "as a caller of f, I can pick absolutely any a, so you can't assume anything. But you're assuming a = Number". Technically, we introduce a so-called rigid type variable (or skolem variable, or type constant, etc.), which can only unifies with itself or a free unification variable.

Under this interpretation, it makes sense I think that [| 'x |] doesn't unify with [| 'x ; e |], because you would arbitrarily choose that e is the empty row, but e is a universally quantified variable. That being said, there's this strange things with enums that because this is a sum type (coproduct), the tail has a different meaning than for records. Records are products, so {foo : T, bar : U; r} means a value that must have field foo : T AND bar : U AND the tail r. An value of type [| 'x, 'y; e|] is a enum tag which is either 'x OR 'y OR something in e. This means that morally, [| 'x |] is indeed a subtype of [|'x|], and more generally [| 'x; e |] for any tail e. Which is, I think, your intuition.

Subtyping is a whole can of worm, and the point of row polymorphism is to get the same kind of flexibility while staying in a purely classical polymorphic approach. We really don't want to introduce subtyping (at least not for row types).

I believe there is an obscure primop which can widen a row type in Nickel already, embed_id (written %embed% id), with type forall rows. [| ; rows |] -> [| 'id ; rows |]. Unfortunately I think widening is not sufficient to type your example, because we would need to widen the type of [| 'x |] with a full unknown row e, and not just one 'id.

I wonder if we can't simply infer open types for enum tags. That is, infer 'x to have type forall e. [| 'x; e|] (in practice, [| 'x; _a |] with _a a unification variable). That might be sufficient to handle those cases.

@vkleen
Copy link
Contributor

vkleen commented Jun 5, 2023

I wonder if we can't simply infer open types for enum tags. That is, infer 'x to have type forall e. [| 'x; e|] (in practice, [| 'x; _a |] with _a a unification variable). That might be sufficient to handle those cases.

When typechecking, I think we're already doing that. In src/typecheck/mod.rs:1401 in the check function we have:

        Term::Enum(id) => {
            let row = state.table.fresh_erows_uvar();
            unify(state, &ctxt, ty, mk_uty_enum!(*id; row))
                .map_err(|err| err.into_typecheck_err(state, rt.pos))
        }

@yannham
Copy link
Member

yannham commented Jun 5, 2023

So @vkleen is right, my bad. We already do this. We looked at the example and deduced that the issue was actually the converse : the type of a is not general enough, because it doesn't include 'x. The typechecker enforces that both branches of the if-then-else have the same type, one is [| ; e |] (a) and the other is [| 'x; _t0 |] for a free unification variable _t0. The unifier complains that [| ; e |] doesn't contain the row 'x.

We can thus actually fix the example with the widening operator (this is an undocumented, prototype legacy - the syntax in particular is strange, as it takes an identifier as a parameter, but interpret it as a symbol, a bit like quoting in Lisp):

> let f : forall e. Bool -> [| ; e |] -> [| 'x; e |] = fun b a => if b then (%embed% x a) else 'x in f true 'y
'y

%embed% x is the identity, but has type forall a. [| ; a|] -> [| 'x ; a |]. I honestly don't know if we can do much better currently. Starting to try to automatically apply widening sounds much like having subtyping. It's just a first impression though, we might want to take a second look at the literature on row types.

One possibility would be to fix the odd syntax (and name, probably) of %embed% and wrap it in the stdlib as a proper function, and document it. It's not ideal but would at least permit to write some functions that are actually well-typed but currently fails typechecking, like your example.

@Radvendii
Copy link
Member Author

There's something I don't understand. We seem to have concluded that enums don't have the same constraint / excluded behaviour that records do, but all of the machinery is there in the type checking (see ConstrainFreshERowsVar, for instance), and as far as I can tell it's being used. What am I missing?

@github-actions github-actions bot temporarily deployed to pull request June 13, 2023 15:58 Inactive
@Radvendii
Copy link
Member Author

In rebasing, I had to contend with f8c8a02 which actually relies on VarKind just being a simple enum with no fields.

At first I was going to use Discriminant<VarKind> to get that with the new VarKind but there's no way to match on values of that type, so I made a new type VarKindDiscriminant. Not the prettiest code, but it works.

I think ultimately we may want to store the excluded information in a separate HashSet like we do at typechecking time, but create that earlier at parse time. That way (a) we don't need to store the HashSet pointer for all the other VarKinds, and (b) we don't have to create the HashSet later in typechecking phase.

@github-actions github-actions bot temporarily deployed to pull request June 14, 2023 18:03 Inactive
@Radvendii Radvendii requested review from yannham, vkleen and matthew-healy and removed request for matthew-healy June 16, 2023 15:18
Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bunch of comments, but LGTM otherwise.

src/error.rs Outdated Show resolved Hide resolved
src/parser/uniterm.rs Outdated Show resolved Hide resolved
src/parser/uniterm.rs Outdated Show resolved Hide resolved
/// Return the current var_kind. Default to `VarKind::Type` if it's unused as in
/// `forall a. Number`
// TODO: optimization: when var_kind is called, there are actually no other references to this
// VarKind. We should be able to architect this so there's no clone here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can consume the value inside an option by using take, but that is probably requiring a &mut ref. Something like:

pub fn take_var_kind(&mut self) -> VarKind {
  self.0.borrow_mut().take().unwrap_or(VarKind::Type)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... So in fact this does not require &mut self, because we're borrowing the inside mutably. But what happens is that we take it out of the Option, and leave Nothing behind. Not quite what I was imagining (taking the whole key-value pair out of the Environment), and if this ends up causing trouble it will likely produce worse error messages, but it does work.

src/parser/uniterm.rs Outdated Show resolved Hide resolved
src/typecheck/mod.rs Outdated Show resolved Hide resolved
src/typecheck/mod.rs Outdated Show resolved Hide resolved
fixes #950 by calculating the fields disallowed in a parameterized tail
e.g. in `forall r. { x : Foo ; r }` `x` cannot be present in `r`
we check for constraints in contract checking and type checking. by
moving it to the parser we only have to do it in one place.

also, removed constraint checking for enum rows, which doesn't seem to
have been doing anything and also we don't want it.
`[| 'x ; e |]` does not preclude `'x` showing up in `e`

added in VarKindDiscriminant for those places where we do actually just
want the enum type. This is mostly for error reporting.
@github-actions github-actions bot temporarily deployed to pull request June 21, 2023 17:07 Inactive
@Radvendii Radvendii added this pull request to the merge queue Jun 22, 2023
@Radvendii Radvendii removed this pull request from the merge queue due to a manual request Jun 22, 2023
@Radvendii Radvendii added this pull request to the merge queue Jun 22, 2023
Merged via the queue into master with commit f5efb90 Jun 22, 2023
@Radvendii Radvendii deleted the 950-excluded-fields branch June 22, 2023 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants