Better error message when contract makes certain rows illegal #1323
stdlib/internals.ncl
Outdated
let conflicts = std.array.filter
  (fun field => std.array.elem field constr)
  (%fields% value)
Using standard library functions in the internals module makes me a bit scared, because this can relatively easily lead to infinite recursions. Because most standard library functions carry type annotations, these types get converted to contract checks when they get called, which causes `subcontract` in the interpreter to get called. And if the constellation of types is slightly unlucky, that ends up producing a recursion. I think in this case it would be fine, but `internals.ncl` is special in this way.
Even if it's ugly, we should inline the definitions of `filter` and `elem` here.
This is a good point - I hadn't really considered this. There are 9 references to `std.` in this file. Do you think it's worth rewriting all of them?
Except for the reference to `std.contract.Equal`, I think that would be a good idea.
As discussed on Matrix (thread), this is extra painful because `internals.ncl` must currently be a syntactic record - i.e., it's not enough that it just evaluates to a record.
I proposed an alternative solution whereby we split standard library function implementations out into a module `std.unsafe`, and then re-export them in the existing modules with type & contract annotations provided on top. This means we can (safely) call `std.unsafe` functions from within `internals` without worrying about infinite recursion.
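The layering proposed here can be illustrated with a small Rust analogy. This is only a sketch under stated assumptions: the module names `raw` and `checked`, the `CHECKS` counter, and the `conflicts` helper are hypothetical stand-ins, not Nickel's actual stdlib or interpreter code. It shows low-level code calling the unchecked layer directly, so that no (simulated) contract check is triggered from inside a contract.

```rust
use std::cell::Cell;

thread_local! {
    // Counts how many simulated "contract checks" have run (illustrative only).
    static CHECKS: Cell<u32> = Cell::new(0);
}

/// Unchecked implementations, playing the role of the proposed `std.unsafe`.
pub mod raw {
    pub fn elem(needle: &str, haystack: &[&str]) -> bool {
        haystack.iter().any(|h| *h == needle)
    }

    pub fn filter<'a>(pred: impl Fn(&str) -> bool, items: &[&'a str]) -> Vec<&'a str> {
        items.iter().copied().filter(|s| pred(*s)).collect()
    }
}

/// Public API: the same functions, re-exported behind a simulated contract check.
pub mod checked {
    use super::{raw, CHECKS};

    fn run_contract() {
        CHECKS.with(|c| c.set(c.get() + 1));
    }

    pub fn elem(needle: &str, haystack: &[&str]) -> bool {
        run_contract();
        raw::elem(needle, haystack)
    }

    pub fn filter<'a>(pred: impl Fn(&str) -> bool, items: &[&'a str]) -> Vec<&'a str> {
        run_contract();
        raw::filter(pred, items)
    }
}

/// Analogue of code in `internals`: it only touches `raw`, so it can never
/// re-enter the contract machinery.
pub fn conflicts<'a>(fields: &[&'a str], constr: &[&str]) -> Vec<&'a str> {
    raw::filter(|f| raw::elem(f, constr), fields)
}

/// Number of simulated contract checks run so far on this thread.
pub fn checks_run() -> u32 {
    CHECKS.with(|c| c.get())
}
```

Calling `conflicts` leaves the counter untouched, while each call through `checked` bumps it once; that difference is the property the `std.unsafe` split is meant to guarantee.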
I spiked the `std.unsafe` idea, and it's gonna be a bit more complex than I'd hoped (because, e.g., `std.unsafe.array.length` gets inferred as `Dyn` and then we can't unify that with `std.array.length: forall a. Array a -> Number`). It might be possible to get the sort of API we want using codegen to avoid duplicating definitions, but that's probably not something we should rush into.
As such, we have two options:
- Leave these calls as-is, and tackle every call to `std` in this file at the same time, or
- Fix these two callsites, and come back for the rest of the file once we've worked out a better solution.
I'm maybe leaning towards the first option, but I don't think I have strong feelings either way.
Another solution to this problem could be to have a special type annotation that doesn't generate a corresponding contract at runtime. I guess those would need to be forbidden outside of the stdlib (enforced by the parser), but that would allow us to:
- write a single implementation that is statically typechecked
- export it either without any contract or type (`std.unsafe`) or wrapped inside a type annotation (the current `std` interface)
I don't know if that would work so well with mutually recursive functions, but at least self-contained ones such as `length` could be handled that way.
I made this issue for discussing this further. Can we resolve the thread on this PR and leave in `std.` for now?
Since in this case we're not actually producing infinite recursions, I agree we should keep this as is in the interest of moving this PR forward 👍
By the way, it's not only about infinite recursion, but about performance as well. Contracts do have a non-trivial impact on performance, and this is also why we try to refrain from firing contracts from within contracts in a non-obvious way. For example, I believe we're also using `%length%` instead of `array.length` although there's no risk of recursion. This can be reworked later, so let's not block this PR on this, but just to give a bit more context: `internals` are really low-level functions that are called implicitly, potentially quite a lot, by the interpreter, so we've been trying to keep them low level as well.
We need to compute those constraints as well during typechecking. This is probably not super costly, but it's a bit sad that we now duplicate the work in both `contract()` and `type_check()`. See `typecheck::ConstrainFreshRRowsVar` and its implementation.
Those so-called "lack predicate" constraints are purely syntactical. If we use them in many places, I wonder if we shouldn't compute them earlier. Parsing time is a good candidate (so that we don't have to first produce an incomplete AST with `Option`s that is later filled in, but rather generate a complete structure right away). A `forall` variable would have a name and a kind as before, together with an additional `constraint` field, or a better-named one (`must_not_contain`, `must_lack`, `lacks`, or whatever), which is a `HashSet` or simply a `Vec`, as you suggest, because I expect the most common lengths for those constraints to be between 1 and 5.
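As a rough illustration of that suggestion, the sketch below attaches the excluded field names to the variable itself and checks a candidate tail against them with a linear scan. The type and field names (`ForallVar`, `excluded`, `conflicts`) and the layout of `VarKind` are assumptions made for this example, not the structures in the Nickel code base.

```rust
/// Kind of a `forall`-bound variable (simplified, hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
pub enum VarKind {
    /// An ordinary type variable: no row constraints.
    Type,
    /// A record-row variable together with the field names it must lack.
    RecordRows { excluded: Vec<String> },
}

/// A `forall` variable carrying its constraints, computed once (e.g. at parse time).
#[derive(Debug, Clone, PartialEq)]
pub struct ForallVar {
    pub name: String,
    pub kind: VarKind,
}

impl ForallVar {
    /// Returns the fields of `tail_fields` that this variable is not allowed
    /// to contain. A linear scan over a `Vec` is used because the excluded
    /// set is expected to be tiny (a handful of names).
    pub fn conflicts<'a>(&self, tail_fields: &[&'a str]) -> Vec<&'a str> {
        match &self.kind {
            VarKind::RecordRows { excluded } => tail_fields
                .iter()
                .copied()
                .filter(|f| excluded.iter().any(|e| e.as_str() == *f))
                .collect(),
            VarKind::Type => Vec::new(),
        }
    }
}
```

With the constraint stored on the variable, both the contract check and the typechecker could read the same precomputed list instead of re-deriving it from the type.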
src/types.rs
Outdated
let mut maybe_constr: HashSet<Ident> = HashSet::new();
let mut to_be_checked: Vec<&Types> = vec![body];
while let Some(tys) = to_be_checked.pop() {
    match &tys.types {
This looks like a visitor pattern. Rather than burying this code in this specific function, I wonder if we could separate the tree-walking part into a generic method of `Types` (it looks like a `forM` or something, at least from a distance) and simply keep the pattern matching logic here (but maybe it doesn't fit such a pattern, because e.g. of shadowing?)
This is a good thought. Hmm... I'm not sure if you meant something else by visitor pattern, but I tried implementing it as an iterator, and in order to allow the shadowed variable check, it had to look something like this:
pub struct TypesIterator<'a> {
    pub(crate) todo: Vec<&'a Types>,
    pub(crate) should_recurse: Option<Box<dyn Fn(&'a Types) -> bool>>,
}

impl<'a> Iterator for TypesIterator<'a> {
    type Item = &'a Types;

    fn next(&mut self) -> Option<Self::Item> {
        self.todo.pop().map(|tys| {
            let should_recurse = match &self.should_recurse {
                None => true,
                Some(f) => f(tys),
            };
            if should_recurse {
                match &tys.types {
                    TypeF::Arrow(s, t) => {
                        self.todo.push(s);
                        self.todo.push(t);
                    }
                    TypeF::Forall { body, .. } => {
                        self.todo.push(body);
                    }
                    TypeF::Record(rrows) => {
                        for ritem in rrows.iter() {
                            if let RecordRowsIteratorItem::Row(row) = ritem {
                                self.todo.push(row.types);
                            }
                        }
                    }
                    TypeF::Dict {
                        type_fields,
                        flavour,
                    } => {
                        // XXX: is this the right semantics? I'm not sure whether we should recurse
                        // into contracts or not
                        if flavour == &DictTypeFlavour::Type {
                            self.todo.push(type_fields);
                        }
                    }
                    TypeF::Array(t) => {
                        self.todo.push(t);
                    }
                    _ => (),
                }
            }
            tys
        })
    }
}

impl Types {
    pub fn iter(&self, should_recurse: Option<impl Fn(&Types) -> bool + 'static>) -> TypesIterator {
        TypesIterator {
            todo: vec![self],
            should_recurse: should_recurse.map(|x| Box::new(x) as Box<dyn Fn(&Types) -> bool>),
        }
    }
}
And then you call it like
let var_ = var.clone();
let ignore_shadowed = move |tys: &Types| match &tys.types {
    TypeF::Forall { var, .. } if var == &var_ => false,
    _ => true,
};
for tys in body.iter(Some(ignore_shadowed)) {
    ...
}
Advantages:
- clear separation of concerns
Problems:
- It's not clear that this is actually general enough. What if someone wants to only recurse into the domain of a function type for some reason?
- Dealing with these closure types seems really thorny. It took me a while to get the types right, and even then I'm not sure how to get it to work if you pass in `None`, because it needs a concrete type there.
One alternative is a much more specialized function, like `iter_forall(body: &Types, var: Ident)`, which takes into account shadowing. But that feels really specific...
Yup, I was thinking of something like this. Even if it's not reused, I feel it separates concerns and makes the code more readable (the call site to this method is only concerned with the logic of the processing; if you want to see how the iteration works, you can then go to the definition of `iter()`).
> It's not clear that this is actually general enough. What if someone wants to only recurse into the domain of a function type for some reason?
This is a good question, which is hard to answer now. But what we can do is give a specific name to this function and make the documentation clear enough. If one day we need something more general, it shouldn't be too hard to write it, and to replace the function you wrote with a specialized call to the general one.
> Dealing with these closure types seems really thorny. It took me a while to get the types right, and even then I'm not sure how to get it to work if you pass in `None`, because it needs a concrete type there.
I'm not sure I see why we need `'static` here, by the way. And we can probably do with `FnMut` instead of `Fn`. But, as far as the `Option` is concerned, I would just get rid of it. `None` is semantically equivalent to `|_| true`, which is really not much longer to write for the caller, while, as you mentioned, the `Option` is annoying with generics (and requires an additional pattern match). I'm relatively confident that the optimized code will be as efficient with `|_| true` as with `None`, thanks to monomorphization.
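To make those suggestions concrete, here is a minimal, self-contained sketch of such an iterator that is generic over an `FnMut` closure, with no `Option`, no `Box<dyn ...>` and no `'static` bound. The `Ty` enum, `TyIter`, and `count_free` are toy stand-ins invented for this example; they are not Nickel's `Types`/`TypeF` definitions.

```rust
/// A toy type AST (hypothetical; much smaller than the real one).
#[derive(Debug, PartialEq)]
pub enum Ty {
    Number,
    Var(String),
    Arrow(Box<Ty>, Box<Ty>),
    Forall { var: String, body: Box<Ty> },
    Array(Box<Ty>),
}

/// Depth-first iterator over a type; `should_recurse` decides, per node,
/// whether the node's children are visited. The node itself is always yielded.
pub struct TyIter<'a, F> {
    todo: Vec<&'a Ty>,
    should_recurse: F,
}

impl<'a, F: FnMut(&'a Ty) -> bool> Iterator for TyIter<'a, F> {
    type Item = &'a Ty;

    fn next(&mut self) -> Option<&'a Ty> {
        let ty = self.todo.pop()?;
        if (self.should_recurse)(ty) {
            match ty {
                Ty::Arrow(s, t) => {
                    self.todo.push(s);
                    self.todo.push(t);
                }
                Ty::Forall { body, .. } => self.todo.push(body),
                Ty::Array(t) => self.todo.push(t),
                Ty::Number | Ty::Var(_) => (),
            }
        }
        Some(ty)
    }
}

impl Ty {
    /// Callers that want to visit everything simply pass `|_| true`.
    pub fn iter<'a, F: FnMut(&'a Ty) -> bool>(&'a self, should_recurse: F) -> TyIter<'a, F> {
        TyIter { todo: vec![self], should_recurse }
    }
}

/// Counts occurrences of `var` in `body` that are not shadowed by an inner
/// `forall` binding the same name.
pub fn count_free(body: &Ty, var: &str) -> usize {
    body.iter(|ty| !matches!(ty, Ty::Forall { var: v, .. } if v == var))
        .filter(|ty| matches!(ty, Ty::Var(v) if v == var))
        .count()
}
```

Because the closure type is a generic parameter, each call site gets its own monomorphized iterator, and the closure may borrow local data (here `var`) without needing to be `'static`.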
This is kind of moot now, because I moved the code into the parsing stage, and built it into the recursion that's already present there for determining `var_kind`.
Yeah, we based this implementation on that one. It would be nice if they could be unified.
I think we'll have to wait until
i.e. |
Yes,
Either that, or putting the lacks predicate inside |
Oh, you're right. We implemented it for enums but that doesn't really apply. Type checking verifies that
works fine. As an aside, I was poking around while testing that and found
Fails to type check, but will work as a contract ( |
In general, when you introduce a polymorphic type variable (be it a normal type variable, or a row type variable), the variable can't unify with most types. For example:
will rightly fail, because the Under this interpretation, it makes sense I think that Subtyping is a whole can of worms, and the point of row polymorphism is to get the same kind of flexibility while staying in a purely classical polymorphic approach. We really don't want to introduce subtyping (at least not for row types). I believe there is an obscure primop which can widen a row type in Nickel already, I wonder if we can't simply infer open types for enum tags. That is, infer
When typechecking, I think we're already doing that. In
Term::Enum(id) => {
    let row = state.table.fresh_erows_uvar();
    unify(state, &ctxt, ty, mk_uty_enum!(*id; row))
        .map_err(|err| err.into_typecheck_err(state, rt.pos))
}
So @vkleen is right, my bad. We already do this. We looked at the example and deduced that the issue was actually the converse : the type of We can thus actually fix the example with the widening operator (this is an undocumented, prototype legacy - the syntax in particular is strange, as it takes an identifier as a parameter, but interpret it as a symbol, a bit like quoting in Lisp):
One possibility would be to fix the odd syntax (and name, probably) of |
There's something I don't understand. We seem to have concluded that enums don't have the same constraint / excluded behaviour that records do, but all of the machinery is there in the type checking (see |
d16f5a9 to 5d1b89c
In rebasing, I had to contend with f8c8a02 which actually relies on At first I was going to use I think ultimately we may want to store the |
Left a bunch of comments, but LGTM otherwise.
src/parser/uniterm.rs
Outdated
/// Return the current var_kind. Default to `VarKind::Type` if it's unused as in
/// `forall a. Number`
// TODO: optimization: when var_kind is called, there are actually no other references to this
// VarKind. We should be able to architect this so there's no clone here.
You can consume the value inside an option by using `take`, but that probably requires a `&mut` reference. Something like:
pub fn take_var_kind(&mut self) -> VarKind {
    self.0.borrow_mut().take().unwrap_or(VarKind::Type)
}
Hmm... So in fact this does not require `&mut self`, because we're borrowing the inside mutably. But what happens is that we `take` it out of the `Option`, and leave `None` behind. Not quite what I was imagining (taking the whole key-value pair out of the `Environment`), and if this ends up causing trouble it will likely produce worse error messages, but it does work.
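The behaviour described here can be reproduced in isolation. The sketch below assumes the kind is stored as an `Rc<RefCell<Option<VarKind>>>`; that storage layout, the `VarKindCell` name, and the `set` helper are guesses made for the example, not the actual definition in `src/parser/uniterm.rs`. It shows that `take` goes through the `RefCell`, so `&self` is enough, and that the slot is left as `None` afterwards.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Clone, PartialEq)]
pub enum VarKind {
    Type,
    RecordRows,
    EnumRows,
}

/// A shared, interior-mutable slot holding an optional kind (hypothetical wrapper).
#[derive(Clone, Default)]
pub struct VarKindCell(Rc<RefCell<Option<VarKind>>>);

impl VarKindCell {
    /// Records the kind once it has been determined.
    pub fn set(&self, kind: VarKind) {
        *self.0.borrow_mut() = Some(kind);
    }

    /// Clone-based accessor: the stored value stays in place.
    pub fn var_kind(&self) -> VarKind {
        self.0.borrow().clone().unwrap_or(VarKind::Type)
    }

    /// Moves the value out without cloning. Only `&self` is needed, because
    /// the mutation happens through the `RefCell`; the slot becomes `None`.
    pub fn take_var_kind(&self) -> VarKind {
        self.0.borrow_mut().take().unwrap_or(VarKind::Type)
    }
}
```

The trade-off mentioned above is visible here: after `take_var_kind`, a later read silently falls back to `VarKind::Type` rather than failing loudly.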
fixes #950 by calculating the fields disallowed in a parameterized tail e.g. in `forall r. { x : Foo ; r }` `x` cannot be present in `r`
We check for constraints in contract checking and type checking. By moving it to the parser, we only have to do it in one place. Also removed constraint checking for enum rows, which doesn't seem to have been doing anything, and which we don't want anyway: `[| 'x ; e |]` does not preclude `'x` showing up in `e`. Added `VarKindDiscriminant` for those places where we do actually just want the enum type. This is mostly for error reporting.
b8e8aa4 to ebee04a
addresses #950
worked on with @jneem @matthew-healy
After we finished working on it, I changed two things:
- We need to recurse into the type declarations inside a record, e.g. `forall r. { x : { y : Number ; r } ; r }` makes `y` in `r` illegal.
- I rewrote it to be imperative. This has three advantages I see:
`HashSet`s
One disadvantage is that we always need two `HashSet`s, as well as `to_be_checked: Vec`. But maybe we could get rid of any allocations by moving things around a bit.
The original recursive implementation is there if you just look at the first commit.
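For readers who want to see the shape of the imperative version, here is a simplified sketch of a worklist-based traversal that collects the fields a row variable must lack, including fields of nested records and respecting shadowing. The `Ty` enum and `excluded_fields` are toy definitions assumed for this example, and the sketch uses a single `HashSet` rather than the two mentioned above.

```rust
use std::collections::HashSet;

/// A toy type AST (hypothetical; not the real `Types`).
#[derive(Debug)]
pub enum Ty {
    Number,
    Arrow(Box<Ty>, Box<Ty>),
    Forall { var: String, body: Box<Ty> },
    /// `{ f1 : T1, ..., fn : Tn ; tail }`, with an optional row-variable tail.
    Record { fields: Vec<(String, Ty)>, tail: Option<String> },
}

/// Collects every field name that `var` may not contain, i.e. the fields of
/// each record in `body` whose tail is `var`.
pub fn excluded_fields(var: &str, body: &Ty) -> HashSet<String> {
    let mut excluded = HashSet::new();
    // Explicit worklist instead of recursion.
    let mut to_be_checked: Vec<&Ty> = vec![body];

    while let Some(ty) = to_be_checked.pop() {
        match ty {
            Ty::Number => (),
            Ty::Arrow(dom, codom) => {
                to_be_checked.push(dom);
                to_be_checked.push(codom);
            }
            // An inner `forall` binding the same name shadows `var`,
            // so its body is skipped.
            Ty::Forall { var: inner, body } => {
                if inner != var {
                    to_be_checked.push(body);
                }
            }
            Ty::Record { fields, tail } => {
                if tail.as_deref() == Some(var) {
                    excluded.extend(fields.iter().map(|(name, _)| name.clone()));
                }
                // Field types may themselves contain records ending in `var`.
                to_be_checked.extend(fields.iter().map(|(_, t)| t));
            }
        }
    }
    excluded
}
```

On the example from the description, `forall r. { x : { y : Number ; r } ; r }`, this yields the set containing `x` and `y` for `r`.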
TODO for this PR:
`subcontract()` function rather long. Maybe we should move it to its own function. Or maybe not! I don't know what the norms are for this code base.