sem all call nodes in generic type bodies + many required fixes #23983

Merged
merged 21 commits into from
Aug 20, 2024

Conversation

metagn
Collaborator

@metagn metagn commented Aug 20, 2024

fixes #23406, closes #23854, closes #23855 (test code of both compiles but separate issue exists), refs #23432, follows #23411

In generic bodies, all regular `nkCall` nodes like `foo(a, b)` were previously treated directly as generic statements and delayed immediately, while other call kinds like `a.foo(b)`, `foo a, b` etc. were typechecked first to determine whether they had to be delayed, as implemented in #22029. Since the behavior for `nkCall` was slightly buggy (as in #23406), the behavior for all call kinds is now to call `semTypeExpr`.
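To illustrate the kind of code this affects, here is a minimal sketch of calls inside a generic type body that cannot be resolved until the type is instantiated. The names `bufSize` and `Buf` are hypothetical and are not taken from the PR or its tests; the sketch assumes a Nim version that includes these changes.

```nim
# Hypothetical helper: any proc taking a `typedesc` parameter would do.
proc bufSize(T: typedesc): int {.compileTime.} =
  sizeof(T) * 2

type
  Buf[T] = object
    # `T` is still an unresolved generic param here, so neither call can be
    # evaluated yet; both have to be delayed until `Buf` is instantiated.
    viaCall: array[bufSize(T), byte]  # plain call syntax (`nkCall`)
    viaDot: array[T.bufSize, byte]    # method-call syntax on the generic param

# Instantiating the type resolves the delayed calls with `T = int32`.
var b: Buf[int32]
doAssert b.viaCall.len == 8
doAssert b.viaDot.len == 8
```

Both fields are expected to end up with the same array type, since the two call syntaxes refer to the same proc and differ only in the node kind the compiler sees.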

However, the vast majority of calls in generic bodies out there are `nkCall`, and while there isn't a difference in the expected behavior, this exposes many issues with the implementation started in #22029 given how much more code uses it now. The portion of these issues that CI has caught is fixed in this PR, but it's possible there are more.

  1. Deref expressions, dot expressions and calls to dot expressions now handle and propagate `tyFromExpr`. This is most of the changes in `semexprs`.
  2. For deref expressions to work in `typeof`, a new type flag `tfNonConstExpr` is added for `tyFromExpr` that calls `semExprWithType` with `efInTypeof` on the expression instead of `semConstExpr`. This type flag is set for every `tyFromExpr` type of a node that `prepareNode` encounters, so that the node itself isn't evaluated at compile time when just trying to get the type of the node.
  3. Unresolved `static` types matching `static` parameters are now treated the same as unresolved generic types matching `typedesc` parameters in generic type bodies: this causes a failed match, which delays the call instantiation.
  4. `typedesc` parameters now reject all types containing unresolved generic types like `seq[T]`, not just generic param types by themselves (using `containsGenericType`).
  5. `semgnrc` no longer leaves generic param symbols it encounters in generic type contexts as just identifiers, and instead turns them into symbol nodes. Normally in generic procs, this isn't a problem since the generic param symbols will be provided again at instantiation time (and in fact creating symbol nodes causes issues since `seminst` doesn't actually instantiate proc body node types).
    But generic types can try to be instantiated early in `sigmatch`, which will give an undeclared identifier error when the param is not provided. Nodes in generic types (specifically in `tyFromExpr`, which should be the only use for `semGenericStmt`) undergo full generic type instantiation with `prepareNode`, so there is no issue of these symbols remaining as uninstantiated generic types.
  6. `prepareNode` now has more logic for which nodes to avoid instantiating.
    Subscripts and subscripts turned into calls to `[]` by `semgnrc` need to avoid instantiating the first operand, since it may be a generic body type like `Generic` in an expression like `Generic[int]`.
    Dot expressions cannot instantiate their RHS as it may be a generic proc symbol or even an undeclared identifier for generic param fields, but have to instantiate their LHS, so calls and subscripts need to still instantiate their first node if it's a dot expression.
    This logic still isn't perfect and needs the same level of detail as in `semexprs` for which nodes can be left as "untyped" for overloading/dot exprs/subscripts to handle, but should handle the majority of cases.

Also, the `efDetermineType` requirement for which calls become `tyFromExpr` is removed, and as a result `efDetermineType` is entirely unused again.

@Araq Araq merged commit ab18962 into nim-lang:devel Aug 20, 2024
18 checks passed
Contributor

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from ab18962

Hint: mm: orc; opt: speed; options: -d:release
173583 lines; 8.013s; 654.484MiB peakmem

@arnetheduck
Contributor

is this appropriate to backport to 2.0?

@metagn
Collaborator Author

metagn commented Aug 23, 2024

In principle yes, it's not too crazy since it's mostly following up #22029, but if you ask me it's likely to cause some regressions, at least without #24005, which fixes some issues here (dot expressions need a universally working `c.inGenericContext > 0`). That PR could itself cause even more regressions though, due to the gravity of its changes.

Specifically, I think 2 of the changes could cause issues: 1. the new dotexpr/subscript instantiation behavior, 2. the `semgnrc` change (which also needs a different diff for the opensym block). If the first one causes issues we'll need to fix it in devel too anyway. The 2nd one just needs the rest of the compiler to be ready for it, which I think is true in devel; I can't think of anything the 2.0 branch lacks that would change this, but I'm not sure. There are also minor things that maybe could cause issues, though I don't really think so, like `tfNonConstExpr` and `sigmatch` using `containsGenericType` for the typedesc check.

It should also depend on another followup to #22029, #23863, which isn't backported yet. There might be more unrelated bugfixes that make it work.

In general I'm hesitant about the idea of backporting PRs like this, where the goal is to make the compiler more "sensible" at the cost of stability, but maybe it's paid off for people before. In this case I think the benefit is small unless we also want to backport #24005, which I don't know about yet.

Araq pushed a commit that referenced this pull request Aug 26, 2024
…n fixes (#24005)

fixes #4228, fixes #4990, fixes #7006, fixes #7008, fixes #8406, fixes
#8551, fixes #11112, fixes #20027, fixes #22647, refs #23854 and #23855
(remaining issue fixed), refs #8545 (works properly now with
`cast[static[bool]]` changed to `cast[bool]`), refs #22342 and #22607
(disabled tests added), succeeds #23194

Parameter and return type nodes in generic procs now undergo the same
`inGenericContext` treatment that nodes in generic type bodies do. This
allows many of the fixes in #22029 and followups to also apply to
generic proc signatures. Like #23983 however this needs some more
compiler fixes, but this time mostly in `sigmatch` and type
instantiations.

1. `tryReadingGenericParam` no longer treats `tyCompositeTypeClass` like
a concrete type, so expressions like `Foo.T` where `Foo` is a
generic type no longer look for a parameter of `Foo` in non-generic
code. It also doesn't generate `tyFromExpr` in non-generic code for
any generic LHS. This is to handle a very specific case in `asyncmacro`
which used `FutureVar.astToStr` where `FutureVar` is generic.
2. The `tryResolvingStaticExpr` call when matching `tyFromExpr` in
sigmatch now doesn't consider call nodes in general unresolved, only
nodes with `tyFromExpr` type, which is emitted on unresolved expressions
by increasing `c.inGenericContext`. `c.inGenericContext == 0` is also
now required to attempt instantiating `tyFromExpr`. So matching against
`tyFromExpr` in proc signatures works in general now, but I'm
speculating it depends on constant folding in `semExpr` for statics to
match against it properly.
3. `paramTypesMatch` now doesn't try to change nodes with `tyFromExpr`
type into `tyStatic` type when fitting to a static type, because it
doesn't need to, they'll be handled the same way (this was a workaround
in place of the static type instantiation changes, only one of the
fields in the #22647 test doesn't work with it).
4. `tyStatic` matching now uses `inferStaticParam` instead of just range
type matching, so `Foo[N div 2]` can infer `N` in the same way `array[N
div 2, int]` can. `inferStaticParam` also disabled itself if the
inferred static param type already had a node, but `makeStaticExpr`
generates static types with unresolved nodes, so we only disable it if
it also doesn't have a binding. This might not work very well but the
static type instantiation changes should really lower the amount of
cases where it's encountered.
5. Static types now undergo type instantiation. Previously the branch
for `tyStatic` in `semtypinst` was a no-op, now it acts similarly to
instantiating any other type with the following differences:
- Other types only need instantiation if `containsGenericType` is true,
static types also get instantiated if their value node isn't a literal
node. Ideally any value node that is "already evaluated" should be
ignored, but I'm not sure of a better way to check this, maybe if
`evalConstExpr` emitted a flag. This is purely for optimization though.
- After instantiation, `semConstExpr` is called on the value node if
`not cl.allowMetaTypes` and the type isn't literally a `static` type.
Then the type of the node is set to the base type of the static type to
deal with `semConstExpr` stripping abstract types.
We need to do this because calls like `foo(N)` where `N` is `static int`
and `foo`'s first parameter is just `int` do not generate `tyFromExpr`,
they are fully typed and so `makeStaticExpr` is called on them, giving a
static type with an unresolved node.
Araq pushed a commit that referenced this pull request Aug 28, 2024
…#24018)

updated version of #22193

After #22029 and the followups #23983 and #24005 which fixed issues with
it, `tyFromExpr` no longer matches any proc params in generic type bodies
but delays all non-matching calls until the type is instantiated.
Previously the mechanism `fauxMatch` was used to pretend that any
failing match against `tyFromExpr` actually matched, but prevented the
instantiation of the type until later.

Since this mechanism is not needed anymore for `tyFromExpr`, it is now
only used for `tyError` to prevent cascading errors and changed to a
bool field for simplicity. A change in `semtypes` was also needed to
prevent calling `fitNode` on default param values resolving to type
`tyFromExpr` in generic procs for params with non-generic types, as this
would try to coerce the expression into a concrete type when it can't be
instantiated yet.

The aliases `tyProxy` and `tyUnknown` for `tyError` and `tyFromExpr` are
also removed for uniformity.
narimiran pushed a commit that referenced this pull request Sep 16, 2024
(cherry picked from commit ab18962)