EEP 71: clarification of type specs & generics

erlang · Aug 7, 2024 · b22417e · b22417e
1 parent 68a0a1b
commit b22417e
Showing 1 changed file with 152 additions and 0 deletions.
diff --git a/eeps/eep-0071.md b/eeps/eep-0071.md
@@ -0,0 +1,152 @@
+    Author: John Högberg <john(at)erlang(dot)org>, Ilya Klyuchnikov <ilya(dot)klyuchnikov(at)gmail(dot)com>
+    Status: Draft
+    Type: Standards Track
+    Created: 7-Aug-2024
+    Erlang-Version: OTP-28
+    Post-History:
+****
+EEP 71: Clarify the type documentation, especially regarding type variables
+----
+
+Abstract
+========
+
+Type variables are an often-used yet very misunderstood part of the type
+language. As `dialyzer` has been lax in enforcing their defined semantics owing
+to a few bugs, many users have been led to believe that they are simple generics
+as in most other languages. As a significant amount of code has been written
+with this in mind, OTP cannot fix the bugs that made `dialyzer` lax by accident
+without breaking the analysis of a lot of code.
+
+While that would be fine if `dialyzer` were the only tool out there, all the
+other code analyzers we know of interpret type variables as generics which would
+fragment the ecosystem should we elect to fix the bugs. One application might
+have been written with `eqWAlizer` in mind and another with `dialyzer`,
+preventing either tool from being applied to the whole release.
+
+As `dialyzer` is simply broken in this regard, this EEP aims to solve the
+problem by changing the definition of type variables from _equality constraints_
+to _generics_ (parametric polymorphism). We also propose several other changes
+to make the type language more useful and less ambiguous.
+
+Rationale
+=========
+
+As currently documented, variables in `-spec`s and `-type`s act just like in
+ordinary code. That is, if the variable `V` appears several times it has the
+same _value_ in all places. In a language like C++ or Java where said value is a
+type, this works pretty well:
+
+    template<typename V>
+    V foo(V a, V b) {
+       return a + b;
+    }
+
+For example, `foo(12, 34)` works because both arguments have the same type `int`
+(yielding `int` for `T`), but `foo(12, 34.0)` doesn't because `int` is not
+`double`.
+
+However, as the documentation for the Erlang type language says that repeated
+variables refer to the same _value at runtime_, tools like `gradualizer` and
+`eqWAlizer` are not free to treat them as generics. For example:
+
+    -spec my_add(T, T) -> T.
+    my_add(A, B) -> A + B.
+
+Here, the only possible values of `T` that will work are the various
+representations of zero: `foo(12, 34)` will fail because `12` is not `34`, and
+`foo(1, 1)` will fail because the result `2` is not `1`.
+
+This makes it impossible to declare that a function takes a value of a certain
+type and returns something of the same type (but not necessarily _same value_).
+This makes sense in `dialyzer` as values and types are the same thing in its
+paradigm[^1], but restricts what other type checkers can do; while the
+documentation states that the "extra information" given by type variables may be
+ignored, it leaves no room for other interpretations when they are taken into
+account.
+
+[^1]: In an untyped language we cannot simply say that the type of `1` is
+    `integer()` since `1 | foo` is just as much of a superset of `1` as
+    `integer()` is. The type of `1` is simply `1`.
+
+Changes
+----------------
+
+To solve this issue and make the type language less surprising, we propose the
+following changes:
+
+1. Redefine `::` as alias instead of "subtype of."
+
+   All currently known tools implement this interpretation[^2]. Should bounded
+   quantification be needed in the future, the "subtype of" operator can be
+   introduced later under the more widely used `<:` or `=<` syntaxes.
+
+   Multiple aliases under the same name are rejected, making the following illegal:
+
+       -spec multiple(X :: integer(), X :: integer()) -> atom().
+       -spec when_multiple(X) -> atom() when X :: integer(), X :: number().
+
+    To prevent breaking existing code as a result of this change, the compiler
+    will _not_ enforce the above, leaving that to the type checking tools (or
+    optionally a compiler flag).
+
+2.  Normalize types in the parser, substituting aliases and treating "annotated
+    types" the same as `when`. This makes the following signatures equivalent:
+
+        -spec xyzzy(A) -> term() when A :: number().
+        -spec waldo(B :: number()) -> term().
+        -spec fred(number()) -> term().
+
+    Where `B :: number()` is mere shorthand for `B ... when B :: number()`, and
+    all variants declare that the first argument is a `number()`.
+
+3. Treat type variables as generics rather than as equality constraints.
+
+   Example:
+
+   The current `lists:append` type spec is:
+
+       -spec append(List1, List2) -> List3 when List1 :: [T], List2 :: [T], List3 :: [T], T :: term().
+
+   With the defined semantics, `T :: term()` makes the `append` function to
+   refer to a top type. Instead, the following is preferred and more in line
+   with current type systems:
+
+       -spec append(List1, List2) -> List3 when List1 :: [T], List2 :: [U], List3 :: [T | U].
+
+4. Have function contracts (`-spec`) describe argument types as they go in.
+
+    A very surprising aspect of function contracts is that they describe the
+    types of the arguments _when the function returns successfully_, and do not
+    affect argument types _within_ the function. For example:
+
+        -spec foo(integer()) -> integer().
+        foo(X) ->
+            bar(X), %% X is not necessarily an integer here!
+            X + 1.
+
+    Most people take the above to mean that the type passed on to `bar/1` is
+    `integer()`, yet it can be whatever is passed to `foo/1` including values
+    that cause `foo/1` to fail. This often hides the fact that there is
+    unreachable code in callees such as `bar/1` here, and sometimes results in
+    cryptic warnings describing something the user believes could never happen.
+
+    This also makes it much easier for a user to see what the type of a variable
+    is at any point: just start with what it says in the `-spec`. At present,
+    the type of `X` depends entirely on how `X` got there which is not always
+    trivial to figure out.
+
+[^2]: `dialyzer` also does this as a consequence of the bugs mentioned in the abstract.
+
+Reference Implementation
+------------------------
+
+We will make a PR with changes in the documentation to reflect
+above once it is accepted
+
+Copyright
+=========
+
+This document is placed in the public domain or under the CC0-1.0-Universal
+license, whichever is more permissive.
+