Auto merge of #23109 - nikomatsakis:closure-region-hierarchy, r=pnkfelix

Adjust internal treatment of the region hierarchy around closures. Work towards #3696. r? @pnkfelix
rust-lang · Apr 1, 2015 · d528aa9 · d528aa9
2 parents 8943653 + f15813d
commit d528aa9
Show file tree

Hide file tree

Showing 4 changed files with 218 additions and 197 deletions.
diff --git a/src/librustc/middle/infer/region_inference/README.md b/src/librustc/middle/infer/region_inference/README.md
@@ -249,114 +249,61 @@ there is a reference created whose lifetime does not enclose
 the borrow expression, we must issue sufficient restrictions to ensure
 that the pointee remains valid.
 
-## Adding closures
-
-The other significant complication to the region hierarchy is
-closures. I will describe here how closures should work, though some
-of the work to implement this model is ongoing at the time of this
-writing.
-
-The body of closures are type-checked along with the function that
-creates them. However, unlike other expressions that appear within the
-function body, it is not entirely obvious when a closure body executes
-with respect to the other expressions. This is because the closure
-body will execute whenever the closure is called; however, we can
-never know precisely when the closure will be called, especially
-without some sort of alias analysis.
-
-However, we can place some sort of limits on when the closure
-executes.  In particular, the type of every closure `fn:'r K` includes
-a region bound `'r`. This bound indicates the maximum lifetime of that
-closure; once we exit that region, the closure cannot be called
-anymore. Therefore, we say that the lifetime of the closure body is a
-sublifetime of the closure bound, but the closure body itself is unordered
-with respect to other parts of the code.
-
-For example, consider the following fragment of code:
-
-    'a: {
-         let closure: fn:'a() = || 'b: {
-             'c: ...
-         };
-         'd: ...
-    }
-
-Here we have four lifetimes, `'a`, `'b`, `'c`, and `'d`. The closure
-`closure` is bounded by the lifetime `'a`. The lifetime `'b` is the
-lifetime of the closure body, and `'c` is some statement within the
-closure body. Finally, `'d` is a statement within the outer block that
-created the closure.
-
-We can say that the closure body `'b` is a sublifetime of `'a` due to
-the closure bound. By the usual lexical scoping conventions, the
-statement `'c` is clearly a sublifetime of `'b`, and `'d` is a
-sublifetime of `'d`. However, there is no ordering between `'c` and
-`'d` per se (this kind of ordering between statements is actually only
-an issue for dataflow; passes like the borrow checker must assume that
-closures could execute at any time from the moment they are created
-until they go out of scope).
-
-### Complications due to closure bound inference
-
-There is only one problem with the above model: in general, we do not
-actually *know* the closure bounds during region inference! In fact,
-closure bounds are almost always region variables! This is very tricky
-because the inference system implicitly assumes that we can do things
-like compute the LUB of two scoped lifetimes without needing to know
-the values of any variables.
-
-Here is an example to illustrate the problem:
-
-    fn identify<T>(x: T) -> T { x }
-
-    fn foo() { // 'foo is the function body
-      'a: {
-           let closure = identity(|| 'b: {
-               'c: ...
-           });
-           'd: closure();
-      }
-      'e: ...;
-    }
-
-In this example, the closure bound is not explicit. At compile time,
-we will create a region variable (let's call it `V0`) to represent the
-closure bound.
-
-The primary difficulty arises during the constraint propagation phase.
-Imagine there is some variable with incoming edges from `'c` and `'d`.
-This means that the value of the variable must be `LUB('c,
-'d)`. However, without knowing what the closure bound `V0` is, we
-can't compute the LUB of `'c` and `'d`! Any we don't know the closure
-bound until inference is done.
-
-The solution is to rely on the fixed point nature of inference.
-Basically, when we must compute `LUB('c, 'd)`, we just use the current
-value for `V0` as the closure's bound. If `V0`'s binding should
-change, then we will do another round of inference, and the result of
-`LUB('c, 'd)` will change.
-
-One minor implication of this is that the graph does not in fact track
-the full set of dependencies between edges. We cannot easily know
-whether the result of a LUB computation will change, since there may
-be indirect dependencies on other variables that are not reflected on
-the graph. Therefore, we must *always* iterate over all edges when
-doing the fixed point calculation, not just those adjacent to nodes
-whose values have changed.
-
-Were it not for this requirement, we could in fact avoid fixed-point
-iteration altogether. In that universe, we could instead first
-identify and remove strongly connected components (SCC) in the graph.
-Note that such components must consist solely of region variables; all
-of these variables can effectively be unified into a single variable.
-Once SCCs are removed, we are left with a DAG.  At this point, we
-could walk the DAG in topological order once to compute the expanding
-nodes, and again in reverse topological order to compute the
-contracting nodes. However, as I said, this does not work given the
-current treatment of closure bounds, but perhaps in the future we can
-address this problem somehow and make region inference somewhat more
-efficient. Note that this is solely a matter of performance, not
-expressiveness.
+## Modeling closures
+
+Integrating closures properly into the model is a bit of
+work-in-progress. In an ideal world, we would model closures as
+closely as possible after their desugared equivalents. That is, a
+closure type would be modeled as a struct, and the region hierarchy of
+different closure bodies would be completely distinct from all other
+fns. We are generally moving in that direction but there are
+complications in terms of the implementation.
+
+In practice what we currently do is somewhat different. The basis for
+the current approach is the observation that the only time that
+regions from distinct fn bodies interact with one another is through
+an upvar or the type of a fn parameter (since closures live in the fn
+body namespace, they can in fact have fn parameters whose types
+include regions from the surrounding fn body). For these cases, there
+are separate mechanisms which ensure that the regions that appear in
+upvars/parameters outlive the dynamic extent of each call to the
+closure:
+
+1. Types must outlive the region of any expression where they are used.
+   For a closure type `C` to outlive a region `'r`, that implies that the
+   types of all its upvars must outlive `'r`.
+2. Parameters must outlive the region of any fn that they are passed to.
+
+Therefore, we can -- sort of -- assume that any region from an
+enclosing fns is larger than any region from one of its enclosed
+fn. And that is precisely what we do: when building the region
+hierarchy, each region lives in its own distinct subtree, but if we
+are asked to compute the `LUB(r1, r2)` of two regions, and those
+regions are in disjoint subtrees, we compare the lexical nesting of
+the two regions.
+
+*Ideas for improving the situation:* (FIXME #3696) The correctness
+argument here is subtle and a bit hand-wavy. The ideal, as stated
+earlier, would be to model things in such a way that it corresponds
+more closely to the desugared code. The best approach for doing this
+is a bit unclear: it may in fact be possible to *actually* desugar
+before we start, but I don't think so. The main option that I've been
+thinking through is imposing a "view shift" as we enter the fn body,
+so that regions appearing in the types of fn parameters and upvars are
+translated from being regions in the outer fn into free region
+parameters, just as they would be if we applied the desugaring. The
+challenge here is that type inference may not have fully run, so the
+types may not be fully known: we could probably do this translation
+lazilly, as type variables are instantiated. We would also have to
+apply a kind of inverse translation to the return value. This would be
+a good idea anyway, as right now it is possible for free regions
+instantiated within the closure to leak into the parent: this
+currently leads to type errors, since those regions cannot outlive any
+expressions within the parent hierarchy. Much like the current
+handling of closures, there are no known cases where this leads to a
+type-checking accepting incorrect code (though it sometimes rejects
+what might be considered correct code; see rust-lang/rust#22557), but
+it still doesn't feel like the right approach.
 
 ### Skolemization
 

diff --git a/src/librustc/middle/infer/region_inference/mod.rs b/src/librustc/middle/infer/region_inference/mod.rs
@@ -760,26 +760,25 @@ impl<'a, 'tcx> RegionVarBindings<'a, 'tcx> {
             // at least as big as the block fr.scope_id".  So, we can
             // reasonably compare free regions and scopes:
             let fr_scope = fr.scope.to_code_extent();
-            match self.tcx.region_maps.nearest_common_ancestor(fr_scope, s_id) {
+            let r_id = self.tcx.region_maps.nearest_common_ancestor(fr_scope, s_id);
+
+            if r_id == fr_scope {
               // if the free region's scope `fr.scope_id` is bigger than
               // the scope region `s_id`, then the LUB is the free
               // region itself:
-              Some(r_id) if r_id == fr_scope => f,
-
+              f
+            } else {
               // otherwise, we don't know what the free region is,
               // so we must conservatively say the LUB is static:
-              _ => ReStatic
+              ReStatic
             }
           }
 
           (ReScope(a_id), ReScope(b_id)) => {
             // The region corresponding to an outer block is a
             // subtype of the region corresponding to an inner
             // block.
-            match self.tcx.region_maps.nearest_common_ancestor(a_id, b_id) {
-              Some(r_id) => ReScope(r_id),
-              _ => ReStatic
-            }
+            ReScope(self.tcx.region_maps.nearest_common_ancestor(a_id, b_id))
           }
 
           (ReFree(ref a_fr), ReFree(ref b_fr)) => {
@@ -866,9 +865,10 @@ impl<'a, 'tcx> RegionVarBindings<'a, 'tcx> {
                 // is the scope `s_id`.  Otherwise, as we do not know
                 // big the free region is precisely, the GLB is undefined.
                 let fr_scope = fr.scope.to_code_extent();
-                match self.tcx.region_maps.nearest_common_ancestor(fr_scope, s_id) {
-                    Some(r_id) if r_id == fr_scope => Ok(s),
-                    _ => Err(ty::terr_regions_no_overlap(b, a))
+                if self.tcx.region_maps.nearest_common_ancestor(fr_scope, s_id) == fr_scope {
+                    Ok(s)
+                } else {
+                    Err(ty::terr_regions_no_overlap(b, a))
                 }
             }
 
@@ -934,10 +934,13 @@ impl<'a, 'tcx> RegionVarBindings<'a, 'tcx> {
         // it. Otherwise fail.
         debug!("intersect_scopes(scope_a={:?}, scope_b={:?}, region_a={:?}, region_b={:?})",
                scope_a, scope_b, region_a, region_b);
-        match self.tcx.region_maps.nearest_common_ancestor(scope_a, scope_b) {
-            Some(r_id) if scope_a == r_id => Ok(ReScope(scope_b)),
-            Some(r_id) if scope_b == r_id => Ok(ReScope(scope_a)),
-            _ => Err(ty::terr_regions_no_overlap(region_a, region_b))
+        let r_id = self.tcx.region_maps.nearest_common_ancestor(scope_a, scope_b);
+        if r_id == scope_a {
+            Ok(ReScope(scope_b))
+        } else if r_id == scope_b {
+            Ok(ReScope(scope_a))
+        } else {
+            Err(ty::terr_regions_no_overlap(region_a, region_b))
         }
     }
 }