[WIP] analyze: refactor mir_op to explicitly track per-subloc info #1191
+1,687
−437
This is a WIP refactor of `mir_op`. I don't have time to finish it at the moment, but I'm posting this PR and including some notes here so it doesn't get lost. Currently it works on trivial examples like `offset1`, but fails on more interesting ones like `algo_md5`. It mostly seems to be failing while trying to produce nonsensical casts, but there are also a lot of unimplemented `Callee` cases in the new `mir_op` that will surely cause other problems later on.

This branch refactors the MIR rewrite generation pass (`rewrite::expr::mir_op`) to separate `LTy`/`TypeDesc` handling from the actual generation of casts and other rewrites. It divides `mir_op` into three separate passes: the first collects type and other metadata for each MIR node, the second determines which casts are needed to produce a well-typed program after type rewriting, and the third inserts the casts and any other necessary rewrites.

These passes work on a representation called `SublocInfo`. A "subloc" or "node" is a piece of MIR at finer granularity than a `Location`. For example, given the statement `_2 = Use(move _1)`, a `SubLoc` path can refer to the whole statement, the destination place `_2`, the rvalue `Use(move _1)`, the operand `move _1`, or the place `_1`. Each of these can have its own `SublocInfo` that describes its type and other information about the surrounding context or how it can be used.

The three new passes in more detail:
- `SublocInfo` collection: This pass computes the "new type" of each node, which is the type it would have after the types of all defs and locals are rewritten to match their `LTy`s. This can produce inconsistent results, such as giving the LHS and RHS of an assignment different types. This pass also records other metadata, such as the access mode (imm or mut) for `Place`s.
- `SublocInfo` typechecking: This pass checks for inconsistencies and computes the "expected type" of each node, which is the type it should have in order to make it usable in the surrounding context. By default, the node's expected type is identical to its new type, but it may be changed to resolve a type error. For example, in an inconsistent assignment (where the LHS and RHS have different new types), the expected type of the RHS will be set to match the new (and expected) type of the LHS. There are also some cases, mostly around special functions like `offset`, where this pass will adjust a node's new type instead of its expected type.
- [...] `offset`. This is similar to the behavior of the existing `mir_op` pass, but it's driven entirely by `SublocInfo` entries, rather than directly consulting `LTy`s.

Advantages of the new design:
- [...] `SublocInfo`s to determine whether it's an issue with the rewrite itself or with `SublocInfo` generation.
- Only the `SublocInfo` collection phase interacts directly with analysis results (`LTy`s). This means we could implement an alternate version of that pass with a different strategy for determining new types, while reusing all the rest of the rewriting machinery.

Limitations:
- [...] `Callee` they encounter.
- [...] `SublocInfo` passes to only request casts that the rewriting pass can handle.
- [...] `offset` to a subslice operation) and which should be left alone. Currently this is handled in a roundabout way: some of the inputs and/or outputs of the function are marked `FIXED` in the analysis, so their new types are left as raw pointers, and the rewriting pass knows to skip the normal rewrite if it sees raw pointers there.
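
For readers who want a concrete picture of the subloc paths and the three passes described above, here is a minimal, self-contained sketch applied to the example statement `_2 = Use(move _1)`. It is not the code in this branch: the `SubLoc` variant names, the two-variant `Ty`, the `Cast` struct, and the functions `collect`, `typecheck`, and `gen_casts` are all hypothetical stand-ins chosen for illustration, and real types would come from `LTy`s rather than function arguments.

```rust
// Illustrative only: every definition here is a hypothetical simplification,
// not the actual c2rust data structures.

/// One step in a path from a whole statement down to a smaller piece of it.
/// An empty path refers to the statement itself.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SubLoc {
    Dest,          // destination place of an assignment, e.g. `_2`
    Rvalue,        // rvalue of an assignment, e.g. `Use(move _1)`
    RvalueOperand, // operand inside the rvalue, e.g. `move _1`
    OperandPlace,  // place inside the operand, e.g. `_1`
}

/// Stand-in for a rewritten type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    RawPtr,
    Ref,
}

/// Per-node record: the type the node gets after rewriting (`new_ty`) and
/// the type its context needs it to have (`expect_ty`).
#[derive(Clone, Debug, PartialEq, Eq)]
struct SublocInfo {
    path: Vec<SubLoc>,
    new_ty: Ty,
    expect_ty: Ty,
}

/// A cast requested at a particular node.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Cast {
    path: Vec<SubLoc>,
    from: Ty,
    to: Ty,
}

fn node(path: &[SubLoc], ty: Ty) -> SublocInfo {
    // By default the expected type is identical to the new type.
    SublocInfo { path: path.to_vec(), new_ty: ty, expect_ty: ty }
}

/// Pass 1 (collection): record a new type for each node of `_2 = Use(move _1)`.
/// The results may be inconsistent if `dest_ty != src_ty`.
fn collect(dest_ty: Ty, src_ty: Ty) -> Vec<SublocInfo> {
    use SubLoc::*;
    vec![
        node(&[Dest], dest_ty),
        node(&[Rvalue], src_ty),
        node(&[Rvalue, RvalueOperand], src_ty),
        node(&[Rvalue, RvalueOperand, OperandPlace], src_ty),
    ]
}

/// Pass 2 (typechecking): make the RHS's expected type match the LHS.
fn typecheck(infos: &mut [SublocInfo]) {
    let dest_ty = infos
        .iter()
        .find(|i| i.path == [SubLoc::Dest])
        .map(|i| i.expect_ty);
    if let Some(dest_ty) = dest_ty {
        for info in infos.iter_mut() {
            if info.path == [SubLoc::Rvalue] {
                info.expect_ty = dest_ty;
            }
        }
    }
}

/// Pass 3 (rewriting): request a cast wherever new and expected types differ.
/// This pass looks only at `SublocInfo` entries.
fn gen_casts(infos: &[SublocInfo]) -> Vec<Cast> {
    infos
        .iter()
        .filter(|i| i.new_ty != i.expect_ty)
        .map(|i| Cast { path: i.path.clone(), from: i.new_ty, to: i.expect_ty })
        .collect()
}
```

In this sketch, calling `collect(Ty::Ref, Ty::RawPtr)`, then `typecheck`, then `gen_casts` yields a single cast at the `[Rvalue]` node from `RawPtr` to `Ref`, while a consistent assignment yields no casts. Swapping out `collect` for a different strategy would leave the other two functions unchanged.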