JIT: initial work on general count reconstruction #99992
Conversation
Implements a Gauss-Seidel solver for cases where methods have irreducible loops.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Currently this kicks in whenever the initial profile is inconsistent... which turns out to be fairly often. Doing a "blend" may be too radical; we should perhaps do a "repair" instead.
```cpp
// Visit blocks in reverse postorder, so each block's weight is computed
// after the weights of its (forward-edge) predecessors.
for (unsigned i = m_dfsTree->GetPostOrderCount(); i != 0; i--)
{
    BasicBlock* block = m_dfsTree->GetPostOrder(i - 1);
    ComputeBlockWeight(block);
}
```
```cpp
// The reconstruction is only approximate if we had to cap a cyclic
// probability or the method has improper loop headers.
m_approximate = (m_cappedCyclicProbabilities) || (m_improperLoopHeaders > 0);
```
Note that there can be cycles in the flow graph even without any improper loop headers or natural loops (it's the reason why I originally removed `m_improperLoopHeaders` from `FlowGraphNaturalLoops`). #96153 has some more information. I'm guessing it doesn't affect whether or not the computation is an approximation here, because those cases happen only due to exceptional flow?
Right, we assume exceptional flow is sufficiently rare that any cycle it might form can be ignored. Also we don't model flow in and out of handlers, except for normally invoked finallys.
This might need some tweaks when we run this later after the importer rewires the flow.
Looks like the failures above are all timeouts. I think they're unrelated; SPMI TP doesn't show much impact.
# Numeric Solvers for Profile Count Reconstruction

It may not be readily apparent how count reconstruction works. Perhaps these notes will shed some light on things.

In our flowgraph model we assume that the edge likelihoods are trustworthy and well formed (meaning each edge's likelihood is in [0,1] and the sum of all likelihoods for a block's successor edges is 1). The appeal of edge well-formedness is that it is easy to check and relatively easy to maintain during various optimizations. It is a local property. We will use $p_{i,j}$ to denote the likelihood that block $i$ transfers control to block $j$.

By contrast, block weight consistency requires that the flow into a block be balanced by the flow out of a block. It is a global property and harder to maintain during optimizations. It may also not be true initially. We will use $w_j$ to denote the weight of block $j$, and $e_j$ to denote any external flow into block $j$ (say, at the method entry). Consistency then requires, for each block $j$,

$$ e_j + \sum_i p_{i,j} \, w_i = \sum_k p_{j,k} \, w_j $$

where the LHS is flow in and the RHS is flow out of block $j$. Since the likelihoods of a block's successor edges sum to 1, the flow out of a block is just its weight, so we can restate this as saying the external flow plus the flow into the block must equal the block weight:

$$ e_j + \sum_i p_{i,j} \, w_i = w_j $$

The goal of this work is to explore methods for reconstructing a set of consistent block weights $w_j$ from the edge likelihoods $p_{i,j}$ and the external flows $e_j$.

## General Solution

The above can be summarized in matrix-vector form as

$$ \boldsymbol{w} = \boldsymbol{P} \boldsymbol{w} + \boldsymbol{e} $$

where, to be able to express the sum of incoming flow as a standard matrix-vector product, we have $\boldsymbol{P} = \boldsymbol{p}^T$ (that is, in $\boldsymbol{P}$ row $j$ holds the likelihoods of block $j$'s incoming edges). So

$$ (\boldsymbol{I} - \boldsymbol{P}) \, \boldsymbol{w} = \boldsymbol{e} $$

and this can be solved (in principle) for $\boldsymbol{w}$:

$$ \boldsymbol{w} = (\boldsymbol{I} - \boldsymbol{P})^{-1} \, \boldsymbol{e} $$

For example, given the following graph with edge likelihoods as shown

*(figure: blocks A, B, C, D with edges A → B (1.0), B → C (1.0), C → B (0.8), and C → D (0.2))*

we have

$$ \boldsymbol{P} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0.8 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0.2 & 0 \end{pmatrix} $$

Note each column save the last sums to 1.0, representing the fact that the outgoing likelihoods from each block must sum to 1.0, unless there are no successors. Thus

$$ \boldsymbol{I} - \boldsymbol{P} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & -0.8 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -0.2 & 1 \end{pmatrix} $$

and so

$$ (\boldsymbol{I} - \boldsymbol{P})^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 5 & 5 & 4 & 0 \\ 5 & 5 & 5 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} $$

(details of computing the inverse left as an exercise for the reader). Note the elements of $(\boldsymbol{I} - \boldsymbol{P})^{-1}$ are all non-negative.

If we feed 6 units of flow into A, we have

$$ \boldsymbol{w} = (\boldsymbol{I} - \boldsymbol{P})^{-1} \begin{pmatrix} 6 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 6 \\ 30 \\ 30 \\ 6 \end{pmatrix} $$

or graphically, the same flow graph annotated with block weights A = 6, B = 30, C = 30, D = 6.

However, explicit computation of the inverse of a matrix is computationally expensive. Also note (though it's not fully obvious from such a small example) that the matrix $\boldsymbol{I} - \boldsymbol{P}$ is typically quite sparse, since most blocks have only a few predecessors and successors. So solution techniques that can leverage sparseness are of particular interest.

## A More Practical Solution

Note the matrix $\boldsymbol{I} - \boldsymbol{P}$ has non-negative entries on the diagonal and non-positive entries elsewhere (since all the $p_{i,j}$ are non-negative). If we further restrict ourselves to the case where $\rho(\boldsymbol{P}) < 1$ (with $\rho$ the spectral radius; see below), such matrices are known as M-matrices. It is well known that for an M-matrix the inverse exists, is non-negative, and can be expressed as the convergent series

$$ (\boldsymbol{I} - \boldsymbol{P})^{-1} = \boldsymbol{I} + \boldsymbol{P} + \boldsymbol{P}^2 + \boldsymbol{P}^3 + \cdots $$

This gives rise to a simple iterative procedure for computing an approximate value of $\boldsymbol{w}$:

$$ \boldsymbol{w}^{(0)} = \boldsymbol{e} \,, \qquad \boldsymbol{w}^{(k+1)} = \boldsymbol{P} \boldsymbol{w}^{(k)} + \boldsymbol{e} $$

where we can achieve any desired precision for $\boldsymbol{w}$ by iterating until successive iterates differ by less than some tolerance. Intuitively this should make sense: we are effectively pouring weight into the entry block(s) and letting the weights flow around in the graph until they reach a fixed point.

If we do this for the example above, we get the following sequence of values for $\boldsymbol{w}$:

| $k$ | $w_A$ | $w_B$ | $w_C$ | $w_D$ |
|----:|------:|------:|------:|------:|
| 0 | 6 | 0 | 0 | 0 |
| 1 | 6 | 6 | 0 | 0 |
| 2 | 6 | 6 | 6 | 0 |
| 3 | 6 | 10.8 | 6 | 1.2 |
| 4 | 6 | 10.8 | 10.8 | 1.2 |
| 5 | 6 | 14.64 | 10.8 | 2.16 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| ∞ | 6 | 30 | 30 | 6 |

and the process converges to the weights found using the inverse. However convergence is fairly slow. Classically this approach is known as Jacobi's method. At each iterative step, the new values are based only on the old values.

## Jacobi's Method

If you read the math literature on iterative solvers, Jacobi's method is often described as follows. Given a linear system $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$, split $\boldsymbol{A}$ into two parts, $\boldsymbol{A} = \boldsymbol{M} - \boldsymbol{N}$, and iterate

$$ \boldsymbol{x}^{(k+1)} = \boldsymbol{M}^{-1} \left( \boldsymbol{N} \boldsymbol{x}^{(k)} + \boldsymbol{b} \right) $$

And provided that $\rho(\boldsymbol{M}^{-1} \boldsymbol{N}) < 1$, the iteration converges to the solution of the original system. In our case $\boldsymbol{A} = \boldsymbol{I} - \boldsymbol{P}$ and $\boldsymbol{b} = \boldsymbol{e}$, and the simple splitting $\boldsymbol{M} = \boldsymbol{I}$, $\boldsymbol{N} = \boldsymbol{P}$ gives exactly the iteration we derived above.

As an alternative we could split off the diagonal $\boldsymbol{D}$ of $\boldsymbol{P}$ (which models self-loops), taking $\boldsymbol{M} = \boldsymbol{I} - \boldsymbol{D}$ and $\boldsymbol{N} = \boldsymbol{P} - \boldsymbol{D}$. With that splitting,

$$ \boldsymbol{x}^{(k+1)} = (\boldsymbol{I} - \boldsymbol{D})^{-1} \left( (\boldsymbol{P} - \boldsymbol{D}) \boldsymbol{x}^{(k)} + \boldsymbol{e} \right) $$

so, as $(\boldsymbol{I} - \boldsymbol{D})$ is diagonal and hence trivially inverted, we can write the update per block, or in our block weight and edge likelihood notation

$$ w_j^{(k+1)} = \frac{1}{1 - p_{j,j}} \left( e_j + \sum_{i \ne j} p_{i,j} \, w_i^{(k)} \right) $$

Intuitively this reads: the new value of node $j$ is the external flow plus the incoming flow from other nodes, scaled up to account for any self-loop at $j$.

## On Convergence and Stability

While the iterative method above is guaranteed to converge when $\rho(\boldsymbol{P}) < 1$, it may converge quite slowly if $\rho(\boldsymbol{P})$ is close to 1. Here the spectral radius $\rho(\boldsymbol{P})$ is the largest magnitude of any eigenvalue of $\boldsymbol{P}$; roughly speaking, each iteration shrinks the remaining error by this factor.

It is also worth noting that for synthesis the matrix $\boldsymbol{P}$ is built from likelihoods the JIT chooses itself, so loop back edge likelihoods can be kept bounded away from 1 and convergence is well behaved.
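To make the basic scheme concrete, here is a minimal standalone sketch of the "pour and flow" iteration on the example graph above. This is illustrative only, not the JIT's implementation; the `Edge` struct, the iteration cap, and the `1e-6` tolerance are arbitrary choices for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// One flow edge: source block, target block, edge likelihood.
struct Edge
{
    int    from;
    int    to;
    double likelihood;
};

int main()
{
    // Example graph from above: A=0, B=1, C=2, D=3.
    std::vector<Edge> edges = {
        {0, 1, 1.0}, // A -> B
        {1, 2, 1.0}, // B -> C
        {2, 1, 0.8}, // C -> B (back edge)
        {2, 3, 0.2}, // C -> D
    };

    std::vector<double> e = {6, 0, 0, 0}; // external flow: 6 units into A
    std::vector<double> w = e;            // w^(0) = e

    // Jacobi-style iteration: w^(k+1) = P w^(k) + e, stopping when the
    // largest per-block change drops below an (illustrative) tolerance.
    for (int k = 0; k < 1000; k++)
    {
        std::vector<double> next = e;
        for (const Edge& edge : edges)
        {
            next[edge.to] += edge.likelihood * w[edge.from];
        }

        double change = 0;
        for (size_t j = 0; j < w.size(); j++)
        {
            change = std::max(change, std::fabs(next[j] - w[j]));
        }
        w = next;

        if (change < 1e-6)
        {
            printf("converged after %d iterations\n", k + 1);
            break;
        }
    }

    printf("w = (%g, %g, %g, %g)\n", w[0], w[1], w[2], w[3]); // ~(6, 30, 30, 6)
    return 0;
}
```

Note the inner loop only touches actual edges, so each pass is O(E); this is where the sparseness of $\boldsymbol{I} - \boldsymbol{P}$ pays off.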
## Accelerating Convergence I: Gauss-Seidel and Reverse Postorder

It's also well known that Gauss-Seidel iteration often converges faster than Jacobi iteration. Here instead of always using the old iteration values, we try and use the new iteration values that are available, where we presume each update happens in order of increasing $j$:

$$ \boldsymbol{w}^{(k+1)} = (\boldsymbol{I} - \boldsymbol{P_L})^{-1} \left( \boldsymbol{P_U} \, \boldsymbol{w}^{(k)} + \boldsymbol{e} \right) $$

with $\boldsymbol{P_L}$ and $\boldsymbol{P_U}$ the lower and strictly upper triangular parts of $\boldsymbol{P}$, or again in our notation

$$ w_j^{(k+1)} = e_j + \sum_{i < j} p_{i,j} \, w_i^{(k+1)} + \sum_{i \ge j} p_{i,j} \, w_i^{(k)} $$

In the above scheme the order of visiting successive blocks is left unspecified, and (in principle) any order can be used. But by using a reverse postorder to index the blocks, we can ensure a maximal amount of forward propagation per iteration. Note that (in a reducible flow graph) if a block has an incoming edge from a node that appears later in the reverse postorder, that block is a loop header. If we do that, the update above nicely corresponds to our notion of forward and back edges in the RPO:

$$ w_j^{(k+1)} = e_j + \sum_{(i,j) \,\text{forward}} p_{i,j} \, w_i^{(k+1)} + \sum_{(i,j) \,\text{back}} p_{i,j} \, w_i^{(k)} $$

Note because of the order of reads and writes, $\boldsymbol{w}^{(k)}$ and $\boldsymbol{w}^{(k+1)}$ can share storage; we simply update the block weights in place. On the example above this results in:

| pass | $w_A$ | $w_B$ | $w_C$ | $w_D$ |
|-----:|------:|------:|------:|------:|
| 1 | 6 | 6 | 6 | 1.2 |
| 2 | 6 | 10.8 | 10.8 | 2.16 |
| 3 | 6 | 14.64 | 14.64 | 2.93 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| ∞ | 6 | 30 | 30 | 6 |

So it is converging about twice as fast. As with the Jacobi method one can re-express this as a splitting, with $\boldsymbol{M} = \boldsymbol{I} - \boldsymbol{P_L}$ and $\boldsymbol{N} = \boldsymbol{P_U}$, and determine an iteration matrix $\boldsymbol{M}^{-1} \boldsymbol{N}$ whose spectral radius governs the convergence rate.

## Accelerating Convergence II: Cyclic Probabilities

A flow graph is reducible (or is said to have reducible loops) if every cycle in the graph has a block in the cycle that dominates the other blocks in the cycle. We will call such cycles natural loops, distinguished by their entry blocks. For reducible loops we can compute the amount by which they amplify flow using a technique described by Wu and Larus: given a loop head $h$, compute the cyclic probability $C_h$, the likelihood that flow reaching $h$ travels around the loop and returns to $h$; the loop then amplifies the acyclic flow into $h$ by the factor $1/(1 - C_h)$ (for nested loops this is computed innermost loop first).

If we add this refinement to our algorithm we end up with:

$$ w_j^{(k+1)} = \begin{cases} \dfrac{1}{1 - C_j} \left( e_j + \displaystyle\sum_{(i,j) \,\text{forward}} p_{i,j} \, w_i^{(k+1)} \right) & \text{$j$ a natural loop header} \\[2ex] e_j + \displaystyle\sum_{(i,j) \,\text{forward}} p_{i,j} \, w_i^{(k+1)} + \displaystyle\sum_{(i,j) \,\text{back}} p_{i,j} \, w_i^{(k)} & \text{otherwise} \end{cases} $$

the second clause includes both blocks without any back edges and blocks with back edges that are not headers of natural loops. On an example like the one above this converges in one pass.

If any $C_h$ works out to be 1 or more, we must cap it at some value just less than 1, and the solution will only be approximate. One can imagine that if we cap some $C_h$, the weights in and around that loop will remain inconsistent no matter how long we iterate. Since the remainder of the JIT is going to have to cope with lack of global balance anyways (recall it is hard to preserve), for now we are going to try and tolerate reconstruction inconsistencies. The algorithm described above is implemented in the code as the `GaussSeidelSolver` method.

## Cycles That Are Not Natural Loops, More Sophisticated Solvers, and Deep Nests

If the flow graph has cycles that are not natural loops (irreducible loops), the above computations will converge, but again may converge very slowly. On a sample of about 500 graphs with irreducible loops, the modified Gauss-Seidel approach above required more than 20 iterations in 120 cases and more than 50 iterations in 70 cases, with the worst case around 500 iterations.

Successive over-relaxation (SOR) is a classic convergence-altering technique but, unfortunately, for M-matrices SOR can only safely be used to slow down convergence. There does not seem to be a good analog of the cyclic probability $C_h$ for irreducible loops.

It's possible that more sophisticated solution techniques like BiCGstab or CGS might be worth consideration. Or perhaps a least-squares solution, if we're forced to be approximate, to try and minimize the overall approximation error.

In very deep loop nests even modest per-loop amplification compounds multiplicatively, so inner-block weights can grow very large and start running into the limits of finite-precision arithmetic.

## References

Carl D. Meyer. *Matrix Analysis and Applied Linear Algebra*, in particular section 7.10.

Nick Higham. *What is an M-Matrix?*

Youfeng Wu and James R. Larus. *Static branch frequency and program profile analysis*, MICRO-27 (1994).
@jakobbotsch ptal

Think repair works better than blend here. Fair number of diffs, likely from cases where PGO counts were not consistent (and likely much of that from the approximate count probes).
Some notes:
Yeah, I will copy this writeup somewhere. This is a standard iterative linear equation solver; the only real nuance is using RPO to ensure maximum forward propagation (hopefully a familiar concept to anyone working on the jit) and then using the cyclic probabilities. There's also a fairly obvious intuitive explanation: we pour some counts in the top and iteratively flow them around using the edge likelihoods until we reach a fixed point or run out of iterations. Most of what's above is just an explanation for why this works, why it might take a long time, and why there are cases it can't solve exactly. I have held off on implementing something more "rocket science" state of the art, like BiCGSTAB or CGS, since those rely heavily on less familiar bits of linear algebra.
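To give that summary some shape, here is a toy sketch of a single in-place Gauss-Seidel pass in RPO with the cyclic-probability scaling applied at natural loop headers. It is illustrative only, not the JIT's `ProfileSynthesis` code; the graph encoding and the precomputed `cyclicProb` values are assumptions of the example.

```cpp
#include <cstdio>
#include <vector>

struct Edge
{
    int    from;
    int    to;
    double likelihood;
    bool   isBackEdge; // true if 'from' comes later in the RPO than 'to'
};

// One in-place Gauss-Seidel pass over blocks, assumed indexed in RPO.
// 'cyclicProb' holds C_h for natural loop headers and 0 for other blocks;
// at a header, back edge flow is modeled by the 1/(1 - C_h) scaling.
static void GaussSeidelPass(const std::vector<std::vector<Edge>>& preds,
                            const std::vector<double>&            e,
                            const std::vector<double>&            cyclicProb,
                            std::vector<double>&                  w)
{
    for (size_t j = 0; j < w.size(); j++)
    {
        double newWeight = e[j];
        for (const Edge& edge : preds[j])
        {
            if ((cyclicProb[j] > 0) && edge.isBackEdge)
            {
                continue; // header: back edge flow is folded into C_h
            }
            newWeight += edge.likelihood * w[edge.from];
        }
        if (cyclicProb[j] > 0)
        {
            newWeight /= (1.0 - cyclicProb[j]); // amplify by 1/(1 - C_h)
        }
        w[j] = newWeight; // in-place: later blocks see the new value
    }
}

int main()
{
    // Example graph from the writeup, blocks already in RPO: A=0, B=1, C=2, D=3.
    std::vector<std::vector<Edge>> preds(4);
    preds[1] = {{0, 1, 1.0, false}, {2, 1, 0.8, true}}; // B <- A, B <- C (back)
    preds[2] = {{1, 2, 1.0, false}};                    // C <- B
    preds[3] = {{2, 3, 0.2, false}};                    // D <- C

    std::vector<double> e          = {6, 0, 0, 0};   // 6 units into A
    std::vector<double> cyclicProb = {0, 0.8, 0, 0}; // B is a loop header
    std::vector<double> w          = {0, 0, 0, 0};

    GaussSeidelPass(preds, e, cyclicProb, w);
    printf("w = (%g, %g, %g, %g)\n", w[0], w[1], w[2], w[3]); // (6, 30, 30, 6)
    return 0;
}
```

On the small example from the writeup a single pass already lands on the exact solution, matching the claim that such reducible examples converge in one pass.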
Some issues that have come up:
I left a long note on the algorithm as a comment on #99992. Move it to the doc folder.