From 8b66a99be227c1606665f3716006c5033d5a778e Mon Sep 17 00:00:00 2001
From: Maurelian
Date: Fri, 7 May 2021 22:59:54 -0400
Subject: [PATCH] Some improvements to optimizer documentation

Co-authored-by: Harikrishnan Mulackal
---
 docs/internals/optimizer.rst | 105 ++++++++++++++++++++++++++---------
 1 file changed, 78 insertions(+), 27 deletions(-)

diff --git a/docs/internals/optimizer.rst b/docs/internals/optimizer.rst
index d56c8adf9680..2aed29851309 100644
--- a/docs/internals/optimizer.rst
+++ b/docs/internals/optimizer.rst
@@ -6,24 +6,24 @@ The Optimizer
 *************
 
 The Solidity compiler uses two different optimizer modules: The "old" optimizer
-that operates at opcode level and the "new" optimizer that operates on Yul IR code.
+that operates at the opcode level and the "new" optimizer that operates on Yul IR code.
 
 The opcode-based optimizer applies a set of `simplification rules
 <https://github.com/ethereum/solidity/blob/develop/libevmasm/RuleList.h>`_
 to opcodes. It also combines equal code sets and removes unused code.
 
 The Yul-based optimizer is much more powerful, because it can work across function
-calls: In Yul, it is not possible to perform arbitrary jumps, so it is for example
+calls. For example, arbitrary jumps are not possible in Yul, so it is possible to
 compute the side-effects of each function. Consider two function calls,
-where the first does not modify the storage and the second modifies the storage.
-If their arguments and return values does not depend on each other, we can reorder
+where the first does not modify storage and the second does modify storage.
+If their arguments and return values do not depend on each other, we can reorder
 the function calls. Similarly, if a function is side-effect free and its result is
 multiplied by zero, you can remove the function call completely.
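+
+For example, consider the following Yul sketch (hand-written for illustration;
+the exact rewrites depend on which optimizer steps run). The call to ``f`` only
+reads storage and modifies nothing, so once its result is multiplied by zero,
+the whole call can be dropped:
+
+::
+
+    // Illustrative only: f is side-effect free (it only reads storage).
+    function f(a) -> b { b := sload(a) }
+    let r := mul(f(0), 0)
+
+can be simplified to
+
+::
+
+    let r := 0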
 
 Currently, the parameter ``--optimize`` activates the opcode-based optimizer for the
 generated bytecode and the Yul optimizer for the Yul code generated internally, for example for ABI coder v2.
-One can use ``solc --ir-optimized --optimize`` to produce
-optimized experimental Yul IR for a Solidity source. Similarly, use ``solc --strict-assembly --optimize``
+One can use ``solc --ir-optimized --optimize`` to produce an
+optimized experimental Yul IR for a Solidity source. Similarly, one can use ``solc --strict-assembly --optimize``
 for a stand-alone Yul mode.
 
 You can find more details on both optimizer modules and their optimization steps below.
 
@@ -32,7 +32,7 @@ Benefits of Optimizing Solidity Code
 ====================================
 
 Overall, the optimizer tries to simplify complicated expressions, which reduces both code
-size and execution cost, i.e., it can reduce gas needed for contract deployment as well as for external calls to the contract.
+size and execution cost, i.e., it can reduce gas needed for contract deployment as well as for external calls made to the contract.
 It also specializes or inlines functions. Especially function inlining is an operation that can cause much bigger code, but it is
 often done because it results in opportunities for more simplifications.
 
@@ -41,11 +41,11 @@ Differences between Optimized and Non-Optimized Code
 ====================================================
 
-Generally, the most visible difference would be constant expressions getting evaluated.
-When it comes to the ASM output, one can also notice reduction of equivalent/duplicate
-"code blocks" (compare the output of the flags ``--asm`` and ``--asm --optimize``). However,
+Generally, the most visible difference is that constant expressions are evaluated at compile time.
+When it comes to the ASM output, one can also notice a reduction of equivalent or duplicate
+code blocks (compare the output of the flags ``--asm`` and ``--asm --optimize``). However,
 when it comes to the Yul/intermediate-representation, there can be significant
-differences, for example, functions can get inlined, combined, rewritten to eliminate
+differences, for example, functions may be inlined, combined, or rewritten to eliminate
 redundancies, etc. (compare the output between the flags ``--ir`` and
 ``--optimize --ir-optimized``).
 
@@ -55,7 +55,9 @@ Optimizer Parameter Runs
 ========================
 
 The number of runs (``--optimize-runs``) specifies roughly how often each opcode of the
 deployed code will be executed across the life-time of the contract. This means it is a
 trade-off parameter between code size (deploy cost) and code execution cost (cost after deployment).
-A "runs" parameter of "1" will produce short but expensive code. The largest value is ``2**32-1``.
+A "runs" parameter of "1" will produce short but expensive code. In contrast, a larger "runs"
+parameter will produce longer but more gas-efficient code. The maximum value of the parameter
+is ``2**32-1``.
 
 .. note::
 
@@ -65,31 +67,81 @@ Opcode-Based Optimizer Module
 =============================
 
-The opcode-based optimizer module operates on assembly. It splits the
+The opcode-based optimizer module operates on assembly code. It splits the
 sequence of instructions into basic blocks at ``JUMPs`` and ``JUMPDESTs``. Inside these blocks, the
 optimizer analyzes the instructions and records every modification to the stack,
 memory, or storage as an expression which consists of an instruction and
-a list of arguments which are pointers to other expressions. The opcode-based optimizer
-uses a component called "CommonSubexpressionEliminator" that amongst other
+a list of arguments which are pointers to other expressions.
+
+Additionally, the opcode-based optimizer
+uses a component called "CommonSubexpressionEliminator" that, amongst other
 tasks, finds expressions that are always equal (on every input) and combines
 them into an expression class. It first tries to find each new expression in a
 list of already known expressions. If no such matches are found, it simplifies the
 expression according to rules like ``constant + constant = sum_of_constants``
 or ``X * 1 = X``. Since this is a recursive process, we can also apply the latter rule if the second factor
-is a more complex expression where we know that it always evaluates to one.
-Modifications to storage and memory locations have to erase knowledge about
-storage and memory locations which are not known to be different. If we first
-write to location x and then to location y and both are input variables, the
-second could overwrite the first, so we do not know what is stored at x after
-we wrote to y. If simplification of the expression ``x - y`` evaluates to a
-non-zero constant, we know that we can keep our knowledge about what is stored at ``x``.
+is a more complex expression which we know always evaluates to one.
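+
+For instance, in the following hand-constructed sketch (illustrative only, not
+literal optimizer output), ``div(add(3, 4), 7)`` always evaluates to one, so the
+rule ``X * 1 = X`` applies to the outer multiplication:
+
+::
+
+    let x := calldataload(0)
+    // add(3, 4) folds to 7, div(7, 7) folds to 1, and mul(x, 1) simplifies to x.
+    let y := mul(x, div(add(3, 4), 7))
+
+simplifies to
+
+::
+
+    let x := calldataload(0)
+    let y := x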
+
+Certain optimizer steps symbolically track the storage and memory locations. For example, this
+information is used to compute Keccak-256 hashes that can be evaluated at compile time. Consider
+the sequence:
+
+::
+
+    PUSH 32
+    PUSH 0
+    CALLDATALOAD
+    PUSH 100
+    DUP2
+    MSTORE
+    KECCAK256
+
+or the equivalent Yul
+
+::
+
+    let x := calldataload(0)
+    mstore(x, 100)
+    let value := keccak256(x, 32)
+
+In this case, the optimizer tracks the value at the memory location ``calldataload(0)`` and then
+realizes that the Keccak-256 hash can be evaluated at compile time. This only works if there is no
+other instruction that modifies memory between the ``mstore`` and ``keccak256``. So if there is an
+instruction that writes to memory (or storage), then we need to erase the knowledge of the current
+memory (or storage). There is, however, an exception to this erasure: when we can easily see that
+the instruction doesn't write to a certain location.
+
+For example,
+
+::
+
+    let x := calldataload(0)
+    mstore(x, 100)
+    // Current knowledge: memory location x -> 100
+    let y := add(x, 32)
+    // The following write does not clear the knowledge that x -> 100,
+    // since [y, y + 32) does not overlap [x, x + 32)
+    mstore(y, 200)
+    // This Keccak-256 can now be evaluated at compile time
+    let value := keccak256(x, 32)
+
+Therefore, a modification to a storage or memory location, say location ``l``, must erase
+knowledge about other storage or memory locations which may be equal to ``l``. More specifically,
+for storage, the optimizer has to erase all knowledge of symbolic locations that may be equal to
+``l``, and for memory, the optimizer has to erase all knowledge of symbolic locations that may not
+be at least 32 bytes away. If ``m`` denotes an arbitrary location, then this decision on erasure is
+done by computing the value ``sub(l, m)``. For storage, if this value evaluates to a literal that
+is non-zero, then the knowledge about ``m`` will be kept. For memory, if the value evaluates to a
+literal that is between ``32`` and ``2**256 - 32``, then the knowledge about ``m`` will be kept. In
+all other cases, the knowledge about ``m`` will be erased.
 
 After this process, we know which expressions have to be on the stack at
 the end, and have a list of modifications to memory and storage. This information
 is stored together with the basic blocks and is used to link them.
 Furthermore, knowledge about the stack, storage and memory configuration is forwarded to
-the next block(s). If we know the targets of all ``JUMP`` and ``JUMPI`` instructions,
+the next block(s).
+
+If we know the targets of all ``JUMP`` and ``JUMPI`` instructions,
 we can build a complete control flow graph of the program. If there is only one
 target we do not know (this can happen as in principle, jump targets can be
 computed from inputs), we have to erase all knowledge about the input state
 
@@ -108,19 +160,18 @@
 stack in the correct place.
 
 These steps are applied to each basic block and the newly generated code
 is used as replacement if it is smaller. If a basic block is split at a
 ``JUMPI`` and during the analysis, the condition evaluates to a constant,
-the ``JUMPI`` is replaced depending on the value of the constant. Thus code like
+the ``JUMPI`` is replaced based on the value of the constant. Thus code like
 
 ::
 
     uint x = 7;
     data[7] = 9;
-    if (data[x] != x + 2)
+    if (data[x] != x + 2) // this condition is never true
         return 2;
     else
         return 1;
 
-still simplifies to code which you can compile even though the instructions contained
-a jump in the beginning of the process:
+simplifies to this:
 
 ::