Global Deadcode Elimination #1503
When turning the optimization on by default, I see test failures in the following places:
They seem to all involve printf/format.
diff --git a/toplevel/test/test_toplevel.reference b/toplevel/test/test_toplevel.reference
index 2ab06fc0ce..0298111ab7 100644
--- a/toplevel/test/test_toplevel.reference
+++ b/toplevel/test/test_toplevel.reference
@@ -3,7 +3,6 @@ external parseInt : float -> int = "parseInt"
let f = 3.14
let () = Printf.printf "parseInt(%f) = %d\n" f (parseInt f);;
Dynlink: looking for symbol parseInt
-parseInt(3.140000) = 3
external parseInt : float -> int = "parseInt"
val f : float = 3.14
Hm, I haven't seen this before in earlier versions with it on by default. I took a quick look and nothing stood out to me; I'll try to take a closer look again soon.
Edit: In some of the failed tests, it looks like it could be an ordering problem? (If I'm reading the test output correctly.)
- 190 191 192 193 194 195 196 197 198
+ 190 191 192 193 194 195
+********* Test number 195 failed ***********
+ 196 197 198
Like here, are we seeing the output
Thanks a lot for such a big contribution.
Thank you for all the help getting this merged!!
Congratulations @micahcantor for the merge! This is a great feature. Responding to the reviewers took substantial work, so thank you for spending time on this.
It seems that doing so can be incorrect because it could keep live some dead code that uses other zero-ed values. I'll this change,
cc @vouillon @micahcantor, see my comment above
CHANGES:
## Features/Changes
* Compiler: global dead code elimination (Micah Cantor, ocsigen/js_of_ocaml#1503)
* Compiler: change control-flow compilation strategy (ocsigen/js_of_ocaml#1496)
* Compiler: loop no longer absorb the whole continuation
* Compiler: Dead code elimination of unused references (ocsigen/js_of_ocaml#2076)
* Compiler: reduce memory consumption (ocsigen/js_of_ocaml#1516)
* Compiler: support for import and export construct in the js parser/printer
* Lib: add download attribute to anchor element
* Misc: switch CI to OCaml 5.1
* Misc: preliminary support for OCaml 5.2
* Misc: support for OCaml 5.1.1
## Bug fixes
* Runtime: fix Dom_html.onIE (ocsigen/js_of_ocaml#1493)
* Runtime: add conversion functions + strict equality for compatibility with Wasm_of_ocaml (ocsigen/js_of_ocaml#1492)
* Runtime: Dynlink should be able to find symbols in jsoo_runtime ocsigen/js_of_ocaml#1517
* Runtime: fix Unix.lstat, Unix.LargeFile.lstat (ocsigen/js_of_ocaml#1519)
* Compiler: fix global flow analysis (ocsigen/js_of_ocaml#1494)
* Compiler: fix js parser/printer wrt async functions (ocsigen/js_of_ocaml#1515)
* Compiler: fix free variables pass wrt parameters' default value (ocsigen/js_of_ocaml#1521)
* Compiler: fix free variables for classes
* Compiler: fix internal invariant (continuation)
* Compiler: fix variable renaming for let, const and classes
* Lib: Url.Current.set_fragment need not any urlencode (ocsigen/js_of_ocaml#1497)
This PR adds a new optimization pass to perform a global and block-aware liveness analysis. The primary purpose of this is to be able to remove unused code from functors that are instantiated more than once, a limitation of the current deadcode elimination algorithm. This issue has been known for some time, see #595.
Since this change is somewhat involved, I will split this comment into a few sections to make it easier to review. The commit history is messy, but the PR should be ready to review file-by-file. I've worked closely with @vouillon and @OlivierNicole on these changes, but there are still some remaining questions to address.
New changes
The primary new contribution is found in `deadcode_dgraph.ml`. There are four main steps to the analysis it performs:
1. Usages (`usages` function). A variable `x` is considered used in a variable `y` if either `x` appears in the definition of `y`, or `x` is applied as a block or closure argument to parameter `y`.
2. Initial liveness (`liveness` function). Here we mark a variable `x` as `Top` if it's used in an impure expression or instruction (more details are given in the doc comment). Otherwise it is marked as dead. This pass uses information from `Global_flow` to determine whether a return value is used at its callsites.
3. Propagation (`solver` and `propagate` functions). For each variable `x` in the graph, its liveness is defined by a `join` of its current liveness and the contribution of each of its usages `y`. More detail is given in the comment for `contribution`, but here we can determine if `x` is used only in a single field `i` of a block `y`, in which case `x` depends only on that field. Then it's marked as `Live {i}`.
4. Zeroing with a sentinel variable whose value is `undefined`. Any dead variables are replaced by the sentinel. This means the existing deadcode elimination should be able to remove these usages from the code, reducing the size. At this stage, we also truncate blocks that end with one or more sentinel values to the last non-sentinel variable.
Outside of `deadcode_dgraph.ml`, there are a few changes in the driver and a few other functions to integrate this pass into the optimization pipeline. The interface to the pass, `Deadcode_dgraph.f`, is called in the function `exact_calls` in `driver.ml`. This function runs after most other optimizations, which is important since global flow will break if it is run beforehand with a different number of variables than it expects. Also, it ensures that this pass, which doesn't expose much opportunity for further optimization, is run only once before a final elimination pass.
We also added an optimization that deletes sentinel fields in arrays. Since the sentinel variable has the value `undefined`, the following JS transformation is valid:
This helps save a few extra bytes in the generated code. There were a few changes I made to the interface in `generate.ml` and `mlvalue.ml` to facilitate this optimization. There are a few other minor changes that expose information from global flow and expose an `undefined` primitive that it needs.
Results and benchmarking
The overall effect of this process is that deadcode elimination is now sensitive to the liveness of individual fields within a block (rather than a block being entirely live or dead), and it can mark this liveness in the inputs and outputs of block functions. In the IR, functors are represented as functions from one block to another, where the member functions constitute the elements of these blocks. With this new pass, we can mark which member functions are live and remove the rest.
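To make the idea of field-level liveness concrete, here is a minimal OCaml sketch of a liveness lattice with a `join`. It is a simplified reading of the description in this PR, not the code in `deadcode_dgraph.ml`: the type `live`, the helper `field_contribution`, and the field numbers are all assumptions made for illustration.

```ocaml
(* Hypothetical sketch: names follow the PR description, but this is not
   the implementation found in deadcode_dgraph.ml. *)
module IntSet = Set.Make (Int)

type live =
  | Dead              (* never used by live code *)
  | Live of IntSet.t  (* a block of which only these fields are used *)
  | Top               (* used as a whole, e.g. by an impure expression *)

let equal a b =
  match a, b with
  | Dead, Dead | Top, Top -> true
  | Live s, Live t -> IntSet.equal s t
  | _ -> false

(* Least upper bound: Dead is the bottom element, Top absorbs everything. *)
let join a b =
  match a, b with
  | Top, _ | _, Top -> Top
  | Dead, l | l, Dead -> l
  | Live s, Live t -> Live (IntSet.union s t)

(* Assumed contribution of a usage "y = field i of x" to the liveness of x:
   if y is dead the read contributes nothing, otherwise only field i is needed. *)
let field_contribution ~field live_y =
  match live_y with
  | Dead -> Dead
  | Live _ | Top -> Live (IntSet.singleton field)

let () =
  (* A functor result block whose fields 0 and 3 are read by live code,
     while the read of field 5 is itself dead. *)
  let l =
    List.fold_left join Dead
      [ field_contribution ~field:0 Top
      ; field_contribution ~field:3 Top
      ; field_contribution ~field:5 Dead ]
  in
  match l with
  | Live s -> IntSet.iter (Printf.printf "field %d is live\n") s
  | Dead -> print_endline "dead"
  | Top -> print_endline "top"
```

In this sketch, a block that ends up as `Live {0, 3}` keeps only those two member functions; every other field is a candidate for replacement by the sentinel.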
In practice, this means that if you instantiate a functor (like `Set.Make` in the stdlib) and use just a few functions from its interface (like `add`, `find`, etc.) then the other 30 functions provided by the functor will be eliminated from the JS. This already occurs if the functor is instantiated only once, since in that case the compiler can specialize the block function to just a block and eliminate unused code inside. However, this could not occur if the functor is instantiated more than once.
Here's a minimal example demonstrating the effect. This program instantiates integer and string sets, and uses a few functions from the `Set` interface.
If we compile this to JS with and without the new pass, we get the following results:
We see that a large portion of this small program was taken up by the definitions of all `Set` interface functions, which can now be removed. Indeed, we expect to see the most significant changes from this optimization when the input program is small and uses large functor interfaces.
It should be noted that the size of the code removed grows in relation to the number of functors used and (inversely) to the number of functions used from those interfaces. In a small program like this one, the 5-10kb removed by this optimization for each functor can be significant, but for larger programs the percent change will be much smaller.
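For reference, a program of the kind described above, which instantiates `Set.Make` twice and uses only a few functions from each instance, might look like the following sketch. It is not the listing from the PR; the module names, values, and the choice of `add`, `mem` and `cardinal` are assumptions made for illustration.

```ocaml
(* Illustrative only: not the original example from the PR. *)
module IntSet = Set.Make (Int)
module StringSet = Set.Make (String)

(* Only add, mem and cardinal are used from each instance; the rest of the
   Set interface is unused and is what the new pass aims to remove. *)
let ints = IntSet.(empty |> add 1 |> add 2 |> add 2)
let strings = StringSet.(empty |> add "a" |> add "b")

let () =
  Printf.printf "%d ints, contains 2: %b\n"
    (IntSet.cardinal ints) (IntSet.mem 2 ints);
  Printf.printf "%d strings, contains c: %b\n"
    (StringSet.cardinal strings) (StringSet.mem "c" strings)
```

Because the functor is instantiated twice, the compiler cannot specialize it to a single block, which is the case the new pass is meant to handle.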
For instance, another benchmark we used is the catala_web_interpreter:
In this case we can remove 5.5kb, or just 0.22% of the program.
We also observe a modest decrease in size in another benchmark on toplevel code using `lwt`. The source can be found in `/toplevel/examples/lwt_toplevel/toplevel.ml`:
Here we remove 28kb of code or a little less than 1% of the total size.
One real-world example that we saw encouraging results for was using the library ocamlgraph, which exposes a large functor interface to graph algorithms abstracted over the graph data structure. We compiled the demo found in the library source, and obtained the following:
Here we're able to remove about 10kb or 9.2% of the program, which instantiates several of the functors provided by the library.
Conclusion
Overall, we expect this optimization to be useful for small web programs that want to use a functor interface like `Set`, `Map` or `ocamlgraph` without unnecessarily increasing the code size by 5-20kb. Larger programs may see a significant change if they internally expose many functor interfaces where they don't use many of the provided functions.
This change may also cause a small increase in compile times for some programs. I tested this by compiling `ocamlc` and the toplevel example using the hyperfine benchmarking program, and these were the results:
So on `ocamlc` the pass adds about 0.5s and on toplevel it adds about 0.2s on average.
Future Work
We found during testing that the optimization can fail to remove code from nested functors (i.e. functors that take other functors as arguments), such as in the interfaces exposed by `tyxml`. We made progress on implementing a fix for this, but we didn't finish, so we decided not to include that here.