-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] improve mangle
#2219
[WIP] improve mangle
#2219
Conversation
Assign shorter names to symbols with higher frequency of occurrence.
This frequency-based rearrangement is confined at scope-level to preserve inter-scope
|
So even after this PR, |
Nice PR! Across the board mangle size reduction for little to no extra uglify time. I assume the symbol frequencies are computed after compress? |
May be it's due to ES6...
|
It's computed by which we do (again) right before |
I had noticed that in |
Thanks for the information - I shall learn how to use |
There's a big showstopper problem with this PR - although it produces smaller non-gzip output, gzip output is significantly larger: master:
with this PR (#2219):
We can't use this PR in this state. |
Oops - may be those |
So var ic = Math.E, ac = Math.tan, dc = Math.acos, oc = Math.asin, lc = Math.exp, rc = Math.atan, cc = Math.atan2, sc = Math.sin, uc = Math.cos, pc = Math.PI, _c = Math.min, fc = Math.round, hc = Math.pow, gc = Math.log, yc = Math.abs, xc = Math.floor, mc = Math.max, bc = Math.ceil, vc = Math.sqrt; Which for |
It also explains why I would not be in favor of such an optimization in uglify. |
} | ||
}); | ||
a.sort(function(m, n) { | ||
return n.references.length - m.references.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think symbol references
is updated in compress and would present the pre-compress value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
figure_out_scope()
is called again after compress()
and right before mangle_names()
, as I mentioned in the code references in #2219 (comment)
... or have I missed something? 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed mentioning that within figure_out_scope()
, init_scope_vars()
is called on each AST_Scope
, which resets all references to previous instances of SymbolDef
:
https://github.com/mishoo/UglifyJS2/blob/94e5e00c0321b92aca1abf170c12a02d6c3275b5/lib/scope.js#L117
https://github.com/mishoo/UglifyJS2/blob/94e5e00c0321b92aca1abf170c12a02d6c3275b5/lib/scope.js#L262-L263
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...wait - my mistake - it's figure_out_scope()
that determines all the references in the first place is it not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's
figure_out_scope()
that determines all the references in the first place is it not?
Yup - reduce_vars
does rebuild some of that within reset_opt_flags()
, but whenever figure_out_scope()
is called everything is reset to a clean slate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ a.sort(function(m, n) {
+ return n.references.length - m.references.length;
I vaguely recall seeing that mangle approach attempted a couple years back and abandoned due to gzip results. I could be mistaken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, more the reason to try and get to the bottom of this - mysteries are there to be solved 😜
So it's that Safari workaround which messes up the character frequency: I'm soooo tempted to gate this behind the |
We can't change the default unfortunately - old safari browsers are still fairly commonly used in the wild. Can we gate it behind an option defaulting to |
What |
If we are going to have that as default behaviour, then the only way that seems useful is for me to spend some effort and make this new Don't worry about my grumpiness earlier - just letting some steam off before getting back to code 😅 |
Exact same size - so my discovery above is unrelated to this file's gzip size increase. |
You know what - let me add |
I hadn't noticed. Correcting my faulty assumptions or understanding may be a bit humbling - but welcome. |
I think sorting by frequency produces less-optimal results over the basic mangle algorithm because LZ77 maintains a sliding window. In the initial window it's a case of first-come/first-serve when assigning bit patterns to strings. It's a difficult problem to predict how gzip will compress something as the complete algorithm has to be run on a given data set to know for sure. Also, functions are moved to the top with |
Using
|
Interesting dilemma. In a perfect world we'd like to optimize for both gzip and non-gzip sizes. But that's not likely. I think it's best to give the users the flexibility to choose their options. |
That's what I'm aiming for, just need to understand all the intricacies first 😓 Would you mind reminding me why |
Almost tempted to have a script that does exhaustive search for the list of options that would bring minimum uglified/gzipped size for a given input. |
They also disable |
At least with |
We could reintroduce the
I think it produces smaller non-gzip output because it generally assigns the most favorable short mangle variables to function parameters and local variables. If functions are hoisted the theory goes that parameters will be more likely to be assigned those favorable slots - especially with small functions with local variables. In non-gzip output, frequency is the most important factor. But in gzip output one must consider the effects of both relative order and frequency due to LZ building a sliding window.
Disabling |
|
Now that I did a refresh course on Lempel-Ziv stuff, I think sorting by frequency is most likely the wrong approach if we care about gzipped output. Nothing in frequency accounts for locality. What might work is if we turn the order of |
|
That's certainly worth a try. Or maybe use a combined weighting of locality and frequency. The locality weighting for a symbol could "decay" the further you are away from that symbol's last use. |
Seems like --- a/lib/compress.js
+++ b/lib/compress.js
@@ -2746,7 +2746,7 @@ merge(Compressor.prototype, {
if (!has_loop_control) return self.body;
}
}
- if (self instanceof AST_While) {
+ if (false && self instanceof AST_While) {
return make_node(AST_For, self, self).optimize(compressor);
}
return self;
|
It looks like just statistical noise either way. |
It's not just locality that's important to LZ - relative order matters too. Notice how |
Tried bottom-up instead of top-down mangling, but doesn't win all the time in |
@alexlamsl That's unfortunate. It's tough to improve on the present |
@kzc what I discovered so far is that disturbing the sequence of mangled names as minimal as possible produces the smallest gzip size, even without frequency assignment. Obviously, however, the nature of cross-scope variable usage and assignment makes this incredibly difficult. |
Assign shorter names to symbols with higher frequency of occurrence.