misc(scripts): add lantern evaluation scripts #5257
Evaluating firstContentfulPaint vs. optimisticFCP: 15.6% 32.4% - 62/21/16
Evaluating firstContentfulPaint vs. pessimisticFCP: 15.6% 29.9% - 57/29/13
what do these numbers refer to? Maybe add a header or something to the print out?
headers added 👍
@@ -0,0 +1,10 @@
#!/bin/bash

# THIS SCRIPT ASSUMES CWD IS ROOT PROJECT
if you want,
pwd="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
lhroot_path="$pwd/../.."
anything else here folks?
LH_ROOT_PATH="$DIRNAME/../../.."
cd $LH_ROOT_PATH

TAR_URL="https://drive.google.com/a/chromium.org/uc?id=1_w2g6fQVLgHI62FApsyUDejZyHNXMLm0&export=download" |
plz add a comment on what this is and whether it's ever updated, e.g. "frozen snapshots from XXX date"
I'm good with this too.
I mentioned this (jokingly) in another PR, but residual distribution is really one of the key signals we should be looking at for evaluating the regression and (in turn) the generation of the pessimistic/optimistic signals, and for checking whether there are influential variables we should be incorporating but aren't.
We would need a random ("random") sample for that, though, and the scripts themselves are good regardless of data run through them, so I'm 👍 👍
const totalBad = [];

/**
 * @param {string} metric
keyof whatever
might let you skip some of the ts-ignores below
didn't remove ts-ignores but helps elsewhere 👍
 * @property {string} url
 * @property {string} tracePath
 * @property {string} devtoolsLogPath
 * @property {*} lantern
what's the def for this? is it just that it's long?
fixed
phase 2 of run lantern accuracy checks per commit
related #5237
this adds a set of scripts that download a set of 100 minified traces, run lantern over them, and print out some summary statistics.
the trace set is mostly representative, skewing slightly negative so there's more identifiable room for improvement. the MAPE/spearman's rho stats on this set are slightly lower than you might have seen in previous PRs, both because of that skew and because this compares 1 trace to the median of WPT instead of the median of 9 traces to WPT
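For readers unfamiliar with the two statistics mentioned above, here is a minimal sketch of how MAPE and Spearman's rho can be computed for a set of predicted vs. observed metric values. This is an illustration only, not the code in this PR: the function names `mape`, `ranks`, `pearson`, and `spearman` are hypothetical, and it assumes the observed values are non-zero.

```javascript
// Illustrative sketch only; names are hypothetical, not taken from the PR.

/** Mean absolute percentage error as a fraction (0.15 means 15%). */
function mape(actual, predicted) {
  const total = actual.reduce(
    (sum, a, i) => sum + Math.abs(predicted[i] - a) / a, 0);
  return total / actual.length;
}

/** 1-based ranks; tied values share the average of their ranks. */
function ranks(values) {
  const order = values.map((v, i) => ({v, i})).sort((a, b) => a.v - b.v);
  const out = new Array(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
    const avgRank = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) out[order[k].i] = avgRank;
    start = end + 1;
  }
  return out;
}

/** Pearson correlation; returns NaN if either input has zero variance. */
function pearson(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((s, x) => s + x, 0) / n;
  const my = ys.reduce((s, y) => s + y, 0) / n;
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - mx) * (ys[i] - my);
    vx += (xs[i] - mx) ** 2;
    vy += (ys[i] - my) ** 2;
  }
  return cov / Math.sqrt(vx * vy);
}

/** Spearman's rho: Pearson correlation of the rank-transformed values. */
function spearman(actual, predicted) {
  return pearson(ranks(actual), ranks(predicted));
}
```

MAPE captures how far off the predictions are on average, while Spearman's rho only captures whether sites are ordered the same way, so a model can score well on one and poorly on the other.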
Good/OK/Bad thresholds are very open to discussion. Right now:
Good: <20% absolute error OR <10% percentile difference (predicting 60s when real is 80s is good IMO). Sites here are roughly indistinguishable from WPT results on single runs, as the error is within the normal variance level of WPT.
OK: <50% absolute error. Sites here are roughly as inaccurate as DevTools throttling on its edge cases.
Bad: >50% absolute error. Sites here are usually inaccurate and should be dug into; they mostly fall into a few categories.
see sample output
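The three thresholds above (read here as Good, OK, and Bad, in that order) could be sketched as a small classifier. This is a hedged illustration rather than the script's actual implementation: `classify`, `tally`, and the field names `absErrorPct` and `percentileDiff` are hypothetical, both inputs are assumed to be expressed in percent/percentile points, and an error of exactly 50% is assumed to count as bad since the description leaves that boundary open.

```javascript
// Illustrative sketch only; names and boundary handling are assumptions.

/**
 * @param {number} absErrorPct absolute error in percent, e.g. 25 for 25%
 * @param {number} percentileDiff absolute percentile difference in points
 * @return {'good'|'ok'|'bad'}
 */
function classify(absErrorPct, percentileDiff) {
  // Good: <20% absolute error OR <10% percentile difference
  if (absErrorPct < 20 || percentileDiff < 10) return 'good';
  // OK: <50% absolute error
  if (absErrorPct < 50) return 'ok';
  // Bad: everything else (exactly 50% treated as bad by assumption)
  return 'bad';
}

/** Counts how many results fall into each bucket. */
function tally(results) {
  const counts = {good: 0, ok: 0, bad: 0};
  for (const r of results) {
    counts[classify(r.absErrorPct, r.percentileDiff)]++;
  }
  return counts;
}
```

If error is measured relative to the real value, predicting 60s when the real value is 80s is a 25% error, so under this sketch it would only be classified as good through the percentile clause.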