core(network-analyzer): use arithmetic mean for median #15096
if (values.length <= 1) {
  median = values[0];
nit: the 1 case is handled fine by the odd case, and length 0 is guaranteed to be undefined. This makes it more explicit. Reaching into values[0] when it's empty just to get undefined is kinda confusing.
- if (values.length <= 1) {
-   median = values[0];
+ if (values.length === 0) {
+   median = undefined;
This made TS unhappy, so I threw an error for values.length === 0 instead. That should never happen with real data, but plenty of unit tests fail with it. So, I reverted.
Co-authored-by: Adam Raine <6752989+adamraine@users.noreply.github.com>
@@ -8766,7 +8766,7 @@
  },
  {
    "origin": "https://mnl4bjjsnz-dsn.algolia.net",
-   "serverResponseTime": 0
+   "serverResponseTime": 263.2025
  }
hmmm seems like a lot..
The zero estimate came from this record: https://mnl4bjjsnz-dsn.algolia.net/1/indexes/dev_OFFICE_SCENES/query, ttfb 49 (slightly more than the rtt estimate of 49.56)
The second, higher estimate came from: https://mnl4bjjsnz-dsn.algolia.net/1/indexes/dev_OFFICE_SCENES/query, ttfb 575
It's reasonable for query time to be this variable for such a website, so I think taking the average-ish value here (via the median changes in this PR) is good.
Seems good
Median has a few definitions, depending on how you're using it. If you need a value that's in the dataset, for instance, you have to pick one of the two middle values, you can't take their mean. Either middle value is fine to pick in that case, again depending on your goals. I think the fundamental issue is trying to take the median of one or two or three numbers, at which point the median isn't robust in any meaningful way. This feels a bit like it should have been using the arithmetic mean in the first place (there could maybe be issues with like one outlier request, and this could function as a pseudo trimmed mean, I guess).
AFAIK, median is always defined to use the arithmetic mean of the two middle values if the number of samples is even. We weren't doing that. The implications for lantern accuracy, at least according to our current test database, are minor but positive.
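For reference, a minimal sketch of that definition. The `median` helper below is a hypothetical standalone function for illustration, not the actual code in this PR, and returning `undefined` for an empty array is an assumption taken from the review thread:

```javascript
/**
 * Hypothetical helper illustrating the textbook median.
 * Not the Lighthouse implementation.
 * @param {number[]} values
 * @return {number|undefined}
 */
function median(values) {
  // Assumption: an empty input yields undefined rather than throwing.
  if (values.length === 0) return undefined;

  // Sort a copy numerically so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);

  // Odd count: the single middle value.
  if (sorted.length % 2 === 1) return sorted[mid];

  // Even count: arithmetic mean of the two middle values.
  return (sorted[mid - 1] + sorted[mid]) / 2;
}
```

With an odd number of samples this always returns a value from the dataset; with an even number it may not, which is the trade-off discussed in the review.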
This value is only used to estimate the server response time in computeRTTAndServerResponseTime. The previous median was biased slightly towards a faster server response time. For example, in the common case of having two estimates, if TCP was any faster than SSL then we selected the TCP duration and ignored the SSL duration for purposes of response time.
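To make the bias concrete, here is a rough sketch comparing the two selection rules on a pair of estimates. Both function names and the sample durations are made up for illustration, and `lowerMiddle` is only an assumption about how the earlier selection behaved, inferred from the description above:

```javascript
// Assumed earlier behaviour: with an even count, pick the lower of the
// two middle values. Hypothetical name, not the Lighthouse code.
function lowerMiddle(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor((sorted.length - 1) / 2)];
}

// Behaviour described in this PR: average the two middle values when the
// count is even. Hypothetical name, not the Lighthouse code.
function meanOfMiddle(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up durations in ms: a faster TCP estimate and a slower SSL estimate.
const estimates = [20, 120];

const before = lowerMiddle(estimates); // the slower estimate is ignored
const after = meanOfMiddle(estimates); // both estimates contribute
```

With only two estimates, the first rule always returns the smaller one, so the slower estimate never influences the result. For an odd number of estimates the two rules agree.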