Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(network-analyzer): use arithmetic mean for median #15096

Merged
merged 8 commits into from
May 20, 2023

Conversation

connorjclark
Copy link
Collaborator

@connorjclark connorjclark commented May 18, 2023

AFAIK, median is always defined to use the arithmetic mean of the two middle values if the number of samples is even. We weren't doing that. The implications for lantern accuracy, at least according to our current test database, is minor but positive.

image

This value is only used to estimate the server response time in computeRTTAndServerResponseTime. The previous value of this biased slightly towards a faster server response time (for example- in the common case of having two estimates- if TCP was any faster than SSL then we select the TCP duration and ignore the SSL duration for purposes of response time).

@connorjclark connorjclark requested a review from a team as a code owner May 18, 2023 19:20
@connorjclark connorjclark requested review from brendankenny and removed request for a team May 18, 2023 19:20
Comment on lines 56 to 57
if (values.length <= 1) {
median = values[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the 1 case is handled fine by the odd case, and length 0 is guaranteed to be undefined. This makes it more explicit. Reaching into values[0] when it's empty just to get undefined is kinda confusing.

Suggested change
if (values.length <= 1) {
median = values[0];
if (values.length === 0) {
median = undefined;

Copy link
Collaborator Author

@connorjclark connorjclark May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this made TS unhappy, so I threw an error for values.length === 0 which should never happen with real data but plenty of unit tests fail with that. So, I reverted.

Co-authored-by: Adam Raine <6752989+adamraine@users.noreply.github.com>
@@ -8766,7 +8766,7 @@
},
{
"origin": "https://mnl4bjjsnz-dsn.algolia.net",
"serverResponseTime": 0
"serverResponseTime": 263.2025
}
Copy link
Collaborator Author

@connorjclark connorjclark May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm seems like a lot..

Copy link
Collaborator Author

@connorjclark connorjclark May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came from these estimates:
image

in _estimateResponseTimeByOrigin's Math.max(ttfb - rtt, 0), ttfb - rtt is somehow negative so we get a zero as an estimate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zero estimate came from this record: https://mnl4bjjsnz-dsn.algolia.net/1/indexes/dev_OFFICE_SCENES/query, ttfb 49 (slightly more than the rtt estimate of 49.56)

The second, higher estimate came from: https://mnl4bjjsnz-dsn.algolia.net/1/indexes/dev_OFFICE_SCENES/query, ttfb 575

it's reasonable for query time to be so variable for such a website, so I think taking the average-ish value here (via the median changes in this PR) is good.

Copy link
Member

@adamraine adamraine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good

@brendankenny
Copy link
Member

brendankenny commented May 18, 2023

median is always defined to use the arithmetic mean of the two middle values if the number of samples is even

Median has a few definitions, depending on how you're using it. If you need a value that's in the dataset, for instance, you have to pick one of the two middle values, you can't take their mean. Either middle value is fine to pick in that case, again depending on your goals.

I think the fundamental issue is trying to take the median of one or two or three numbers, at which point the median isn't robust in any meaningful way. This feels a bit like it should have been using the arithmetic mean in the first place (there could maybe be issues with like one outlier request, and this could function as a pseudo trimmed mean, I guess).

@connorjclark connorjclark changed the title core(network-analyzer): use arithmetic mean for median of even samples core(network-analyzer): use arithmetic mean for median May 20, 2023
@connorjclark connorjclark merged commit 9922e4c into main May 20, 2023
@connorjclark connorjclark deleted the nit-median-avg branch May 20, 2023 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants