Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new_audit(cache-headers): detects savings from leveraging caching #3531

Merged
merged 14 commits into from
Nov 17, 2017
229 changes: 229 additions & 0 deletions lighthouse-core/audits/byte-efficiency/cache-headers.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,229 @@
/**
* @license Copyright 2017 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const parseCacheControl = require('parse-cache-control');
const ByteEfficiencyAudit = require('./byte-efficiency-audit');
const WebInspector = require('../../lib/web-inspector');
const URL = require('../../lib/url-shim');

// Ignore assets that have low likelihood for cache miss.
const IGNORE_THRESHOLD_IN_PERCENT = 0.1;
// Discount the wasted bytes by some multiplier to reflect that these savings are only for repeat visits.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// As this savings is only for repeat visits, we discount the savings considerably.
// Basically we assume a 10% chance of repeat visit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const WASTED_BYTES_DISCOUNT_MULTIPLIER = 0.1;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the screenshot for why this was necessary (even at 1/10th leverage browser caching reports 2s of savings), 10% chance of repeat visit seems reasonable-ish?


const SECONDS_IN_MINUTE = 60;
const SECONDS_IN_HOUR = 60 * SECONDS_IN_MINUTE;
const SECONDS_IN_DAY = 24 * SECONDS_IN_HOUR;

const CACHEABLE_STATUS_CODES = new Set([200, 203, 206]);

const STATIC_RESOURCE_TYPES = new Set([
WebInspector.resourceTypes.Font,
WebInspector.resourceTypes.Image,
WebInspector.resourceTypes.Media,
WebInspector.resourceTypes.Script,
WebInspector.resourceTypes.Stylesheet,
]);

// This array contains the hand wavy distribution of the age of a resource in hours at the time of
// cache hit at 0th, 10th, 20th, 30th, etc percentiles. This is used to compute `wastedMs` since there
// are clearly diminishing returns to cache duration i.e. 6 months is not 2x better than 3 months.
// Based on UMA stats for HttpCache.StaleEntry.Validated.Age, see https://www.desmos.com/calculator/7v0qh1nzvh
// Example: a max-age of 12 hours already covers ~50% of cases, doubling to 24 hours covers ~10% more.
const RESOURCE_AGE_IN_HOURS_DECILES = [0, 0.2, 1, 3, 8, 12, 24, 48, 72, 168, 8760, Infinity];

class CacheHeaders extends ByteEfficiencyAudit {
/**
* @return {number}
*/
static get WASTED_BYTES_DISCOUNT_MULTIPLIER() {
return WASTED_BYTES_DISCOUNT_MULTIPLIER;
}

/**
* @return {!AuditMeta}
*/
static get meta() {
return {
category: 'Caching',
name: 'cache-headers',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshedding on name? It's not just 'cache-headers' but also a judgement of them. asset-cache-length? asset-caching-ttl?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm if we go by consistency with the other byte efficiency audits they basically fall into either <noun of thing being detected> or uses-<best practice we're encouraging>

how about...

uncached-assets
low-cache-ttl
uses-caching
uses-cache-headers
uses-long-cache-ttl
?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uses-long-cache-ttl certainly isn't exactly catchy but describes it well :) I like that since it's not just use, it's (if they're used) that they're long

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

informative: true,
helpText:
'A well-defined cache policy can speed up repeat visits to your page. ' +
'[Learn more](https://developers.google.com/speed/docs/insights/LeverageBrowserCaching).',
description: 'Leverage browser caching',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leverage browser caching for static assets

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

requiredArtifacts: ['devtoolsLogs'],
};
}

/**
* Converts a time in seconds into a duration string, i.e. `1d 2h 13m 52s`
* @param {number} maxAgeInSeconds
* @return {string}
*/
static toDurationDisplay(maxAgeInSeconds) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems good if we move this stuff into lighthouse-core/report/v2/renderer/util.js yah?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg, done

if (maxAgeInSeconds === 0) {
return 'None';
}

const parts = [];
const unitLabels = [
['d', SECONDS_IN_DAY],
['h', SECONDS_IN_HOUR],
['m', SECONDS_IN_MINUTE],
['s', 1],
];

for (const [label, unit] of unitLabels) {
const numberOfUnits = Math.floor(maxAgeInSeconds / unit);
if (numberOfUnits > 0) {
maxAgeInSeconds -= numberOfUnits * unit;
parts.push(`${numberOfUnits}\xa0${label}`);
}
}

return parts.join(' ');
}

/**
* Computes the percent likelihood that a return visit will be within the cache lifetime, based on
* Chrome UMA stats see the note above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know its been our policy to have consts at the top but in this case i think it hurts readability.. would prefer to have those relevant const just defined here inside of getCacheHitLikelihood.

doing that would avoid pingponging back and forth between the top of the file and down here when reading it.

wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sure sg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess you can drop "see the note above." now

* @param {number} maxAgeInSeconds
* @return {number}
*/
static getCacheHitLikelihood(maxAgeInSeconds) {
const maxAgeInHours = maxAgeInSeconds / 3600;
const upperDecileIndex = RESOURCE_AGE_IN_HOURS_DECILES.findIndex(
decile => decile >= maxAgeInHours
);
if (upperDecileIndex === 11) return 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11 => RESOURCE_AGE_IN_HOURS_DECILES.length - 1 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (upperDecileIndex === 0) return 0;

const upperDecile = RESOURCE_AGE_IN_HOURS_DECILES[upperDecileIndex];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe switch these names? upperDecile/lowerDecile would refer to 0, 0.1, 0.2, etc while upperDecileValue or whatever would be the entry in the array

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure done

const lowerDecile = RESOURCE_AGE_IN_HOURS_DECILES[upperDecileIndex - 1];

const tenthsPlace = upperDecileIndex;
// approximate the position between deciles as linear
const hundredthsPlace = 10 * (maxAgeInHours - lowerDecile) / (upperDecile - lowerDecile);
return tenthsPlace / 10 + hundredthsPlace / 100;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am just barely following this construction. surely there's another, simpler way to get the number you're returning.. :) and even with the complexity reduced it will still benefit from plentiful comments.

at the very least i would recommend avoiding a hundredthsPlace-like variable.

a lerp() function would probably make sense. (h/t brendan)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seemed far more grok'able to me than linearInterpolation function 😆 alright will do

}

/**
* Computes the user-specified cache lifetime, 0 if explicit no-cache policy is in effect, and null if not
* user-specified. See https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you nuke the period at the end here. it breaks autolinkers

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*
* @param {!Map<string,string>} headers
* @param {!Object} cacheControl Follows the potential settings of cache-control, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
* @return {?number}
*/
static computeCacheLifetimeInSeconds(headers, cacheControl) {
// Pragma controls caching if cache-control is not set, see https://tools.ietf.org/html/rfc7234#section-5.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy Pragma header controls caching...

also feels kinda weird to handle this edgecase case first. wanna move it after the if (cacheControl) { block?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, done. seemed simpler to me to get the crazy old case out of the way first so that the rest of the function you know you're dealing with cache-control/expires precedence but I'm not the one having to read it from the outside :)

if (!cacheControl && (headers.get('pragma') || '').includes('no-cache')) {
return 0;
}

// Cache-Control takes precendence over expires
if (cacheControl) {
if (cacheControl['no-cache'] || cacheControl['no-store']) return 0;
if (Number.isFinite(cacheControl['max-age'])) return Math.max(cacheControl['max-age'], 0);
}

if (headers.has('expires')) {
const expires = new Date(headers.get('expires')).getTime();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay for standards that enable this parser to handle the http date format. \o/

// Invalid expires values MUST be treated as already expired
if (!expires) return 0;
return Math.max(0, Math.ceil((expires - Date.now()) / 1000));
}

return null;
}

/**
* Given a network record, returns whether we believe the asset is cacheable, i.e. it was a network
* request that satisifed the conditions:
*
* 1. Has a cacheable status code
* 2. Has a resource type that corresponds to static assets (image, script, stylesheet, etc).
* 3. It does not have a query string.
*
* @param {!WebInspector.NetworkRequest} record
* @return {boolean}
*/
static isCacheableAsset(record) {
const resourceUrl = record._url;
return (
CACHEABLE_STATUS_CODES.has(record.statusCode) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same deal with CACHEABLE_STATUS_CODES and position.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

STATIC_RESOURCE_TYPES.has(record._resourceType) &&
!resourceUrl.includes('?') &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why excluding all of these? there was voodoo around these not being cached at the proxy level but i dont know of a reason a browser has different policy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PSI does it based on the claim that resources with query strings are not heuristically cacheable, seems like a reasonable assumption if the asset has a query string and no explicit cache policy then don't cache it, people do weird changes on gets with query string all the time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Innnnteresting. I went and found the history of that line. (took a while).

It goes back to here: pagespeed/page-speed@4c4f031#diff-2478b085708a8d438d5057d0365f067fR384

It originally landed with " This is a debatable policy. " :)

I have a feeling like folk's use of query strings are different in 2017 than 2010, but who knows.

Can we leave some comments here that provide some context?
And perhaps a TODO to explore including these records and see what it tells us.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

!resourceUrl.includes('data:')
);
}

/**
* @param {!Artifacts} artifacts
* @return {!AuditResult}
*/
static audit_(artifacts) {
const devtoolsLogs = artifacts.devtoolsLogs[ByteEfficiencyAudit.DEFAULT_PASS];
return artifacts.requestNetworkRecords(devtoolsLogs).then(records => {
const results = [];
for (const record of records) {
if (!CacheHeaders.isCacheableAsset(record)) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are redirects just filtered out in that fn?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, only 200, 203, 206 allowed


const headers = new Map();
for (const header of record._responseHeaders) {
headers.set(header.name, header.value);
}

// Ignore assets that have an etag since they will not be re-downloaded as long as they are valid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm no etag expert but i am not consistently seeing this described behavior

for example: this (tooootally random) page... https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag the html resource has an etag but refreshing it (various ways) does refetch it and get a 200.

then again on http://music.com/ the logo.png has an etag and no other caching headers and it gets served from memory cache on reloads.

so perhaps the document is handled differently?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "as they are valid" mean? you mean "match the server's etag"?

this comment could probably afford to break into two lines

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having an etag and the server actually acting on it and doing the right thing are different things :)

when the if-none-match: <etag> header is sent the server should reply with a 304 if it matches the resource it's planning on sending, perhaps file an issue for a separate audit that checks for improper server handling of etag'd assets?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarified the comment a bit

if (headers.has('etag')) continue;

const cacheControl = parseCacheControl(headers.get('cache-control'));
let cacheLifetimeInSeconds = CacheHeaders.computeCacheLifetimeInSeconds(
headers,
cacheControl
);

// Ignore assets with an explicit no-cache policy
if (cacheLifetimeInSeconds === 0) continue;
cacheLifetimeInSeconds = cacheLifetimeInSeconds || 0;

let cacheMissLikelihood = 1 - CacheHeaders.getCacheHitLikelihood(cacheLifetimeInSeconds);
if (cacheMissLikelihood < IGNORE_THRESHOLD_IN_PERCENT) continue;

const totalBytes = record._transferSize;
const wastedBytes = cacheMissLikelihood * totalBytes * WASTED_BYTES_DISCOUNT_MULTIPLIER;
const cacheLifetimeDisplay = CacheHeaders.toDurationDisplay(cacheLifetimeInSeconds);
cacheMissLikelihood = `${Math.round(cacheMissLikelihood * 100)}%`;

results.push({
url: URL.elideDataURI(record._url),
cacheControl,
cacheLifetimeInSeconds,
cacheLifetimeDisplay,
cacheMissLikelihood,
totalBytes,
wastedBytes,
});
}

const headings = [
{key: 'url', itemType: 'url', text: 'URL'},
{key: 'cacheLifetimeDisplay', itemType: 'text', text: 'Cache Lifetime'},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache TTL

is one option but perhaps too jargony

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context should be enough if they don't know 👍

{key: 'cacheMissLikelihood', itemType: 'text', text: 'Cache Miss Likelihood (%)'},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Est. Likelihood of Cache Miss

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this column is admittedly a little odd (showing the global data but next to each individual record). I could see us skip printing this column, and just explaining it in the docs instead. wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it is basically global, but this is the only place where we actually show how severe the existing cache policy is, so I'm in favor of surfacing it

I feel like no one will look at the docs to find that information

{key: 'totalKb', itemType: 'text', text: 'Size (KB)'},
];

return {
results,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's sort these.
i guess by like totalKb * cacheMissLikelihood ?

Copy link
Collaborator Author

@patrickhulce patrickhulce Oct 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should already be sorted by that thanks to

const results = result.results
.map(item => {
const wastedPercent = 100 * item.wastedBytes / item.totalBytes;
item.wastedKb = this.bytesToKbString(item.wastedBytes);
item.wastedMs = this.bytesToMsString(item.wastedBytes, networkThroughput);
item.totalKb = this.bytesToKbString(item.totalBytes);
item.totalMs = this.bytesToMsString(item.totalBytes, networkThroughput);
item.potentialSavings = this.toSavingsString(item.wastedBytes, wastedPercent);
return item;
})
.sort((itemA, itemB) => itemB.wastedBytes - itemA.wastedBytes);

👍

headings,
};
});
}
}

module.exports = CacheHeaders;
2 changes: 2 additions & 0 deletions lighthouse-core/config/default.js
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ module.exports = {
'accessibility/valid-lang',
'accessibility/video-caption',
'accessibility/video-description',
'byte-efficiency/cache-headers',
'byte-efficiency/total-byte-weight',
'byte-efficiency/offscreen-images',
'byte-efficiency/uses-webp-images',
Expand Down Expand Up @@ -229,6 +230,7 @@ module.exports = {
{id: 'consistently-interactive', weight: 5, group: 'perf-metric'},
{id: 'speed-index-metric', weight: 1, group: 'perf-metric'},
{id: 'estimated-input-latency', weight: 1, group: 'perf-metric'},
{id: 'cache-headers', weight: 0, group: 'perf-hint'},
{id: 'link-blocking-first-paint', weight: 0, group: 'perf-hint'},
{id: 'script-blocking-first-paint', weight: 0, group: 'perf-hint'},
{id: 'uses-responsive-images', weight: 0, group: 'perf-hint'},
Expand Down
Loading