Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new_audit: add charset declaration audit #10284

Merged
merged 23 commits into from
Feb 19, 2020
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 84 additions & 0 deletions lighthouse-core/audits/dobetterweb/charset.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
/**
* @license Copyright 2016 Google Inc. All Rights Reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. its a lighthouse reviewer's favorite thing to call out. 🤕

* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/

/**
* @fileoverview Audits a page to ensure charset it configured properly.
* It must be defined within the first 1024 bytes of the HTML document, defined in the HTTP header, or in a BOM.
Beytoven marked this conversation as resolved.
Show resolved Hide resolved
*/
Beytoven marked this conversation as resolved.
Show resolved Hide resolved
'use strict';

const Audit = require('../audit.js');
const i18n = require('../../lib/i18n/i18n.js');
const MainResource = require('../../computed/main-resource.js');
const CONTENT_TYPE_HEADER = 'content-type';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yesterday we moved these down to above the class dfn. so i think you have a few more commits to push

const CHARSET_META_REGEX = /<meta.*charset="?.{1,}"?.*>/gm;
const CHARSET_HTTP_REGEX = /charset=.{1,}/gm;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mentioned you could add these consts to the module.exports, and that way you can write some unit tests against them.

iirc, you wrote these regexs to handle some extra fancy cases that the current unit tests dont cover. like charset= (empty string value). so i think it'd be worth having a unit test for each regex just to give it a handful of cases it should match and not match.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that way you can write some unit tests against them.

let me know if you have any questions on this. i'm thinking 1 test with like 10-ish assertions using various html variants.
regexes are hilarious so its good to test out all sorts of edge cases.


const UIStrings = {
/** Title of a Lighthouse audit that provides detail on if the charset is set properly for a page. This title is shown when the charset is defined correctly. */
Beytoven marked this conversation as resolved.
Show resolved Hide resolved
title: 'Properly defines charset',
/** Title of a Lighthouse audit that provides detail on if the charset is set properly for a page. This title is shown when the charset meta tag is missing or defined too late in the page. */
failureTitle: 'Charset element is missing or occurs too late on the page',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
failureTitle: 'Charset element is missing or occurs too late on the page',
failureTitle: 'Charset declaration is missing or occurs too late in the HTML',

/** Description of a Lighthouse audit that tells the user why the charset needs to be defined early on. */
description: 'A character encoding declaration is required whether it is done explicitly ' +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slight rewording. let's remove mention of the BOM here, as I dont think we actually want to recommend it. the linked resource takes care of mentioning it anyway.

A character encoding declaration is required. It can be done with a <meta> tag in the first 1024 bytes of the HTML or in the Content-Type HTTP response header.

'in the first 1024 bytes of the page source, through a Byte Order Mark (BOM), ' +
'or in the content-type HTTP header. ' +
'[Learn more](https://www.w3.org/International/questions/qa-html-encoding-declarations).',
};

const str_ = i18n.createMessageInstanceIdFn(__filename, UIStrings);

class CharsetDefined extends Audit {
/**
* @return {LH.Audit.Meta}
*/
static get meta() {
return {
id: 'charset',
title: str_(UIStrings.title),
failureTitle: str_(UIStrings.failureTitle),
description: str_(UIStrings.description),
requiredArtifacts: ['MainDocumentContent', 'URL', 'devtoolsLogs'],
};
}

/**
* @param {LH.Artifacts} artifacts
* @param {LH.Audit.Context} context
* @return {Promise<LH.Audit.Product>}
*/
static audit(artifacts, context) {
const devtoolsLog = artifacts.devtoolsLogs[Audit.DEFAULT_PASS];
return MainResource.request({devtoolsLog, URL: artifacts.URL}, context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these days we'd write this with async await instead. a little nicer since you can drop the indentation below. i'd recommend it

.then(mainResource => {
let charsetIsSet = false;
// Check the http header 'content-type' to see if charset is defined there
if (mainResource.responseHeaders) {
const contentTypeHeader = mainResource.responseHeaders
.find(header => header.name.toLowerCase() === CONTENT_TYPE_HEADER);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing too fancy about this const so i'd inline it here.


if (contentTypeHeader) {
charsetIsSet = contentTypeHeader.value.match(CHARSET_HTTP_REGEX) !== null;
}
}

// Check if there is a BOM byte marker
const BOM_FIRSTCHAR = 65279;
charsetIsSet = charsetIsSet || artifacts.MainDocumentContent.charCodeAt(0) === BOM_FIRSTCHAR;

// Check if charset is defined within the first 1024 characters(~1024 bytes) of the HTML document
charsetIsSet = charsetIsSet ||
artifacts.MainDocumentContent.slice(0, 1024).match(CHARSET_META_REGEX) !== null;

return {
score: Number(charsetIsSet),
};
});
}
}

module.exports = CharsetDefined;
module.exports.UIStrings = UIStrings;
2 changes: 2 additions & 0 deletions lighthouse-core/config/default-config.js
Original file line number Diff line number Diff line change
Expand Up @@ -287,6 +287,7 @@ const defaultConfig = {
'byte-efficiency/efficient-animated-content',
'dobetterweb/appcache-manifest',
'dobetterweb/doctype',
'dobetterweb/charset',
'dobetterweb/dom-size',
'dobetterweb/external-anchors-use-rel-noopener',
'dobetterweb/geolocation-on-start',
Expand Down Expand Up @@ -505,6 +506,7 @@ const defaultConfig = {
{id: 'external-anchors-use-rel-noopener', weight: 1},
{id: 'geolocation-on-start', weight: 1},
{id: 'doctype', weight: 1},
{id: 'charset', weight: 1},
{id: 'no-vulnerable-libraries', weight: 1},
{id: 'js-libraries', weight: 0},
{id: 'notification-on-start', weight: 1},
Expand Down
2 changes: 1 addition & 1 deletion lighthouse-core/lib/i18n/i18n.js
Original file line number Diff line number Diff line change
Expand Up @@ -297,7 +297,7 @@ function _formatIcuMessage(locale, icuMessageId, uiStringMessage, values = {}) {
}
// At this point, there is no reasonable string to show to the user, so throw.
if (!localeMessage) {
throw new Error(_ICUMsgNotFoundMsg);
throw new Error(_ICUMsgNotFoundMsg + icuMessageId + uiStringMessage);
}

// when using accented english, force the use of a different locale for number formatting
Expand Down
9 changes: 9 additions & 0 deletions lighthouse-core/lib/i18n/locales/en-US.json
Original file line number Diff line number Diff line change
Expand Up @@ -554,6 +554,15 @@
"lighthouse-core/audits/dobetterweb/appcache-manifest.js | title": {
"message": "Avoids Application Cache"
},
"lighthouse-core/audits/dobetterweb/charset.js | description": {
"message": "A character encoding declaration is required whether it is done explicitly in the first 1024 bytes of the page source, through a Byte Order Mark (BOM), or in the content-type HTTP header. [Learn more](https://www.w3.org/International/questions/qa-html-encoding-declarations)."
},
"lighthouse-core/audits/dobetterweb/charset.js | failureTitle": {
"message": "Charset element is missing or occurs too late on the page"
},
"lighthouse-core/audits/dobetterweb/charset.js | title": {
"message": "Properly defines charset"
},
"lighthouse-core/audits/dobetterweb/doctype.js | description": {
"message": "Specifying a doctype prevents the browser from switching to quirks-mode. [Learn more](https://web.dev/doctype)."
},
Expand Down
9 changes: 9 additions & 0 deletions lighthouse-core/lib/i18n/locales/en-XL.json
Original file line number Diff line number Diff line change
Expand Up @@ -554,6 +554,15 @@
"lighthouse-core/audits/dobetterweb/appcache-manifest.js | title": {
"message": "Âv́ôíd̂ś Âṕp̂ĺîćât́îón̂ Ćâćĥé"
},
"lighthouse-core/audits/dobetterweb/charset.js | description": {
"message": "Â ćĥár̂áĉt́êŕ êńĉód̂ín̂ǵ d̂éĉĺâŕât́îón̂ íŝ ŕêq́ûír̂éd̂ ẃĥét̂h́êŕ ît́ îś d̂ón̂é êx́p̂ĺîćît́l̂ý îń t̂h́ê f́îŕŝt́ 1024 b̂ýt̂éŝ óf̂ t́ĥé p̂áĝé ŝóûŕĉé, t̂h́r̂óûǵĥ á B̂ýt̂é Ôŕd̂ér̂ Ḿâŕk̂ (B́ÔḾ), ôŕ îń t̂h́ê ćôńt̂én̂t́-t̂ýp̂é ĤT́T̂Ṕ ĥéâd́êŕ. [L̂éâŕn̂ ḿôŕê](https://www.w3.org/International/questions/qa-html-encoding-declarations)."
},
"lighthouse-core/audits/dobetterweb/charset.js | failureTitle": {
"message": "Ĉh́âŕŝét̂ él̂ém̂én̂t́ îś m̂íŝśîńĝ ór̂ óĉćûŕŝ t́ôó l̂át̂é ôń t̂h́ê ṕâǵê"
},
"lighthouse-core/audits/dobetterweb/charset.js | title": {
"message": "P̂ŕôṕêŕl̂ý d̂éf̂ín̂éŝ ćĥár̂śêt́"
},
"lighthouse-core/audits/dobetterweb/doctype.js | description": {
"message": "Ŝṕêćîf́ŷín̂ǵ â d́ôćt̂ýp̂é p̂ŕêv́êńt̂ś t̂h́ê b́r̂óŵśêŕ f̂ŕôḿ ŝẃît́ĉh́îńĝ t́ô q́ûír̂ḱŝ-ḿôd́ê. [Ĺêár̂ń m̂ór̂é](https://web.dev/doctype)."
},
Expand Down
185 changes: 185 additions & 0 deletions lighthouse-core/test/audits/dobetterweb/charset-test.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,185 @@
/**
* @license Copyright 2016 Google Inc. All Rights Reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @license Copyright 2016 Google Inc. All Rights Reserved.
* @license Copyright 2020 Google Inc. All Rights Reserved.

* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const CharsetDefinedAudit = require('../../../audits/dobetterweb/charset.js');
const assert = require('assert');
const networkRecordsToDevtoolsLog = require('../../network-records-to-devtools-log.js');

/* eslint-env jest */

describe('Charset defined audit', () => {
it('succeeds when the page contains the charset meta tag', () => {
const finalUrl = 'https://example.com/';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we try to strike a weird balance between DRY and WET in our tests.

in this case i think this test would benefit from a getArtifacts() helper method that all of these cases can use. that way it'd be a much easier to see at at glance how each of the artifacts differ.

if you search for artifacts( in the lh-core/test/ folder you'll find a few diff tests that use this sort of pattern.

const mainResource = {
url: finalUrl,
responseHeaders: [],
};
const devtoolsLog = networkRecordsToDevtoolsLog([mainResource]);
const artifacts = {
devtoolsLogs: {[CharsetDefinedAudit.DEFAULT_PASS]: devtoolsLog},
URL: {finalUrl},
MainDocumentContent: '<meta charset="utf-8" />',
};

const context = {computedCache: new Map()};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a huge deal but you could also move this computedCache thing into generateArtifacts and then you'd have something like...

const {artifacts, context} = generateArtifacts(htmlContent);

return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah we can also go async/await here in the tests, too..

async () => { at the top and then down here something like

    const auditResult = await CharsetDefinedAudit.audit(artifacts, context);
    assert.equal(auditResult.score, 1);

assert.equal(auditResult.score, 1);
});
});

it('succeeds when the page has the charset defined in the content-type meta tag', () => {
const finalUrl = 'https://example.com/';
const mainResource = {
url: finalUrl,
responseHeaders: [],
};
const devtoolsLog = networkRecordsToDevtoolsLog([mainResource]);
const artifacts = {
devtoolsLogs: {[CharsetDefinedAudit.DEFAULT_PASS]: devtoolsLog},
URL: {finalUrl},
MainDocumentContent: '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />',
};

const context = {computedCache: new Map()};
return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => {
assert.equal(auditResult.score, 1);
});
});

it('succeeds when the page has the charset defined in the content-type http header', () => {
const finalUrl = 'https://example.com/';
const mainResource = {
url: finalUrl,
responseHeaders: [
{name: 'content-type', value: 'text/html; charset=UTF-8'},
],
};
const devtoolsLog = networkRecordsToDevtoolsLog([mainResource]);
const artifacts = {
devtoolsLogs: {[CharsetDefinedAudit.DEFAULT_PASS]: devtoolsLog},
URL: {finalUrl},
MainDocumentContent: '<meta http-equiv="Content-type" content="text/html" />',
};

const context = {computedCache: new Map()};
return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => {
assert.equal(auditResult.score, 1);
});
});

it('succeeds when the page has the charset defined via BOM', () => {
const finalUrl = 'https://example.com/';
const mainResource = {
url: finalUrl,
responseHeaders: [
{name: 'content-type', value: 'text/html'},
],
};
const devtoolsLog = networkRecordsToDevtoolsLog([mainResource]);
const artifacts = {
devtoolsLogs: {[CharsetDefinedAudit.DEFAULT_PASS]: devtoolsLog},
URL: {finalUrl},
MainDocumentContent: '\ufeff<meta http-equiv="Content-type" content="text/html" />',
};

const context = {computedCache: new Map()};
return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => {
assert.equal(auditResult.score, 1);
});
});

it('fails when the page does not have charset defined', () => {
const finalUrl = 'https://example.com/';
const mainResource = {
url: finalUrl,
responseHeaders: [
{name: 'content-type', value: 'text/html'},
],
};
const devtoolsLog = networkRecordsToDevtoolsLog([mainResource]);
const artifacts = {
devtoolsLogs: {[CharsetDefinedAudit.DEFAULT_PASS]: devtoolsLog},
URL: {finalUrl},
MainDocumentContent: '<meta http-equiv="Content-type" content="text/html" />',
};

const context = {computedCache: new Map()};
return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => {
assert.equal(auditResult.score, 0);
});
});

it('fails when the page has charset defined too late in the page', () => {
const finalUrl = 'https://example.com/';
const mainResource = {
url: finalUrl,
responseHeaders: [
{name: 'content-type', value: 'text/html'},
],
};
const bigString = new Array(1024).fill(' ').join('');
const devtoolsLog = networkRecordsToDevtoolsLog([mainResource]);
const artifacts = {
devtoolsLogs: {[CharsetDefinedAudit.DEFAULT_PASS]: devtoolsLog},
URL: {finalUrl},
MainDocumentContent: '<html><head>' + bigString + '<meta charset="utf-8" />hello',
};

const context = {computedCache: new Map()};
return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => {
assert.equal(auditResult.score, 0);
});
});

it('passes when the page has charset defined almost too late in the page', () => {
const finalUrl = 'https://example.com/';
const mainResource = {
url: finalUrl,
responseHeaders: [
{name: 'content-type', value: 'text/html'},
],
};
const bigString = new Array(900).fill(' ').join('');
const devtoolsLog = networkRecordsToDevtoolsLog([mainResource]);
const artifacts = {
devtoolsLogs: {[CharsetDefinedAudit.DEFAULT_PASS]: devtoolsLog},
URL: {finalUrl},
MainDocumentContent: '<html><head>' + bigString + '<meta charset="utf-8" />hello',
};

const context = {computedCache: new Map()};
return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => {
assert.equal(auditResult.score, 1);
});
});

it('fails when charset only partially defined in the first 1024 bytes of the page', () => {
const finalUrl = 'https://example.com/';
const mainResource = {
url: finalUrl,
responseHeaders: [
{name: 'content-type', value: 'text/html'},
],
};
const prelude = '<html><head>';
const charsetHTML = '<meta charset="utf-8" />';
// 1024 bytes should be halfway through the meta tag
const bigString = new Array(1024 - prelude.length - charsetHTML.length / 2).fill(' ').join('');

const devtoolsLog = networkRecordsToDevtoolsLog([mainResource]);
const artifacts = {
devtoolsLogs: {[CharsetDefinedAudit.DEFAULT_PASS]: devtoolsLog},
URL: {finalUrl},
MainDocumentContent: prelude + bigString + charsetHTML + 'hello',
};

const context = {computedCache: new Map()};
return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => {
assert.equal(auditResult.score, 0);
});
});
});
25 changes: 24 additions & 1 deletion lighthouse-core/test/results/sample_v2.json
Original file line number Diff line number Diff line change
Expand Up @@ -2782,6 +2782,13 @@
"score": 1,
"scoreDisplayMode": "binary"
},
"charset": {
"id": "charset",
"title": "Properly defines charset",
"description": "A character encoding declaration is required whether it is done explicitly in the first 1024 bytes of the page source, through a Byte Order Mark (BOM), or in the content-type HTTP header. [Learn more](https://www.w3.org/International/questions/qa-html-encoding-declarations).",
"score": 1,
"scoreDisplayMode": "binary"
},
"dom-size": {
"id": "dom-size",
"title": "Avoids an excessive DOM size",
Expand Down Expand Up @@ -4042,6 +4049,10 @@
"id": "doctype",
"weight": 1
},
{
"id": "charset",
"weight": 1
},
{
"id": "no-vulnerable-libraries",
"weight": 1
Expand Down Expand Up @@ -4072,7 +4083,7 @@
}
],
"id": "best-practices",
"score": 0.07
"score": 0.13
},
"seo": {
"title": "SEO",
Expand Down Expand Up @@ -5223,6 +5234,12 @@
"duration": 100,
"entryType": "measure"
},
{
"startTime": 0,
"name": "lh:audit:charset",
"duration": 100,
"entryType": "measure"
},
{
"startTime": 0,
"name": "lh:audit:dom-size",
Expand Down Expand Up @@ -6349,6 +6366,12 @@
"lighthouse-core/audits/dobetterweb/doctype.js | description": [
"audits.doctype.description"
],
"lighthouse-core/audits/dobetterweb/charset.js | title": [
"audits.charset.title"
],
"lighthouse-core/audits/dobetterweb/charset.js | description": [
"audits.charset.description"
],
"lighthouse-core/audits/dobetterweb/dom-size.js | title": [
"audits[dom-size].title"
],
Expand Down
Loading