Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new_audit(hreflang): document has a valid hreflang code #3815

Merged
merged 7 commits into from
Nov 29, 2017

Conversation

kdzwinel
Copy link
Collaborator

Closes #3177

Failing audit:
screen shot 2017-11-13 at 00 01 49

Successful audit:
screen shot 2017-11-13 at 00 03 39

initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none',
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none',
initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'),
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is getting quite nasty. Maybe we should move headers to an external file (e.g. seo-failure-cases.html?headers=seo-failure-headers.json) or keep them here, but in a separate field (headers: ['Link: bla', 'x-robots-tag:none'])? In the second case we could pass them to the static-server e.g. in a header (which would be a very meta thing to do). @brendankenny WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url is compared to the LHR's url property (to make sure redirects or whatever were captured correctly), so if the headers will be passed in as a query param to static server they'll have to be in the url expectations string too.

@paulirish just brought up that sending the headers as request headers won't really work since smokehouse invokes lighthouse with just a URL...there's no way to add request headers (until/if #2746 :)

This is a js file, so there's a few ways we could make this better:

  • save parts of the URL to variables at the top of the file so they are just concatted here
  • create some kind of headersParam([]) function so that this could be url: 'http://localhost:10200/seo/seo-failure-cases.html?' + headersParam(['Link: bla', 'x-robots-tag:none']) or whatever
  • use the URL constructor to create the string and rely on implicit toString or save href to a variable up top and invoke down here
  • some combination of the above

'use strict';

const Audit = require('../audit');
const LinkHeader = require('http-link-header');
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new dependency - https://github.com/jhermsmeier/node-http-link-header. It's a simple (~300LOC), well tested, link header value parser with no dependencies.

.then(nodes => nodes &&
Promise.all(
nodes.map(node => Promise.all([node.getAttribute('href'), node.getAttribute('hreflang')]))
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driver.querySelector solution works well for simple getters but with driver.querySelectorAll and multiple getAttribute calls it's getting a bit less readable IMO. Maybe a simple injected script (driver.evaluateAsync) would be a better fit here? Something like:

const links = document.querySelectorAll('head link[rel="alternate" i][hreflang]');
return Array.from(links).map(({href, hreflang}) => ({href, hreflang}));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions here, so I'll defer to others

evaluate async seems fine to me, but this also doesn't really seem that bad either


const Audit = require('../audit');
const LinkHeader = require('http-link-header');
const axeCore = require('axe-core');
Copy link
Collaborator Author

@kdzwinel kdzwinel Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turns out I can't import 'axe-core' as it wasn't imported anywhere else before and since it's huge, it blows our bundle budget 🤔

Since I'm using only a tiny part of 'axe-core' it seems like tree shaking should help here, but unfortunately we are not using import/export syntax ATM(which is required for tree shaking to work).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance we can just import the languages file?

const validLangs = require('axe-core/lib/commons/utils/valid-langs.js')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta observation: there's only ~17k possible 3-character codes and ~8k of them are valid

how much value is the list really providing over a /^[a-z]{2,3}$/ regex? are mistakes likely to result in a different invalid 3-character code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't realize require can import single files that are not in the root folder of the package, thanks! I had to use a slight hack (global variable) to get things working though.

45% of 3-character codes and 25% of 2-character codes are valid - I agree that it's a lot, but IMO it still leaves a huge margin for people to make "catchable" mistakes. We may also extend this audit with region and script codes validation in the future. @rviscomi WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of regex pattern matching as a substitute for whitelist checking. Too many false positives.

Copy link
Member

@brendankenny brendankenny Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option is putting it in a different data structure when browserifying. A quick test of a pretty naive trie got it down to about 1/2 the size raw, 1/5 the size after gzip (19915 -> 3751 bytes). Being more clever (char codes, exclusions vs inclusions) can get the trie down to about 1/4 the size of the current file, but the gzip size doesn't really budge after that first drop.

Probably not worth pursuing at this point, but maybe worth leaving a note for when we go back looking for the low hanging fruit of shrinking bundle size.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! seems pretty straightforward


const Audit = require('../audit');
const LinkHeader = require('http-link-header');
const axeCore = require('axe-core');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance we can just import the languages file?

const validLangs = require('axe-core/lib/commons/utils/valid-langs.js')

function headerHasValidHreflangs(headerValue) {
const linkHeader = LinkHeader.parse(headerValue);

return linkHeader.get('rel', 'alternate')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always returns an array? strange api 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single Link header can have multiple rel=alternate links (e.g. <http://es.example.com/>; rel="alternate"; hreflang="es",<http://fr.example.com/>; rel="alternate"; Hreflang="fr-be"), so IMO it makes sense to always return an array.

static get meta() {
return {
name: 'hreflang',
description: 'Document has a valid `hreflang`',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be a property of an alternate link rather than the entire document, or am I misunderstanding?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! @rviscomi WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I wrote the text, but I think I was trying to strike a balance between the hreflang being used as both an HTTP header value and HTML attribute value. Other audits also use "document" as the owner of the thing being audited, for example:

  • Document has a <meta name="viewport"> tag with width or initial-scale.
  • Document uses legible font sizes.
  • Document has a <title> element.
  • Document has a meta description.
  • Document has a valid rel=canonical.
  • Document avoids plugins.

So I'm ok with the text as it is currently written but open to suggestions.

afterPass(options) {
const driver = options.driver;

return driver.querySelectorAll('head link[rel="alternate" i][hreflang]')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

querySelectorAll returns null instead of []!? we should fix that 😆

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, it always returns an array. I've updated the code (and jsdoc) accordingly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, no worries!

.then(nodes => nodes &&
Promise.all(
nodes.map(node => Promise.all([node.getAttribute('href'), node.getAttribute('hreflang')]))
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions here, so I'll defer to others

evaluate async seems fine to me, but this also doesn't really seem that bad either

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

* Import list of valid languages from axe core without including whole axe-core package
*/
function importValidLangs() {
const axeCache = global.axe;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice :)

'XX-be',
'XX-be-Hans',
'',
' es',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this invalid because of the whitespace?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, AFAIK it should be invalid in this case

@patrickhulce
Copy link
Collaborator

assigning to you @brendankenny since konrad's got some good smokehouse points for ya :)

@@ -12,6 +12,8 @@
<meta name="viewport" content="invalid-content=should_have_looked_it_up">
<!-- no <meta name="description" content=""> -->
<meta name="robots" content="nofollow, NOINDEX, all">
<link rel="alternate" hreflang="xx" href="https://xx.example.com" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mind starting to mark these with PASS(audit-name): details if possible and succinct (or FAIL) like e.g. in

<!-- FAIL(optimized): image is not JPEG optimized -->
<!-- FAIL(webp): image is not WebP optimized -->
<!-- PASS(responsive): image is used at full size -->
<!-- FAIL(offscreen): image is offscreen -->
<img style="position: absolute; top: -10000px;" src="lighthouse-unoptimized.jpg">
when possible?

helpful for going back later and knowing what touches what test/audit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Added two comments here and one in the seo-tester.html

initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none',
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none',
initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'),
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url is compared to the LHR's url property (to make sure redirects or whatever were captured correctly), so if the headers will be passed in as a query param to static server they'll have to be in the url expectations string too.

@paulirish just brought up that sending the headers as request headers won't really work since smokehouse invokes lighthouse with just a URL...there's no way to add request headers (until/if #2746 :)

This is a js file, so there's a few ways we could make this better:

  • save parts of the URL to variables at the top of the file so they are just concatted here
  • create some kind of headersParam([]) function so that this could be url: 'http://localhost:10200/seo/seo-failure-cases.html?' + headersParam(['Link: bla', 'x-robots-tag:none']) or whatever
  • use the URL constructor to create the string and rely on implicit toString or save href to a variable up top and invoke down here
  • some combination of the above

}

/**
* @param {String} hreflang
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowercase string/boolean for all of these (this will soon actually be checked! :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! Can't wait for your PR to land 👌


return artifacts.requestMainResource(devtoolsLogs)
.then(mainResource => {
/** @type {Array<{source: String|Object}>} */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array<{source: {type: string, snippet: string}}>?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, although it's actually Array<{source: string|{type: string, snippet: string}}> because array item is an object for failing nodes, and a string for failing headers.

Copy link
Collaborator Author

@kdzwinel kdzwinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brendankenny Thank you for a review! I addressed your comments. PTAL.

@@ -12,6 +12,8 @@
<meta name="viewport" content="invalid-content=should_have_looked_it_up">
<!-- no <meta name="description" content=""> -->
<meta name="robots" content="nofollow, NOINDEX, all">
<link rel="alternate" hreflang="xx" href="https://xx.example.com" />
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Added two comments here and one in the seo-tester.html

}

/**
* @param {String} hreflang
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! Can't wait for your PR to land 👌


return artifacts.requestMainResource(devtoolsLogs)
.then(mainResource => {
/** @type {Array<{source: String|Object}>} */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, although it's actually Array<{source: string|{type: string, snippet: string}}> because array item is an object for failing nodes, and a string for failing headers.

'XX-be',
'XX-be-Hans',
'',
' es',
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, AFAIK it should be invalid in this case

Copy link
Member

@paulirish paulirish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brendan's comments look addressed so i'm going to go ahead and merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants