-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core(preload): only allow same origin (domain + subdomains) #5065
Conversation
32a4b20
to
9d95eeb
Compare
sure thing no worries |
alright sorry for the conflicts @wardpeet, but this should be ready to rebase now! |
f9b3442
to
6c8c56d
Compare
* @param {string} urlA | ||
* @param {string} urlB | ||
*/ | ||
static rootDomainsMatch(urlA, urlB) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some tests in url-shim-test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah forgot about that one :)
lighthouse-core/lib/url-shim.js
Outdated
} | ||
|
||
const urlARootDomain = urlAInfo.hostname.split('.').slice(-2).join('.'); | ||
const urlBRootDomain = urlBInfo.hostname.split('.').slice(-2).join('.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.wikipedia.org/wiki/Second-level_domain is a good starting point
lighthouse-core/lib/url-shim.js
Outdated
return false; | ||
} | ||
|
||
const urlARootDomain = urlAInfo.hostname.split('.').slice(-2).join('.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is called "tld plus one". let's use that term for the method name and here.
let's add a few whitelisted exceptions like '.co.uk'. (country-code second level domains).
lighthouse-core/lib/url-shim.js
Outdated
@@ -17,6 +17,24 @@ const Util = require('../report/html/renderer/util.js'); | |||
const URL = /** @type {!Window["URL"]} */ (typeof self !== 'undefined' && self.URL) || | |||
require('url').URL; | |||
|
|||
const tldPlusOne = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's hardcode just the keys of this list (and add .co.uk)
add comment saying that https://publicsuffix.org/ is the real source of this data. and hypothetically we could pull in a dep like https://npms.io/search?q=public+suffix+list to do this forreal, but at significant byte cost.
and admit we're taking the lazy way out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shared a sheet with you on the most popular 2-level TLDs, but summary: if we did a regex with /.(co|com|org|edu|gov).TLD$/ that should cover most cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new list to the regex coming from http archive. Not sure how I should comment it in the code. Also i'm not matching all 1000 entries
a66ea70
to
a681ce7
Compare
lighthouse-core/lib/url-shim.js
Outdated
@@ -17,6 +17,13 @@ const Util = require('../report/html/renderer/util.js'); | |||
const URL = /** @type {!Window["URL"]} */ (typeof self !== 'undefined' && self.URL) || | |||
require('url').URL; | |||
|
|||
const tldPlusOneRegex = new RegExp(`\\.(com|co|gov|org|ac|or|edu|go|nic|in|gouv|gob|web|spb|net|` + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can trim down this list a bit based on the coverage percentages in that sheet, doing just the top 10 already gives us 94% coverage
com 2321232
co 1714813
gov 435538
edu 304490
ac 222746
org 202231
go 82772
gob 73721
or 72214
net 43587
----------------
sum 5473344
total 5841097
coverage ~94%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @wardpeet !
lighthouse-core/lib/url-shim.js
Outdated
static getTld(hostname) { | ||
const tld = hostname.split('.').slice(-1)[0]; | ||
|
||
const tldRegex = new RegExp(`${tldPlusOneRegex.source}\\.${tld}$`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's just do the .split
and a tlds.has
instead :)
Updated not sure this is what you had in mind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good to me! thanks @wardpeet!! 👏 👏
(got some lint things to fix it seems)
@@ -50,14 +50,18 @@ describe('Performance: uses-rel-preload audit', () => { | |||
|
|||
it('should suggest preload resource', () => { | |||
const rootNode = buildNode(1, 'http://example.com'); | |||
const mainDocumentNode = buildNode(2, 'http://www.example.com'); | |||
const mainDocumentNode = buildNode(2, 'http://www.example.com:3000'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're gonna need some graph creation helpers soon to make all this stuff :)
}, | ||
}; | ||
|
||
const mainResource = Object.assign({}, defaultMainResource, { | ||
url: 'http://www.example.com', | ||
url: 'http://www.example.com:3000', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just to exercise extra hostname checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it's to make sure that origins with a port number are handled correct by the rootDomainMatch
@@ -131,7 +165,7 @@ describe('Performance: uses-rel-preload audit', () => { | |||
return UsesRelPreload.audit(mockArtifacts(networkRecords, chains, mainResource), {}).then( | |||
output => { | |||
assert.equal(output.rawValue, 1250); | |||
assert.equal(output.details.items.length, 1); | |||
assert.equal(output.details.items.length, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe overkill but assert the URL matches too?
lighthouse-core/lib/url-shim.js
Outdated
@@ -17,6 +17,10 @@ const Util = require('../report/html/renderer/util.js'); | |||
const URL = /** @type {!Window["URL"]} */ (typeof self !== 'undefined' && self.URL) || | |||
require('url').URL; | |||
|
|||
const listOfTlds = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that somewhat documents what happened here re: coverage and whatnot? maybe just link to the best comment thread on this PR :)
so I just realized in #5471, we don't want to hide URLs based on if they were 3rd party, we want to hide them if their initiator was a 3rd party 🤔 nearly all of this should still apply, but I'll prob make an update soon :) |
Fixes nr 1 of #4425
Restrict results to same-origin (as a rough filter for predictable and unchanging URLs). Same origin as in subdomains and rootdomains.