Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(preload): only allow same origin (domain + subdomains) #5065

Merged
merged 7 commits into from
Jun 6, 2018

Conversation

wardpeet
Copy link
Collaborator

@wardpeet wardpeet commented Apr 29, 2018

Fixes nr 1 of #4425
Restrict results to same-origin (as a rough filter for predictable and unchanging URLs). Same origin as in subdomains and rootdomains.

@patrickhulce
Copy link
Collaborator

patrickhulce commented Apr 30, 2018

@wardpeet FYI I had to rework considerable portions of this for lantern (#5062, would love your review!), wdyt about holdin' tight for a cycle to get that in first? :)

@wardpeet
Copy link
Collaborator Author

wardpeet commented May 2, 2018

sure thing no worries

@patrickhulce
Copy link
Collaborator

alright sorry for the conflicts @wardpeet, but this should be ready to rebase now!

* @param {string} urlA
* @param {string} urlB
*/
static rootDomainsMatch(urlA, urlB) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests in url-shim-test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah forgot about that one :)

}

const urlARootDomain = urlAInfo.hostname.split('.').slice(-2).join('.');
const urlBRootDomain = urlBInfo.hostname.split('.').slice(-2).join('.');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return false;
}

const urlARootDomain = urlAInfo.hostname.split('.').slice(-2).join('.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is called "tld plus one". let's use that term for the method name and here.

let's add a few whitelisted exceptions like '.co.uk'. (country-code second level domains).

@paulirish paulirish changed the title Only allow same origin (domain + subdomains) for preload core(preload): only allow same origin (domain + subdomains) May 18, 2018
@paulirish paulirish changed the title core(preload): only allow same origin (domain + subdomains) only allow same origin (domain + subdomains) May 18, 2018
@paulirish paulirish changed the title only allow same origin (domain + subdomains) core(preload): only allow same origin (domain + subdomains) May 18, 2018
@paulirish paulirish changed the title core(preload): only allow same origin (domain + subdomains) core(preload): only allow same origin (domain + subdomains). May 18, 2018
@paulirish paulirish changed the title core(preload): only allow same origin (domain + subdomains). core(preload): only allow same origin (domain + subdomains).. May 18, 2018
@paulirish paulirish changed the title core(preload): only allow same origin (domain + subdomains).. core(preload): only allow same origin (domain + subdomains) May 18, 2018
@@ -17,6 +17,24 @@ const Util = require('../report/html/renderer/util.js');
const URL = /** @type {!Window["URL"]} */ (typeof self !== 'undefined' && self.URL) ||
require('url').URL;

const tldPlusOne = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's hardcode just the keys of this list (and add .co.uk)

add comment saying that https://publicsuffix.org/ is the real source of this data. and hypothetically we could pull in a dep like https://npms.io/search?q=public+suffix+list to do this forreal, but at significant byte cost.

and admit we're taking the lazy way out.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared a sheet with you on the most popular 2-level TLDs, but summary: if we did a regex with /.(co|com|org|edu|gov).TLD$/ that should cover most cases

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new list to the regex coming from http archive. Not sure how I should comment it in the code. Also i'm not matching all 1000 entries

@wardpeet wardpeet force-pushed the feature/preload-subdomains branch from a66ea70 to a681ce7 Compare May 30, 2018 20:25
@@ -17,6 +17,13 @@ const Util = require('../report/html/renderer/util.js');
const URL = /** @type {!Window["URL"]} */ (typeof self !== 'undefined' && self.URL) ||
require('url').URL;

const tldPlusOneRegex = new RegExp(`\\.(com|co|gov|org|ac|or|edu|go|nic|in|gouv|gob|web|spb|net|` +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can trim down this list a bit based on the coverage percentages in that sheet, doing just the top 10 already gives us 94% coverage

com       2321232
co        1714813
gov       435538
edu       304490
ac        222746
org       202231
go        82772
gob       73721
or        72214
net       43587
----------------
sum       5473344
total     5841097
coverage  ~94%

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also just go to top 15, everything beyond that is <10000 occurrences
image

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @wardpeet !

static getTld(hostname) {
const tld = hostname.split('.').slice(-1)[0];

const tldRegex = new RegExp(`${tldPlusOneRegex.source}\\.${tld}$`);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just do the .split and a tlds.has instead :)

@wardpeet
Copy link
Collaborator Author

wardpeet commented Jun 1, 2018

Updated not sure this is what you had in mind

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me! thanks @wardpeet!! 👏 👏

(got some lint things to fix it seems)

@@ -50,14 +50,18 @@ describe('Performance: uses-rel-preload audit', () => {

it('should suggest preload resource', () => {
const rootNode = buildNode(1, 'http://example.com');
const mainDocumentNode = buildNode(2, 'http://www.example.com');
const mainDocumentNode = buildNode(2, 'http://www.example.com:3000');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're gonna need some graph creation helpers soon to make all this stuff :)

},
};

const mainResource = Object.assign({}, defaultMainResource, {
url: 'http://www.example.com',
url: 'http://www.example.com:3000',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just to exercise extra hostname checks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it's to make sure that origins with a port number are handled correct by the rootDomainMatch

@@ -131,7 +165,7 @@ describe('Performance: uses-rel-preload audit', () => {
return UsesRelPreload.audit(mockArtifacts(networkRecords, chains, mainResource), {}).then(
output => {
assert.equal(output.rawValue, 1250);
assert.equal(output.details.items.length, 1);
assert.equal(output.details.items.length, 2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe overkill but assert the URL matches too?

@@ -17,6 +17,10 @@ const Util = require('../report/html/renderer/util.js');
const URL = /** @type {!Window["URL"]} */ (typeof self !== 'undefined' && self.URL) ||
require('url').URL;

const listOfTlds = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment that somewhat documents what happened here re: coverage and whatnot? maybe just link to the best comment thread on this PR :)

@patrickhulce patrickhulce merged commit d385645 into master Jun 6, 2018
@patrickhulce patrickhulce deleted the feature/preload-subdomains branch June 6, 2018 16:39
@patrickhulce
Copy link
Collaborator

so I just realized in #5471, we don't want to hide URLs based on if they were 3rd party, we want to hide them if their initiator was a 3rd party 🤔 nearly all of this should still apply, but I'll prob make an update soon :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants