Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: bail if encounter insecure ssl cert, to avoid hanging forever #6300

Merged
merged 13 commits into from
Oct 19, 2018

Conversation

connorjclark
Copy link
Collaborator

After encountering an SSL error, some commands to the Network/Page domains either took awhile to timeout, or hung forever (only seemed to happen for ERR_CERT_SYMANTEC_LEGACY). You can see this at https://ilkayuyarkaba.av.tr/tag/istifaya-zorlanan-iscinin-kidem-tazminati (use canary).

If one of the passes generates an SSL error, just bail. A report is generated with a useful error message at the top.

Fixes #6287.

getSecurityState(timeout = 1000) {
return new Promise((resolve, reject) => {
const err = new LHError(LHError.errors.SECURITY_STATE_TIMEOUT);
const asyncTimeout = setTimeout((_ => reject(err)), timeout);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will eventually be replaced by #6296

@@ -165,6 +166,20 @@ class GatherRunner {
} else if (mainRecord.hasErrorStatusCode()) {
errorDef = {...LHError.errors.ERRORED_DOCUMENT_REQUEST};
errorDef.message += ` Status code: ${mainRecord.statusCode}.`;
} else if (!mainRecord.finished) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, an insecure security state will always have .finished set to false on the network records. I wanted to avoid calling these Security commands if not needed.

@@ -313,7 +333,7 @@ class GatherRunner {
}

// Resolve on tracing data using passName from config.
return passData;
return [passData, pageLoadError];
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about attaching it to passData instead? It's a relatively grab-bag object of what a happened during a pass, so it makes some sense :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll add it as an optional property. I avoided it in the first place b/c I didn't want to modify all usages of this type, but I completely forgot about optional properties :P

this.sendCommand('Security.enable');
this.once('Security.securityStateChanged', (e) => {
clearTimeout(asyncTimeout);
resolve(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e => state ?

this.once('Security.securityStateChanged', (e) => {
clearTimeout(asyncTimeout);
resolve(e);
this.sendCommand('Security.disable').catch(reject);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this catch(). The timeout one is fine.

@@ -165,6 +166,20 @@ class GatherRunner {
} else if (mainRecord.hasErrorStatusCode()) {
errorDef = {...LHError.errors.ERRORED_DOCUMENT_REQUEST};
errorDef.message += ` Status code: ${mainRecord.statusCode}.`;
} else if (!mainRecord.finished) {
// could be security error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be just certificate errors. Seems like most of the connection errors are mainRecord.failed === true.

@@ -165,6 +166,20 @@ class GatherRunner {
} else if (mainRecord.hasErrorStatusCode()) {
errorDef = {...LHError.errors.ERRORED_DOCUMENT_REQUEST};
errorDef.message += ` Status code: ${mainRecord.statusCode}.`;
} else if (!mainRecord.finished) {
// could be security error
const securityState = await driver.getSecurityState();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit but I think it'd be slightly better to collect this on L288 and pass that into getPageLoadError() rather than all of driver.

.filter(exp => exp.securityState === 'insecure')
.map(exp => exp.description)
.join(' ');
errorDef.message += ` Insecure: ${insecureDescriptions}`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree the explanation gathering is the best we can do.
For this line i think it could just be

Suggested change
errorDef.message += ` Insecure: ${insecureDescriptions}`;
errorDef.message += ` ${insecureDescriptions.join(' ')}`;

@@ -410,7 +430,8 @@ class GatherRunner {
await driver.setThrottling(options.settings, passConfig);
await GatherRunner.beforePass(passContext, gathererResults);
await GatherRunner.pass(passContext, gathererResults);
const passData = await GatherRunner.afterPass(passContext, gathererResults);
const [passData, pageLoadError] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brendankenny @patrickhulce either of you have an idea on how else to deliver this error?

see also the L462 here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pageLoadError.fatal = true; throw pageLoadError; :P

@@ -11,7 +11,9 @@ module.exports = {
badTraceRecording: `Something went wrong with recording the trace over your page load. Please run Lighthouse again.`,
pageLoadTookTooLong: `Your page took too long to load. Please follow the opportunities in the report to reduce your page load time, and then try re-running Lighthouse.`,
pageLoadFailed: `Lighthouse was unable to reliably load the page you requested. Make sure you are testing the correct URL and that the server is properly responding to all requests.`,
pageLoadFailedInsecure: `The URL you have provided does not have a valid security certificate.`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm technically not just certificate, but admittedly most other security errors fall through the ERRORED_DOC_REQ path right now. how about

... does not have valid security credentials.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice meaty change, awesome!

const asyncTimeout = setTimeout((_ => reject(err)), timeout);

this.sendCommand('Security.enable');
this.once('Security.securityStateChanged', (e) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somewhat tangential...

@paulirish are we just paranoid for doing all our listeners before enabling or do we actually need to? I seem to remember dgozman scolding me for not trusting in JS microtasks when we had listeners before .enable :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm good call.

This code here would be a problem if there was an await on the .enable call above. which seems fairly reasonable.

So yeah +1 to defining the once handler before we flip enable() on

Copy link
Collaborator Author

@connorjclark connorjclark Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, fixed.

I don't quite understand the execution model here. It's clear to me why awaiting the enable would schedule registering the listener much too late, but I don't understand how this original code worked (enable without awaiting, then register a listener via .once). Why does enabling the Security domain still occur after the listener is registered?

(note, this is literally all I know about micro/macrotasks: https://stackoverflow.com/a/25933985 )

const err = new LHError(LHError.errors.SECURITY_STATE_TIMEOUT);
const asyncTimeout = setTimeout((_ => reject(err)), timeout);

this.sendCommand('Security.enable');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, @paulirish is Security domain still broken on Android, is this going to break us on real devices again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they fixed android like a year ago! yay

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well someone's behind the times 😆

@@ -313,7 +333,7 @@ class GatherRunner {
}

// Resolve on tracing data using passName from config.
return passData;
return [passData, pageLoadError];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about attaching it to passData instead? It's a relatively grab-bag object of what a happened during a pass, so it makes some sense :)

@@ -165,6 +166,20 @@ class GatherRunner {
} else if (mainRecord.hasErrorStatusCode()) {
errorDef = {...LHError.errors.ERRORED_DOCUMENT_REQUEST};
errorDef.message += ` Status code: ${mainRecord.statusCode}.`;
} else if (!mainRecord.finished) {
// could be security error
const securityState = await driver.getSecurityState();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: WDYT about sprinkling some nice destructuring here, const {securityState, explanations} =? somethin' about securityState.securityState just spoke to me :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't even notice that :O

const err = new LHError(LHError.errors.SECURITY_STATE_TIMEOUT);
const asyncTimeout = setTimeout((_ => reject(err)), timeout);

this.once('Security.securityStateChanged', state => {
Copy link
Member

@brendankenny brendankenny Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: does this check break if someone else has been listening to the Security domain? It would be nice to get a non-event based version of this in the protocol (AFAIK not much we can do to guard against that case for now).

Copy link
Collaborator Author

@connorjclark connorjclark Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I enabled just before calling this function, and yes, it does break. It times out + rejects with SECURITY_STATE_TIMEOUT (which actually kills Lighthouse. oops)

Besides other users enabling the Security domain via code, could this fail if a developer opens the Security tab (this enables the domain) before running an audit?

resolve(state);
this.sendCommand('Security.disable');
});
this.sendCommand('Security.enable');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want this to be a general utility function, it will need to disable the domain and remove the event listener in the rejection case. Otherwise we'll need to think how to make rejections always lead to a program exit, as the state of things might get weird (we'll have to deal with the same thing in #6296)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want this to be a general utility function, it will need to disable the domain and remove the event listener in the rejection case.

Maybe this is overkill. If it's timing out (unexpectedly), something is pretty wrong and LH probably won't recover. Disabling the domain also might not even work. OTOH, not cleaning up feels wrong :)

@@ -271,7 +278,9 @@ class GatherRunner {
const networkRecords = NetworkRecorder.recordsFromLogs(devtoolsLog);
log.verbose('statusEnd', status);

let pageLoadError = GatherRunner.getPageLoadError(passContext.url, networkRecords);
const securityState = await driver.getSecurityState();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we're making this harder than it needs to be by trying to unify with with the other page load errors.

getPageLoadError is trying really hard to not interrupt normal control flow, inserting the promise rejection into the gather results instead of into the executing chain. Meanwhile, this security check is trying to abandon the gathering altogether.

Maybe this should just be a separate function, checkPageSecurityState() or whatever, called here-ish. If the page is insecure, throw in there. It'll bail back to here, which will bail up to the catch in run(). If we really want to still return something in that case, the check against LHError.errors.INSECURE_DOCUMENT_REQUEST.code can happen there (or rethrow if it's something different).

This also gets around extending passData, which makes sense, because really there isn't any other useful pass data when there's this kind of error.

WDYT?

Copy link
Collaborator Author

@connorjclark connorjclark Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to utilize error throwing to simplify this. But, I can't quite figure out how to still display a LHR in run's catch. I marked the security LHError fatal (so it would throw), moved baseArtifacts up a bit (so it can be accessed in catch), but get this silly error:

Error: CSSUsage failed to provide an artifact.
    at Function.collectArtifacts (/Users/cjamcl/src/lighthouse/lighthouse-core/gather/gather-runner.js:366:15)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:189:7)

@connorjclark
Copy link
Collaborator Author

connorjclark commented Oct 17, 2018

OK, I believe I resolved all comments. Opened #6330 to discuss how to handle errors being thrown during gather-runner + move this issue forward.

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I like this approach

*/
getSecurityState() {
// @ts-ignore
return this.lastSecurityState;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should set this in the Driver constructor (I guess to null since SecurityStateChangedEvent is a relatively large object so it's probably not worth constructing an unknown starting value). Won't need to ts-ignore then

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ts-ignored to coerce a non-nullable value, since I believe this should always be set. I could remove it and allow this getter to be nullable, but would also need to handle that in the calling code (a case that I think will never happen). wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we believe it's always set but don't know for sure, WDYT about throwing an explicit error if it's not?

This way it'll be clear to us when something unexpected happens and ts is happy by default :)

@@ -112,6 +112,7 @@ class GatherRunner {
await driver.cacheNatives();
await driver.registerPerformanceObserver();
await driver.dismissJavaScriptDialogs();
await driver.listenForSecurityStateChanges();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing it this way, I guess we'll find out if there are issues with listening through the whole page load :)

But it does nicely pave the way for bailing earlier if we want to do that.

.filter(exp => exp.securityState === 'insecure')
.map(exp => exp.description);
errorDef.message += ` ${insecureDescriptions.join(' ')}`;
return new LHError(errorDef, {fatal: true});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than returning the error and throwing on the return value, I'd say just throw in here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{fatal: true} shouldn't be necessary anymore

Copy link
Collaborator Author

@connorjclark connorjclark Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can do that. I was thinking it's nicer to make tests that check return values rather than throwing conditions, but just my preference, I could have it either way.

I'll remove fatal. yay, that means the resolve or throw thing really can still be removed. It wasn't being used any way, doh.

@@ -172,6 +173,22 @@ class GatherRunner {
}
}

/**
* Returns an error if the original network request failed or wasn't found.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update string

* @param {LH.Crdp.Security.SecurityStateChangedEvent} securityState
* @return {LHError|undefined}
*/
static checkForSecurityIssue({securityState, explanations}) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshed: if it does throw in here, maybe rename assertCertificateError or something? Or does this catch a larger class of issues

Copy link
Collaborator Author

@connorjclark connorjclark Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It catches anytime the latest security state is marked insecure. a certificate error is one example. I don't know if it's just one of, most of them, or all possible scenarios.

Copy link
Collaborator

@patrickhulce patrickhulce Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just triple checking because the protocol name is getting me concerned a bit, but securityState === 'insecure' only when the page is HTTPS but actually isn't secure, right? Not just every time the page is served over HTTP without encryption? :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the idea of throwing in here with a assert* rename too, btw. Seems much easier and removes the cognitive load from the caller

Copy link
Collaborator Author

@connorjclark connorjclark Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to check HTTP. Tested with node lighthouse-cli http://www.httpvshttps.com/ --verbose --view - it still generated a full LHR. I presume "insecure" is only used for HTTPS connections that don't comply with security standards (and I guess they always go hand-in-hand with interstitial security warnings).

@@ -288,7 +310,6 @@ class GatherRunner {
trace,
};

// Disable throttling so the afterPass analysis isn't throttled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore?

@@ -169,6 +175,12 @@ const ERRORS = {
message: strings.requestContentTimeout,
},

// Protocol timeout failures
SECURITY_STATE_TIMEOUT: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the usual case for the security state check, it should just reject on an insecure security state, right? If so, we probably want to make this a more general protocol communication timeout error (anticipating #6296)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan was to modify that bit in the referenced issue. but now that we have this concrete proto definition it makes sense to make it good sooner rather than later

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I actually removed the timeout stuff for security checking. I'll remove this too.

@@ -543,9 +545,9 @@ describe('GatherRunner', function() {
],
};

return GatherRunner.afterPass({url, driver, passConfig}, {TestGatherer: []}).then(vals => {
return GatherRunner.afterPass({url, driver, passConfig}, {TestGatherer: []}).then(passData => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok to revert these lines now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think passData is more clear than vals.

@brendankenny
Copy link
Member

Just triple checking because the protocol name is getting me concerned a bit, but securityState === 'insecure' only when the page is HTTPS but actually isn't secure, right?

Looking through the chromium source, the values to search for are blink::kWebSecurityStyleInsecure and Security::SecurityStateEnum::Insecure (that's what ends up setting the field in the event payload in security_handler.cc).

There are two ways it's set. If someone wants to track down which one is actually in our execution path, feel free :)

if (net::IsCertStatusError(cert_status) &&
    !net::IsCertStatusMinorError(cert_status)) {
  return blink::kWebSecurityStyleInsecure;
}

This is per-resource, not per-page, so probably not it.

switch (security_level) {
  // ...
  case security_state::DANGEROUS:
    return blink::kWebSecurityStyleInsecure;
  // ..
}

where DANGEROUS is

// Attempted HTTPS and failed, page not authenticated, HTTPS with
// insecure active content on the page, malware, phishing, or any other
// serious security issue that could be dangerous.

Either way, this assumption seems to be ok. The main extra thing seems to be malware/phishing. Presumably that information will be in the explanation put in the friendly messsage.

@brendankenny
Copy link
Member

HTTPS with insecure active content on the page

oh, mixed content might be an issue as well? We should check. I'm not sure we want to throw in that case

@connorjclark
Copy link
Collaborator Author

HTTPS with insecure active content on the page

oh, mixed content might be an issue as well? We should check. I'm not sure we want to throw in that case

node lighthouse-cli https://www.bennish.net/mixed-content.html --verbose --view works OK, so mixed content should be fine.

@brendankenny
Copy link
Member

yeah, it's flagged as "neutral". Mixed content pages with passwords might trigger this (cf. https://crbug.com/647754), but it seems it will only require a error message tweak in the future if folks run into it

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is looking good. Just a few last things

@@ -679,6 +681,43 @@ describe('GatherRunner', function() {
});
});

describe('#checkForSecurityIssue', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#assertNoSecurityIssues

@@ -849,6 +849,26 @@ class Driver {
});
}

async listenForSecurityStateChanges() {
this.on('Security.securityStateChanged', state => {
this.lastSecurityState = state;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the type checker now allows this (in JS files as of 3.0 or 3.1), but should still declare it in the constructor so there's a clear place to check for what's on Driver, we don't dynamically mutate the object's shape, etc

Copy link
Collaborator Author

@connorjclark connorjclark Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this really surprised me, how it was able to infer the type. it was also able to infer it could be undefined, since the ctor didn't set it.

Makes sense. This is not good:
image

I guess TSC just infers a permissive union type for this.property, based on all setters? What new feature of tsc 3.0/3.1 is this?

@@ -172,6 +173,22 @@ class GatherRunner {
}
}

/**
* Returns an error if the security state is insecure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throws

GatherRunner.assertNoSecurityIssues(insecureSecurityState);
assert.fail('expected INSECURE_DOCUMENT_REQUEST LHError');
} catch (err) {
assert.equal(err.message, 'INSECURE_DOCUMENT_REQUEST');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe check err.code as well/instead, since that's what we use for the top level runtimeError

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Some other tests were just testing .message, so I added checks there too. lmk if I should just remove the .message assertions.

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@brendankenny brendankenny merged commit 42091b2 into master Oct 19, 2018
@brendankenny brendankenny deleted the issue-6287-handle-hanging-cert branch October 19, 2018 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants