'Valid HTTPS' key-value inconsistent across platforms #149

Open
refayathaque opened this issue Jan 12, 2018 · 17 comments
@refayathaque

We are using the pshtt module to determine M-15-13 compliance for certain websites. We run pshtt from a Python script that invokes the 'inspect_domains' method to get all relevant results. As part of our testing we have been running the same method in multiple places, namely our local machine and our cloud instances (the pshtt versions are the same on both). We are also running tests by calling 'pshtt' directly from bash. In all three cases, we are seeing different results for a couple of specific key-value pairs. Below is one example of the issues we are facing.

www.worklife4you.com - for this domain we are seeing three different values for 'Valid HTTPS'.

  • The 'pshtt.inspect_domains' method in a Python script running locally returns 'None' for 'Valid HTTPS'.
  • Running pshtt directly from the bash CLI returns 'False' for 'Valid HTTPS'.
  • Running the scan from our cloud instance returns 'True' for 'Valid HTTPS'.
    • What's really strange is that this is the same 'pshtt.inspect_domains' method we are running locally; in this application, it's just wrapped in an EC2 instance. The pshtt version in the cloud (v.0.3.0) is also up to date and matches the version on our local machine (v.0.3.0).

Thank you so much for helping us out with this.

@jsf9k jsf9k self-assigned this Jan 12, 2018
@jsf9k jsf9k added the bug This issue or pull request addresses broken functionality label Jan 12, 2018
@konklone
Collaborator

Running from the CLI (pshtt --version says 0.3.0) with either worklife4you.com or www.worklife4you.com gives me a null value in the resulting JSON:

pshtt worklife4you.com -d -j
[
  {
    "Base Domain": "worklife4you.com",
    "Base Domain HSTS Preloaded": false,
    ...
    "Valid HTTPS": null,

Though when I use the CLI and have it output in CSV mode, I get False for the Valid HTTPS column:

pshtt worklife4you.com -d
# ...
cat results.csv
Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error
worklife4you.com,worklife4you.com,https://worklife4you.com,True,False,,False,True,False,True,False,True,False,False,False,,,False,False,False,False,False,False,False,False,False

When running this from the Python API in ipython (where pshtt.__version__ says 0.3.0), I get a value of None in the resulting dict:

In [13]: pshtt.inspect_domains(["worklife4you.com"], {})

Out[13]: 
[{'Base Domain': 'worklife4you.com',
  'Base Domain HSTS Preloaded': False,
  'Canonical URL': 'https://worklife4you.com',
   ...
  'Valid HTTPS': None,

In the latest git-versioned pshtt, None values are supposed to get converted to False for all but a few non-boolean fields:

https://github.com/dhs-ncats/pshtt/blob/develop/pshtt/pshtt.py#L139-L148

    for header in HEADERS:
        if header in ("HSTS Header", "HSTS Max Age", "Redirect To"):
            continue

        if result[header] is None:
            result[header] = False

But in 0.3.0, this conversion was applied only to CSV output. The commit that changed this, a44ab68, was made on October 21, 2017, but it wasn't merged (in #125) until October 24th, the day after 0.3.0 was published.
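
For anyone still on 0.3.0, the same conversion could be applied to the dicts returned by the Python API. This is only a minimal sketch and not pshtt's own code: `normalize_result` and `NON_BOOLEAN_FIELDS` are hypothetical names, and the field list simply mirrors the three headers skipped in the snippet above.

```python
# Hypothetical helper, not part of pshtt: mirrors the None -> False
# conversion quoted above for result dicts produced by pshtt 0.3.0.
NON_BOOLEAN_FIELDS = ("HSTS Header", "HSTS Max Age", "Redirect To")


def normalize_result(result):
    """Return a copy of a result dict with None booleans set to False."""
    normalized = dict(result)
    for key, value in normalized.items():
        if key in NON_BOOLEAN_FIELDS:
            # These fields hold strings or numbers, so None stays None.
            continue
        if value is None:
            normalized[key] = False
    return normalized


# Illustrative input, trimmed to three fields.
sample = {"Valid HTTPS": None, "HSTS Header": None, "Live": True}
print(normalize_result(sample))
```

With this in place, the Python API and the CSV output should agree on False wherever 0.3.0 reports None for a boolean field.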

@konklone
Collaborator

konklone commented Jan 14, 2018

@refayathaque Given this, I think you're seeing two issues:

  • The None/False distinction is because in 0.3.0, None only gets turned into False right before CSV serialization. This is fixed in the repository version. It's likely a good time for @h-m-f-t to publish an update to PyPI, but you can also fix this locally by pulling from the git repository (which I do).

  • Valid HTTPS is false because, in your local (and my local) environment, the canonical URL is being detected as https://worklife4you.com, in part because http://worklife4you.com redirects there. And https://worklife4you.com doesn't have a valid cert (it's only valid for the www subdomain, not the root hostname). I suspect that your cloud vantage point (which you say shows you Valid HTTPS as True) is actually seeing different server behavior for some reason, potentially in the redirects you're being served, possibly based on IP/firewall rules affecting the server or the cloud provider you're scanning from.

If you can share a full JSON output of the scan results (pshtt worklife4you.com -d -j) from the cloud provider with a result of Valid HTTPS as true, and one from your local environment running the same command and showing different output as Valid HTTPS being null or false, we can take a look at what might be different between the two to show that result. There should be some difference in one of the fields shown in the JSON output, since they contain all of the data points used to calculate the eventual answers.
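
One way to do that comparison is to diff the two result objects field by field. The sketch below is an assumption about how one might do it, not something pshtt provides: `diff_results` is a hypothetical helper, and the two JSON strings are illustrative samples rather than real scan output. Note that `pshtt ... -j` emits a list, so in practice you would compare one element from each list.

```python
import json


def diff_results(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


# Illustrative sample data, not real scan output.
cloud_json = '{"Canonical URL": "https://www.example.com", "Valid HTTPS": true, "Live": true}'
local_json = '{"Canonical URL": "https://example.com", "Valid HTTPS": null, "Live": true}'

differences = diff_results(json.loads(cloud_json), json.loads(local_json))
for field, (cloud_value, local_value) in differences.items():
    print(f"{field}: cloud={cloud_value!r} local={local_value!r}")
```

Fields that match in both environments are left out, so whatever is printed is the short list worth investigating.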

@jsf9k
Member

jsf9k commented Jan 15, 2018

@refayathaque You are probably already aware of this, but you can install from the GitHub repo via pip like this:

pip install git+https://github.com/dhs-ncats/pshtt.git@develop

Thanks to @konklone for investigating this issue!

@jsf9k
Member

jsf9k commented Jan 26, 2018

@refayathaque, are you still seeing this issue with the latest code from develop?

@refayathaque
Author

Hi @jsf9k apologies but I wasn't notified when you and @konklone began to respond to my inquiry. I was only made aware of this over the weekend by a colleague. Thank you so much for your help, let me run the tests you two have recommended, and then I'll get back to you. @jsf9k I actually wasn't aware that you can do pip installs directly off of github, that's quite neat, I'll definitely need to try that out as well. However, in the past, we have encountered innumerable difficulties running the pshtt module in AWS Lambda. AWS Lambda, being essentially run in an Amazon Linux AMI, requires these very specific .so files for the pshtt, and all its supporting modules, to run. Getting these .so files is a nightmare and requires us to 'build from source', something my junior developer repertoire lacks.

@jsf9k
Member

jsf9k commented Feb 12, 2018

@refayathaque, no worries.

Regarding running in AWS Lambda, if you want to run pshtt via 18F/domain-scan then you can leverage the Lambda work that @konklone has already done. You may also find dhs-ncats/lambda_functions useful if you need to build fresher Lambda zip files than what is committed to 18F/domain-scan.

@refayathaque
Author

refayathaque commented Feb 22, 2018

@konklone getting back to you with the JSON objects you asked for.

The first is from our Lambda function running the pshtt scan (FYI we are NOT running pshtt www.worklife4you.com -d -j but we are running pshtt_results = pshtt.inspect_domains([url], {})[0] where url would be www.worklife4you.com)

"Pshtt": {
  "Base Domain": "worklife4you.com",
  "Base Domain HSTS Preloaded": "False",
  "Canonical URL": "https://www.worklife4you.com",
  "Defaults to HTTPS": "True",
  "Domain": "www.worklife4you.com",
  "Domain Enforces HTTPS": "False",
  "Domain Supports HTTPS": "False",
  "Domain Uses Strong HSTS": "True",
  "Downgrades HTTPS": "True",
  "HSTS": "True",
  "HSTS Entire Domain": "True",
  "HSTS Header": "max-age=31536000; includeSubDomains",
  "HSTS Max Age": "31536000",
  "HSTS Preload Pending": "False",
  "HSTS Preload Ready": "None",
  "HSTS Preloaded": "False",
  "HTTPS Bad Chain": "None",
  "HTTPS Bad Hostname": "None",
  "HTTPS Expired Cert": "None",
  "HTTPS Self Signed Cert": "None",
  "Live": "True",
  "Redirect": "False",
  "Redirect To": "None",
  "Strictly Forces HTTPS": "True",
  "Unknown Error": "False",
  "Valid HTTPS": "True"
}

And here is what is being returned in my terminal after running pshtt www.worklife4you.com -d -j

{
  "Base Domain": "worklife4you.com",
  "Base Domain HSTS Preloaded": false,
  "Canonical URL": "https://worklife4you.com",
  "Defaults to HTTPS": true,
  "Domain": "worklife4you.com",
  "Domain Enforces HTTPS": false,
  "Domain Supports HTTPS": false,
  "Domain Uses Strong HSTS": null,
  "Downgrades HTTPS": false,
  "HSTS": false,
  "HSTS Entire Domain": null,
  "HSTS Header": null,
  "HSTS Max Age": null,
  "HSTS Preload Pending": false,
  "HSTS Preload Ready": false,
  "HSTS Preloaded": false,
  "HTTPS Bad Chain": false,
  "HTTPS Bad Hostname": true,
  "HTTPS Expired Cert": false,
  "HTTPS Self Signed Cert": false,
  "Live": true,
  "Redirect": false,
  "Redirect To": null,
  "Strictly Forces HTTPS": true,
  "Unknown Error": false,
  "Valid HTTPS": null
}

You're absolutely correct about the CSV serialization. So if I run just pshtt www.worklife4you.com and check out the results.csv, I see that Valid HTTPS is False.

@refayathaque
Author

@konklone I also just ran worklife4you.com (without the www.) in our Lambda function and the result is that Valid HTTPS is None 😕

@jsf9k
Member

jsf9k commented Feb 22, 2018

@refayathaque, are you using the lambda zip in the domain-scan repo? I don't think that zip has been updated in a while. You can use dhs-ncats/lambda_functions to build a new zip for pshtt.

When I run in lambda using a zip I recently built, I get these (admittedly difficult to read - apologies for that) results:

$ ./scan --scan=pshtt --lambda worklife4you.com
[pshtt] Downloading third party data...
[worklife4you.com][pshtt] Running scan...
        Executing Lambda scan...
Results written to CSV.
$ less results/pshtt.csv 
Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error
worklife4you.com,worklife4you.com,https://worklife4you.com,True,False,,False,True,False,True,False,True,False,False,False,,,False,False,False,False,False,False,False,False,False

Note that Valid HTTPS is False, not None.

@jsf9k
Member

jsf9k commented Feb 22, 2018

@refayathaque ah, nevermind, it looks like you built your own zip. I should read more carefully. :)

@refayathaque
Author

refayathaque commented Feb 22, 2018

@jsf9k thanks for getting back! Yes, we built our own zip file and pushed the deployment package up to Lambda. I am now experimenting with the latest code from the pshtt repo (did pip install git+https://github.com/dhs-ncats/pshtt.git@develop), and I created a local package (which I hope to push up to Lambda and test later), but our pshtt.inspect_domains([url], {})[0] invocation from before isn't working. We get the error TypeError: 'generator' object has no attribute '__getitem__'. Not sure what could be happening here. Do you think they changed the method for invoking pshtt scans from within a .py file?

pshtt.inspect_domains([url], {})[0] - Has this changed?

@jsf9k
Member

jsf9k commented Feb 26, 2018

@refayathaque you need to add a line like this to trigger the work. This changed about four months ago, and pshtt.inspect_domains([url], {}) is now a generator.
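
As a rough illustration of the change, here is a self-contained sketch that uses a stand-in generator in place of the real scan. `inspect_domains_stub` is hypothetical and does no network work; it only assumes, as stated above, that `pshtt.inspect_domains` now yields results lazily.

```python
def inspect_domains_stub(domains, options):
    """Stand-in for pshtt.inspect_domains: yields one result dict per domain."""
    for domain in domains:
        # A real scan would happen here; this placeholder result is made up.
        yield {"Domain": domain, "Valid HTTPS": False}


results = inspect_domains_stub(["worklife4you.com"], {})

# Indexing a generator directly raises TypeError, as reported above.
try:
    results[0]
except TypeError as err:
    print("indexing failed:", err)

# list() consumes the generator, which is what triggers the work.
results = list(results)
first = results[0]
print(first)
```

With the real library, the same pattern would presumably be `list(pshtt.inspect_domains([url], {}))[0]`.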

@refayathaque
Author

refayathaque commented Feb 26, 2018

@jsf9k thanks for getting back. We will test this once we get a chance, but before we do, a couple of questions.

results = list(results)
^
Where is list defined? Are we importing this from pshtt as well?

return results[0]
^
Is it compulsory for us to return results[0]? In that case, we will need to take this out of our handler and create a separate scan function like what you have. results[0], I'm assuming, is basically the return object with all relevant scan data? In essence, what we've been receiving as the return dictionary?

Thank you so much for all your help!

@konklone
Collaborator

@refayathaque list is a built-in Python function. It consumes a Python iterator (which is what results is when it's returned from pshtt) and converts it into a full list of items.

@jsf9k
Member

jsf9k commented Feb 26, 2018

@refayathaque Once you do list(results) you will have a Python list of results like you were expecting from the old code. You can return the entire thing, take the first one, or do whatever you want with it.

@refayathaque
Author

Hi @konklone and @jsf9k, thank you once again for guiding us on how to use the most recent version of the module, we pip installed directly off the repo and used the new scan function invocation. We are now running our scans off the repo, and we seem to be getting the same results as before, at least for three test cases, and we are a little perplexed by the results. Allow me to elaborate.

  1. www.worklife4you.com - Defaults_to_HTTPS : True, Strictly_Forces_HTTPS : True, BUT Supports_HTTPS : False - this isn't making sense to us, if Defaults_to_HTTPS and Strictly_Forces_HTTPS are both True, then surely Supports_HTTPS should be True as well.

    1. worklife4you.com - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : False, Supports_HTTPS : False - the data here is consistent but because the certificate is bad (SSLyze part of pshtt returning an 'error validating certificate' message) can the scan result not be trusted?
  2. www.buprenorphine.samhsa.gov AND buprenorphine.samhsa.gov - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : False, Supports_HTTPS : False - data here is consistent with expectations, exhibiting that pshtt works well for some websites. (No certificate errors for both url and domain)

  3. www.aoa.acl.gov - Defaults_to_HTTPS : False, Strictly_Forces_HTTPS : True, Supports_HTTPS : False - this also doesn't make sense to us: how can both Defaults_to_HTTPS and Supports_HTTPS be False when Strictly_Forces_HTTPS is True? We would be remiss if we didn't mention that this scan also resulted in an 'error validating certificate'. As a result, can the result not be trusted?

    1. aoa.acl.gov curiously, results in a slightly different scan outcome - Defaults_to_HTTPS : True, Strictly_Forces_HTTPS : True, Supports_HTTPS : False - again, this makes no sense, it defaults to HTTPS but does not support and force HTTPS? Are we getting these results because this scan also resulted in an 'error validating certificate'?

Thank you!

@konklone
Collaborator

@refayathaque -

  1. For worklife4you.com, you should get (and I do get) the same results whether you use www or not. pshtt treats those inputs as identical. And for that host, I get False for all of the relevant fields. One key issue is that https://www.worklife4you.com redirects immediately to http://www.worklife4you.com/index.html, which is a downgrade and causes the domain to be flagged as not supporting HTTPS.

  2. Seems like this is working fine.

  3. The results for aoa.acl.gov look True across the board, in pshtt and on Pulse. Let us know if you see anything amiss.
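
To make the downgrade in item 1 concrete, here is a minimal sketch of the idea. It is not pshtt's actual detection logic; `is_downgrade` is a hypothetical helper that only compares URL schemes.

```python
from urllib.parse import urlsplit


def is_downgrade(source_url, redirect_url):
    """True when an https:// URL redirects to an http:// URL."""
    return (urlsplit(source_url).scheme == "https"
            and urlsplit(redirect_url).scheme == "http")


# The redirect described in item 1.
print(is_downgrade("https://www.worklife4you.com",
                   "http://www.worklife4you.com/index.html"))
```

A redirect from HTTP up to HTTPS is the opposite direction and would not count as a downgrade under this check.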

Are you maybe using an old version of pshtt, before we started properly harmonizing inputs with or without www?

cisagovbot pushed a commit that referenced this issue Jul 30, 2024
…max/ghaction-github-status-4

Bump crazy-max/ghaction-github-status from 3 to 4