Re-architecting of CSV / MD / JSON writing #125

IanLee1521 · 2017-10-21T05:54:57Z

This resolves #28 by re-working the cli main function to write out each domain's data as it is generated. It works by converting the `inspect_domains` function to yield the result of each domain inspection as it occurs, rather than waiting until the very end of the run.
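As a rough illustration of the "write as we go" idea (a minimal sketch, not the actual pshtt code): `inspect_domains` below is a simplified stand-in generator, and `inspect`, `write_csv_rows`, and the two-column header list are hypothetical names made up for this example.

```python
import csv
import io


def inspect(domain):
    # Hypothetical stand-in for the real per-domain inspection.
    return {"Domain": domain, "Live": False}


def inspect_domains(domains):
    # Yield each result as soon as it is ready instead of
    # collecting everything into a list and returning at the end.
    for domain in domains:
        yield inspect(domain)


def write_csv_rows(results, fh, headers):
    # Write the header once, then one row per result as it arrives.
    writer = csv.writer(fh)
    writer.writerow(headers)
    for result in results:
        writer.writerow([result[h] for h in headers])
        fh.flush()  # rows written so far survive if a later domain fails


buf = io.StringIO()
write_csv_rows(
    inspect_domains(["example.com", "example.org"]),
    buf,
    ["Domain", "Live"],
)
```

Because the consumer pulls one result at a time from the generator, every completed domain is already on disk (or on stdout) before the next inspection starts.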

This PR also simplifies the logic surrounding writing out the other formats (JSON and MD) by using a unified smart_open context manager to manage outputting to a file vs to stdout.
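A context manager of this kind can be sketched as follows. This is an assumption-laden illustration rather than the code from this PR: the signature, and the convention that `None` or `"-"` means stdout, are guesses made for the example.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def smart_open(filename=None):
    # Assumption: None or "-" means "write to stdout"; the real helper may differ.
    if filename is None or filename == "-":
        fh = sys.stdout
    else:
        fh = open(filename, "w")
    try:
        yield fh
    finally:
        # Only close handles we opened ourselves; never close stdout.
        if fh is not sys.stdout:
            fh.close()


# Same calling code for both destinations.
path = os.path.join(tempfile.mkdtemp(), "out.json")
with smart_open(path) as fh:
    fh.write("{}\n")

with smart_open() as fh:
    fh.write("{}\n")
```

The caller no longer needs an `if output_file: ... else: ...` branch around each writer; the destination decision lives in one place.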

coveralls · 2017-10-21T05:56:19Z

Coverage decreased (-0.3%) to 23.338% when pulling 3ccc408 on gh-28-output-as-we-go into c99ea6a on master.

IanLee1521 · 2017-10-21T14:06:03Z

@h-m-f-t , it looks like you may have left coveralls configured to require only increases in coverage, but since this removes some code, it actually dropped the coverage.

Until the testing framework is in better shape, maybe we should raise the threshold for how far a pull request may drop coverage, to something like 2% or 5%?

Should be able to be configured at: https://coveralls.io/github/dhs-ncats/pshtt/settings (though you as admin would need to do this)

coveralls · 2017-10-21T15:32:07Z

Coverage increased (+0.6%) to 24.228% when pulling 760dc18 on gh-28-output-as-we-go into c99ea6a on master.

coveralls · 2017-10-21T15:48:58Z

Coverage increased (+0.6%) to 24.228% when pulling 905b91f on gh-28-output-as-we-go into c99ea6a on master.

coveralls · 2017-10-21T16:05:08Z

Coverage increased (+1.2%) to 24.846% when pulling 3225035 on gh-28-output-as-we-go into c99ea6a on master.

IanLee1521 · 2017-10-21T16:10:37Z

@h-m-f-t , For this particular case, I ended up just updating the pull request to add more tests along the way. :)

coveralls · 2017-10-21T16:11:24Z

Coverage increased (+1.2%) to 24.846% when pulling f99e244 on gh-28-output-as-we-go into c99ea6a on master.

coveralls · 2017-10-21T17:42:55Z

Coverage increased (+21.2%) to 44.801% when pulling bf31668 on gh-28-output-as-we-go into c99ea6a on master.

IanLee1521 · 2017-10-21T18:12:33Z

Alright, I know I was doing a bunch of work after submitting this, but I'm done for a while. The last thing I would like to do with this is to add the testing of the to_json and to_markdown functions I factored out, but I can do that in this PR or a separate one if we get this merged sooner.

konklone

I made a few small requests in the tests, to avoid hardcoding data/strings where possible. But this overall looks great.

Is there any exception handling present, or does this improve existing exception handling, so that an exception during parsing still results in incomplete data getting written to CSV?

konklone · 2017-10-22T16:46:18Z

tests/test_cli.py

+        with open(self.temp_filename) as fh:
+            content = fh.read()
+
+            expected = 'Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error\n'


Could we generate this by pulling in the global var and joining them with `,`? That would prevent us from having to update the test data every time a column header changes.

konklone · 2017-10-22T16:46:21Z

tests/test_cli.py

+            content = fh.read()
+
+            expected = ''
+            expected += 'Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error\n'


Same note as above.

konklone · 2017-10-22T17:26:19Z

tests/test_cli.py

+
+            expected = ''
+            expected += 'Domain,Base Domain,Canonical URL,Live,Redirect,Redirect To,Valid HTTPS,Defaults to HTTPS,Downgrades HTTPS,Strictly Forces HTTPS,HTTPS Bad Chain,HTTPS Bad Hostname,HTTPS Expired Cert,HTTPS Self Signed Cert,HSTS,HSTS Header,HSTS Max Age,HSTS Entire Domain,HSTS Preload Ready,HSTS Preload Pending,HSTS Preloaded,Base Domain HSTS Preloaded,Domain Supports HTTPS,Domain Enforces HTTPS,Domain Uses Strong HSTS,Unknown Error\n'
+            expected += 'example.com,example.com,http://example.com,False,False,,False,False,False,False,False,False,False,False,False,,,False,False,False,False,False,False,False,False,False\n'


I think this would be best managed in the test data as an array which is also joined by a comma in code before appending to expected. It still might require changing when the headers change, but it would be much more obvious and direct how to make that change.

(Replying to all three comments) Yeah, I actually started down that route originally. For the headers, as you say, using ','.join(_pshtt.HEADERS) is pretty simple, the issue for me then became how to convert the data itself, as doing a list of the values will make it tougher to map each back to the appropriate header...

Maybe creating a full domain object and setting all the values there would be best... Let me give that a shot.

> as doing a list of the values will make it tougher to map each back to the appropriate header...

This is true, but it's probably an acceptable annoyance, IMO. I think it's likely to be less brittle than a concatenated string, and so just turning that into a static array would be enough for me for this PR.
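One way the suggestion in this thread could look in a test is sketched below. The header list is a hypothetical, shortened stand-in (the real project defines many more columns), and `domain_data` is a made-up name; only the `','.join(...)` idea comes from the discussion above.

```python
# Hypothetical, shortened header list for illustration only.
HEADERS = ["Domain", "Base Domain", "Live", "Valid HTTPS"]

# Each (header, value) pair keeps the value next to the column it belongs to.
domain_data = [
    ("Domain", "example.com"),
    ("Base Domain", "example.com"),
    ("Live", False),
    ("Valid HTTPS", False),
]

# Guard against the test data drifting out of sync with the headers.
assert [header for header, _ in domain_data] == HEADERS

# Build the expected CSV text from the data instead of hardcoding a long string.
expected = ",".join(HEADERS) + "\n"
expected += ",".join(str(value) for _, value in domain_data) + "\n"
```

Pairing each value with its header addresses the mapping concern raised above while still avoiding one long concatenated string.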

konklone · 2017-10-22T17:27:23Z

Side question - @h-m-f-t, the Coveralls integration is a bit noisy. Is there a way to tune it down somewhat?

IanLee1521 · 2017-10-23T02:52:24Z

The coveralls output might be more my fault... It fires on every push, and since I'd kept working on this PR after creating it (which I probably shouldn't have), it got a bit noisy.

coveralls · 2017-10-23T03:31:47Z

Coverage increased (+21.2%) to 44.801% when pulling 537f412 on gh-28-output-as-we-go into c99ea6a on master.

h-m-f-t

LGTM, though I'd also like @jsf9k to give a look.

jsf9k

Looks good to me.

One question, though...when I try to run the tests locally using the command line "tox" the flake8 output is overwhelming. (The tests do pass, though.) Am I running it correctly? I suspect user error on my part.

IanLee1521 · 2017-10-23T23:53:05Z

@jsf9k -- Huh, I don't see those... this is what I see:

[... snip ...]
flake8 installed: asn1crypto==0.23.0,beautifulsoup4==4.6.0,certifi==2017.7.27.1,cffi==1.11.2,chardet==3.0.4,cryptography==1.9,DataProperty==0.25.6,docopt==0.6.2,dominate==2.3.1,elasticsearch==5.4.0,flake8==3.4.1,idna==2.6,jsonschema==2.6.0,Logbook==1.1.0,markdown2==2.3.4,mbstrdecoder==0.2.2,mccabe==0.6.1,nassl==0.17.0,path.py==10.4,pathvalidate==0.16.2,pshtt==0.3.0.dev0,publicsuffix==1.1.0,pycodestyle==2.3.1,pycparser==2.18,pyflakes==1.5.0,pyOpenSSL==17.3.0,pyparsing==2.2.0,pytablereader==0.13.4,pytablewriter==0.24.0,python-dateutil==2.6.1,pytz==2017.2,requests==2.18.4,requests-cache==0.4.13,SimpleSQLite==0.16.0,six==1.11.0,SSLyze==1.1.4,tls-parser==1.1.0,toml==0.9.3,typepy==0.0.20,urllib3==1.22,wget==3.2,xlrd==1.1.0,XlsxWriter==1.0.2,xlwt==1.3.0
flake8 runtests: PYTHONHASHSEED='4263820095'
flake8 runtests: commands[0] | flake8
___________________________________________________________________________________ summary ____________________________________________________________________________________
  py27: commands succeeded
SKIPPED:  py34: InterpreterNotFound: python3.4
SKIPPED:  py35: InterpreterNotFound: python3.5
  py36: commands succeeded
  flake8: commands succeeded
  congratulations :)

Can you paste what you're seeing?

konklone · 2017-10-24T04:37:24Z

> The coveralls output might be more my fault... It fires on every push, and since I'd kept working on this PR after creating it (which I probably shouldn't have) that's why it got a bit noisy).

Absolutely not -- filing a PR and then continuing to modify it in-place is a design feature of GitHub and a positive part of collaborative development. Please keep doing that! We should just make coveralls less noisy so as not to punish that workflow.

jsf9k · 2017-10-24T12:41:28Z

@IanLee1521, here is the output I get from tox.

IanLee1521 · 2017-10-24T14:44:15Z

@konklone does https://github.com/dhs-ncats/pshtt/pull/125/files#diff-09f983a1db5a7aca6102f2e2a260c88cR55 work better for you w.r.t. the static string versus something better?

IanLee1521 · 2017-10-24T14:54:17Z

@jsf9k -- It looks like the issue is that you have a virtualenv directory in your top level directory. Therefore flake8 is picking up those files... One way to solve this is #127. Does that work for you?

konklone · 2017-10-24T19:05:58Z

> @konklone does https://github.com/dhs-ncats/pshtt/pull/125/files#diff-09f983a1db5a7aca6102f2e2a260c88cR55 work better for you w.r.t. the static string versus something better?

That works (I also would have been fine with a straight array with header names in the comments, but all good).

This is the case when these functions are called directly, and where the `inspect_domains()` function is not used. Currently these lists are defined only if `inspect_domains()` has been called, which may not be the case when testing (e.g. in #28 / #125), or when using pshtt as a library. The current solution is to raise an exception if the initialize function is not called explicitly, which is now its own function to handle initializing all the third party data. This begins work towards a more complete solution for #99, and allows for the initialize function to be mocked / called separately when working in tests.
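The "raise if not initialized" pattern described here might look roughly like the sketch below. All names (`preload_list`, `is_preloaded`, and the `initialize` signature) are hypothetical and chosen only to illustrate the idea; they are not taken from the pshtt source.

```python
# Module-level data that must be loaded before inspection can run.
# All names here are hypothetical; they only illustrate the pattern.
preload_list = None


def initialize(preloaded=None):
    # Load third-party data once. Tests can call this with fake data
    # (or mock it) instead of fetching anything over the network.
    global preload_list
    preload_list = list(preloaded or [])


def is_preloaded(domain):
    # Fail loudly rather than with a confusing TypeError on None.
    if preload_list is None:
        raise RuntimeError("initialize() must be called before is_preloaded()")
    return domain in preload_list
```

A caller using the module as a library would call `initialize(...)` once up front; a test can pass a small fake list to keep the check deterministic.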

IanLee1521 added 6 commits October 20, 2017 22:10

Added contextmanager for writing to file or stdout

48d4d03

Moved debug message out of contextmanager

fccfe5a

Simplified and moved to_markdown logic into cli

da00e9b

Re-worked JSON outputting to use smart_open

f187fa1

Added writing out newline to end of json

3ccc408

IanLee1521 requested review from konklone and h-m-f-t October 21, 2017 05:54

IanLee1521 added 3 commits October 21, 2017 07:10

Moved smart_open function into utils

a4578ec

Added tests for smart_open util

6b4078b

Fixed flake8 issue with context manager

760dc18

Skip tests in Python 2 when using Python 3 isms

905b91f

IanLee1521 added 2 commits October 21, 2017 08:57

Fixed Python2 flake8 issue in test

1e7f52e

Done by ignoring the flake8 error when using Python2

Fixed bug in tox.ini that prevented test coverage results

3225035

Updated smart_open() docstring with pointer to StackOverflow

f99e244

Refactored cli outputting to enable testing

e852e81

IanLee1521 mentioned this pull request Oct 21, 2017

Fixes bugs where the globals are None #126

Merged

Added unittests around new to_csv()

bf31668

konklone requested changes Oct 22, 2017

Updated tests to not hardcode the data as strings

537f412

Instead uses a list of tuples for the columnar data. This feedback was provided by @konklone

h-m-f-t approved these changes Oct 23, 2017

h-m-f-t requested a review from jsf9k October 23, 2017 16:01

jsf9k approved these changes Oct 23, 2017

IanLee1521 mentioned this pull request Oct 24, 2017

Ignore a top level venv/ directory from flake8 #127

Merged

konklone approved these changes Oct 24, 2017

konklone merged commit b3cff8c into master Oct 24, 2017

konklone deleted the gh-28-output-as-we-go branch October 24, 2017 19:06

konklone mentioned this pull request Jan 14, 2018

'Valid HTTPS' key-value inconsistent across platforms #149

Open

cisagovbot pushed a commit that referenced this pull request Feb 8, 2023

Merge pull request #125 from cisagov/improvement/add-security-label

b7c0a75

Add a security label

cisagovbot pushed a commit that referenced this pull request Dec 19, 2023

Merge pull request #125 from cisagov/improvement/add-diagnostics-job-…

593f588

…for-codeql-workflow Add a diagnostics job to the CodeQL workflow
