Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Pyro with HTTP(S) communication layer #1923

Merged
merged 12 commits into from
Sep 20, 2016

Conversation

benfitzpatrick
Copy link
Contributor

@benfitzpatrick benfitzpatrick commented Jul 1, 2016

This closes #1872.

I thought it was time to put it up for discussion.

Pyro has been replaced with cherrypy (server) and built-in Python library urllib2 (optionally upgrading to requests if possible). cherrypy is bundled with cylc as Pyro was, and there are no new dependencies apart from optional dependence on OpenSSL (for HTTPS rather than HTTP) and the requests library for better certificate verification (N.B. We can do better certificate verification than is done in this PR with pure Python 2.7).

Communication is via HTTP(S) secured with HTTP Digest Auth and the usual passphrase system, using 'cylc' as the username for full access and 'anon' for the public access. I have separated out the authentication so that theoretically someone can add another auth system if they want to do a fair bit of work!

All test-battery tests pass (although some seem flakier than usual). I think flakiness is due to some shutdown/ports file timing issue and will investigate. (EDIT: Seems much better now)

Some compromises and other information:

  • SSL certificates and private keys are generated on registration, although they could (?) be generated on run and then assigned to the correct host. Otherwise they have to be wildcard certificates. We could also support some given sub-domains.
  • SSL certificate remote and alternative suite/host fetching is a bit ugly and could be further consolidated with passphrase logic.
  • SSL certificates are self-signed as we need one per suite. We can think about improving this for later browser interfaces (e.g. having a master certificate that has a child certificate for each suite and then importing that into the browser)
  • suite host self-identification only works at present if the suite is registered on the same host that it is run on (we can force this to be true if we think it's necessary, or force generate-certificate-on-run)
  • daemons and clients fall back to HTTP if they have to
  • Digest Auth uses MD5 as standard so we should add support for a choice of auth (maybe even basic auth, which is 'OK' for HTTPS) I've added support for SHA1 if we need to for FIPS support.
  • SSL is better supported at higher versions of Python than we use here - it is more secure at >2.7.9.
  • There are going to be some debug statements that I need to remove

@benfitzpatrick benfitzpatrick added this to the soon milestone Jul 1, 2016
@matthewrmshin
Copy link
Contributor

Initial comments:

  • Well done!
  • I wonder if it will make review easier you can do some rebase and squash. In particular, if you can move the changeset to bundle cherrypy (and remove pyro) to either end.
  • It appears that you have evolved an older version of cylc.registration. I have since taken out some old logic.
  • Some network service modules are basically an import statement. I wonder if they can be grouped together or not?
  • Tests definitely a bit flaky. Test battery without global configuration passes after re-running only failed tasks on the 3rd attempt.

On your bullet points 1 and 4: generate-on-run is probably desirable for us, but may not be popular in some circumstances at other sites.

@trwhitcomb
Copy link
Collaborator

Well done indeed - this is an excellent change. I do have one question about module redesign - why the decision to split the clients and servers into separate modules instead of having a single module containing both?

@matthewrmshin
Copy link
Contributor

@trwhitcomb Splitting the server and the client allows an individual client to only import modules that are relevant to the client, and not all the stuffs required to drive the server.

@benfitzpatrick
Copy link
Contributor Author

Just to add to what Matt has said - importing lots of modules can take a surprisingly long amount of time - although we don't (much) care about the elapsed time of cylc trigger, we do care about cylc task message.

@benfitzpatrick benfitzpatrick force-pushed the comms.https branch 5 times, most recently from 9cadab3 to 60a0be2 Compare July 22, 2016 09:21
@benfitzpatrick
Copy link
Contributor Author

Tests much more stable now. This was a port-grabbing problem. cherrypy did not like it when started with a particular (non-occupied) port which was then stolen just before it could start. Specifically, it would have died with os._exit (same as sys.exit) and taken the whole daemon down.

@benfitzpatrick
Copy link
Contributor Author

I wonder if it will make review easier you can do some rebase and squash. In particular, if you can move the changeset to bundle cherrypy (and remove pyro) to either end.

Done

@matthewrmshin
Copy link
Contributor

Looking better. I have tests/authentication/04-* failing with ChannelFailures, which I suppose is the port grabbing issue? Otherwise, tests certainly more stable than before.

@benfitzpatrick
Copy link
Contributor Author

7cbe2ea and b87c240 add SHA1 support for HTTP Digest Auth to cherrypy/cylc and avoids using MD5 anywhere - so it should be more FIPS-compliant. The requests and Python's in-built urllib2 library both allow use of SHA1 (and autodetect its use), so only the server side needed altering. I decided not to go for basic auth.

@hjoliver
Copy link
Member

hjoliver commented Aug 9, 2016

@cylc/core - can this be merged as soon as I've reviewed it, as far as MO is concerned? Or are there admin changes (e.g. to network port range) that suggest we should bump this to cylc-7 (but still very soon, I hope!). If so, I'd like to obsolete cylc-5 syntax too - we can discuss offline if not so straightforward at your end.

@benfitzpatrick
Copy link
Contributor Author

@hjoliver and @matthewrmshin, please take an initial look - probably best to ignore the import cherrypy/Pyro deletion commits.

@benfitzpatrick benfitzpatrick force-pushed the comms.https branch 2 times, most recently from 31b4018 to b279be5 Compare August 10, 2016 08:16
@matthewrmshin
Copy link
Contributor

It is worth checking if cherrypy does any excessive DNS lookups (like Pyro used to do) by using strace -e getsockname on something like cylc gscan.

@hjoliver
Copy link
Member

@benfitzpatrick - sorry for the long delay on reviewing this. All seems to work well on my metomi VM - very nice - but on my Centos box at work, I get "ERROR Not authorized: ... access type 'private'" for any attempt to connect.

@hjoliver
Copy link
Member

Scratch that, it's working now (on Centos). I suspect network issues earlier (I noticed many tests fail on the VM at home too, immediately after turning the laptop on, until I get internet connectivity...)

@hjoliver
Copy link
Member

The --comms-timeout client option doesn't work. Start a suite with --no-detach then suspend it with Ctrl-Z, then run cylc scan --comms-timeout=2 SUITE or cylc dump --comms-timeout=2 SUITE etc. - hangs indefinitely.

@benfitzpatrick benfitzpatrick force-pushed the comms.https branch 4 times, most recently from efe8562 to 7450541 Compare September 9, 2016 13:54
@benfitzpatrick
Copy link
Contributor Author

No conflicts (right now)

@benfitzpatrick
Copy link
Contributor Author

Now unconflicted...

@benfitzpatrick
Copy link
Contributor Author

Rebased and pushed again.

@benfitzpatrick
Copy link
Contributor Author

@matthewrmshin, @hjoliver, please review and merge

@matthewrmshin matthewrmshin modified the milestones: next release, soon Sep 16, 2016
@arjclark
Copy link
Contributor

https://www.timeanddate.com/countdown/generic?iso=20160923T17&p0=1358&msg=%231923+no+longer+Ben%27s+problem&font=sanserif

I'll remember this when you're looking for suite support in your new role...

Copy link
Contributor

@matthewrmshin matthewrmshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now good. Test battery passing robustly in my environment.

@hjoliver
Copy link
Member

hjoliver commented Sep 20, 2016

Can't see any problems after another high-speed blast through the code, and tests passing. I think we can merge this in and let it mature on master. 🎉

Excellent work @benfitzpatrick, as ever ... and sayonara 😢 I'll buy you a 🍺 in Lisbon.

That countdown timer is evil, by the way.

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved!

@hjoliver hjoliver merged commit 1059eab into cylc:master Sep 20, 2016
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Sep 20, 2016
Some tests in Rose test battery have become less robust since
cylc/cylc-flow#1923 is merged in. This change improves the situation by
removing some of the old logic for detecting when a suite is running.

The change also removes `rose suite-hook` tests as the tests are some
what unstable and the command itself has now been replaced by better
cylc built-in functionality.
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Sep 20, 2016
Some tests in Rose test battery have become less robust since
cylc/cylc-flow#1923 is merged in. This change improves the situation by
removing some of the old logic for detecting when a suite is running.

The change also removes `rose suite-hook` tests as the tests are some
what unstable and the command itself has now been replaced by better
cylc built-in functionality.
@oliver-sanders oliver-sanders mentioned this pull request Jun 14, 2017
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Communications layer refactor, to RESTful HTTP
5 participants