
RFD86 updates: use json5 and enhance dependency resolution #31

Merged (7 commits) on Apr 5, 2017

Conversation

@tgross (Contributor) commented Apr 3, 2017:

These changes follow discussion at last week's meetings around RFD36 and friends, as well as the previous week's work on multi-process support and some of the discoveries made along the way of implementing it.

In this PR:

  • following a lot of negative reaction to YAML, I've changed the config language update to use JSON5 (see the short sketch after this list)
  • removed explicit "non-advertise" clause because we get that for free just by omitting the info required to register with Consul (i.e. the port)
  • removed the clunky "depends" arrays in favor of a nicer upstart-like syntax which will be more flexible for us in future development
  • pulled watches out into their own config section because the external services we're watching and the events that services are waiting for are actually separate concerns -- we can respond to either internal services or external ones.
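
For reference, here's a minimal sketch of what JSON5 buys us over strict JSON; the config keys shown are illustrative only, not the settled schema:

```json5
{
  // JSON5 permits comments,
  jobs: [
    {
      name: "app",        // unquoted object keys,
      port: 8080,
      tags: ['blue',],    // single-quoted strings, and trailing commas
    },
  ],
}
```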

The big conceptual change is the realization that, by merging the various service/prestart/task/coprocess configurations together, we end up with non-advertised services that still need to respond to events inside the same container. We might want to be able to health check the Consul agent without advertising it, for example.

One item I'd like to discuss again is whether "services" is the right word for the config now. It includes periodic tasks, non-advertised processes, and prestart/poststop/etc. The term is also overloaded with "services" as Consul sees them and "services" as an RFD36 scheduler might see them. (resolved in 4bea5a4)

cc @jasonpincin @misterbisson @geek

```json5
watches: [
  {
    name: "database",
    exec: "reconfigure-db-connection.sh",
```
@tgross (Author) commented:

In a sidebar discussion with @jasonpincin and @geek, we've decided that it would make more sense to move the exec into its own job. This way a watch merely watches and fires events, and then jobs react to those events.

@tgross (Author) added:

Also, watches will produce events in the form `watch.SomeWatchName`, and will have an explicit field for the service (or, in the future, key) they want to watch.
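
Putting those two comments together, a hedged sketch of the split (the `service` field name on the watch and the `changed` event name are assumptions, and the `when` object form comes from later in this thread):

```json5
watches: [
  {
    name: "database",
    service: "database",   // assumed field naming the service to watch
  },
],
jobs: [
  {
    name: "reload-db-config",
    exec: "reconfigure-db-connection.sh",
    // react to the event the watch fires; the watch's events arrive
    // in the form watch.database, and the event name is hypothetical
    when: { source: "database", event: "changed" },
  },
]
```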


```json5
{
  services: [
```
@tgross (Author) commented:

In a discussion with @jasonpincin and @geek, we decided that `jobs` would be better terminology here, as we're including non-advertised and one-off tasks.

```diff
@@ -117,54 +131,101 @@ ContainerPilot hasn't eliminated the complexity of dependency management -- that

 That being said, a more expressive configuration of event handlers may more gracefully handle all the above situations and reduce the end-user confusion. Rather than surfacing just changes to dependency membership lists, we'll expose changes to the overall state as ContainerPilot sees it.

-ContainerPilot will provide the following events:
+ContainerPilot will provide events and each service can opt-in to having a `start` condition on one of these events. Because the life-cycle of each service triggers new events, the user can create a dependency chain among all the services in a container (and their external dependencies). This effectively replaces the `preStart`, `preStop`, and `postStop` behaviors.
```
@tgross (Author) commented:

In a sidebar discussion we decided that `when` is more ergonomic than `start` for jobs that run more than once or on multiple triggered events. This gives us syntax like:

  • `when: "startup"`
  • `when: "myPreStart exitSuccess timeout 60s"`
  • `when: "myDb healthy"`
  • `when: "myDb stopped"`

```json5
health: [
  {
    name: "checkA",
    service: "nginx"
```
A contributor commented:

Is this supposed to match the job name above? Should the field name be `job`?

@tgross (Author) replied:

It does have to match the job, so yes, that field name should probably be `job` too.
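
A minimal sketch of the rename; only `name` and the `job` field are grounded in this exchange, while `poll` and `timeout` follow the health-check notes later in the diff:

```json5
health: [
  {
    name: "checkA",
    job: "nginx",     // must match a name in the jobs array
    poll: 5,          // seconds between checks
    timeout: "5s",    // per-check timeout
  },
]
```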

```json5
    // this is upstart-like syntax indicating we want to start this
    // service when the "setup" service has exited with success but
    // give up after 60 sec
    when: "setup exitSuccess timeout 60s",
```
A contributor commented:

Where should I look to see the syntax for the events described? Specifically, I'm looking for more background on the choice of this syntax:

```json5
when: "setup exitSuccess timeout 60s"
```

over something like:

```json5
when: {
    job: "setup",
    event: "exitSuccess",
    timeout: "60s",
}
```

I'm not sure I have an opinion; I was just trying to better understand the reasoning for using a different syntax to define those components, rather than the JSON5 we're using elsewhere.

@tgross (Author) replied:

There's no background other than existing systems like upstart, which handle this nicely. Having "stringly-typed" events is an unfortunate problem either way, but one we're stuck with.

@tgross (Author) added:

But on further reflection, the more clever syntax proposed above will just get screwed up by end users, which is problematic given that most of these changes are designed to fix poorly-considered config. So let's make it as straightforward as humanly possible and just use the field names.

@tgross (Author) added:

Although rather than `job` we should use `source` here, as it's likely we'll have non-job sources of events (like watches).


```json5
    timeout: "5s",
  }
}
watches: {
```
A contributor commented:

Perhaps a comment here explaining that watches replace `upstreams` from v2, and that they trigger events that can be hooked by jobs defined above.

There's a lot of depth in TritonDataCenter/containerpilot#227, but summarizing it here (which will probably be the basis of our docs) might help clarify the conclusions.

@tgross (Author) replied:

It replaces `backends`, but yeah, agreed. There's some commentary in the multiprocess.md file already, but it couldn't hurt to repeat it here.

@tgross (Author) added:

> which will probably be the basis of our docs

This doc is largely about why we're changing things, which I'm not sure belongs in the shipped docs.

- A health check will poll every `poll` seconds, with a timeout of `timeout`. If any health check fails (returns a non-zero exit code or times out), the associated job is marked `unhealthy` and a `Fail` message is sent to the discovery job.
- Once any health check fails, _all_ health checks need to pass before the job will be marked healthy again. This is required to avoid service flapping.

**Important note:** end users should not provide a health check with a long polling time as a way to perform some supporting task (like updating backends). This will cause "slow startup", as their job will not be marked healthy until the first polling window expires. Instead they should create another job for this task.
A contributor commented:

To clarify my understanding: health checks should be short, and end users should never mix concerns between health checks and cron events. This is a good principle, but it's also a practical limitation, because a job will not be marked as healthy if the job runs long doing other work unrelated to the healthiness of the app. Yes?

@tgross (Author) replied:

Correct!
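
To make that split concrete, a hedged sketch: keep the health check short, and give the supporting work its own job (the `exec` values are illustrative, and the `healthy` trigger follows the event syntax discussed above):

```json5
health: [
  {
    name: "checkApp",
    job: "app",
    exec: "curl -sf http://localhost/health",   // illustrative check command
    poll: 5,        // short poll, so "app" is marked healthy quickly
    timeout: "5s",
  },
],
jobs: [
  {
    name: "refreshBackends",        // the supporting task, as its own job
    exec: "refresh-backends.sh",    // illustrative
    when: { source: "app", event: "healthy" },
  },
]
```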

```json5
jobs: [
  {
    name: "app",
    // this is upstart-like syntax indicating we want to start this
```
A contributor commented:

Still want to call this "upstart-like syntax"?

@tgross (Author) replied:

Missed that! Fixed!

```json5
    // service when the "setup" service has exited with success but
    // give up after 60 sec
    when: {
      source: "setup",
```
A contributor commented:

So, a `source` can match a job or a watch name?

@tgross (Author) replied:

Yes.
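
So both of these would be valid triggers; a short sketch (the watch's `changed` event name is an assumption):

```json5
// source naming a job:
when: { source: "setup", event: "exitSuccess" },

// source naming a watch (its events arrive as watch.database,
// per the earlier comment):
when: { source: "database", event: "changed" },
```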

```json5
    // give up after 60 sec
    when: {
      source: "setup",
      event: "exitSuccess",
```
A contributor commented:

I haven't read the whole thing yet, but I'll ask: is the dictionary of events enumerated here?

@tgross (Author) replied:

Yes, in `multiprocess.md`.
