Proposal: Periodic tasks #87

Closed
justenwalker opened this issue Feb 26, 2016 · 12 comments

Comments

@justenwalker
Contributor

Currently, we have preStart, preStop, and postStart events. While the container is running, though, it may be useful to have periodic tasks that execute to report status to external systems, separate from the health checks that report to the service discovery backend.

The primary use case would be a logical extension point for push-style metrics, without having to build any backends into Containerbuddy directly. (See #27 for discussion.)

Configuration may look something like:

{
  "onScheduled": [
    { "frequency": "1s", "command": [ "/bin/push_metrics.sh" ] },
    { "frequency": "10s", "command": [ "/bin/push_other_metrics.sh" ] }
  ]
}
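
For reference, here is a minimal sketch of how a block like that could be parsed, assuming Go's encoding/json and time.ParseDuration; the struct and field names are illustrative, not a settled Containerbuddy schema:

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type scheduledTask struct {
	Frequency string   `json:"frequency"`
	Command   []string `json:"command"`
}

type config struct {
	OnScheduled []scheduledTask `json:"onScheduled"`
}

func main() {
	raw := `{"onScheduled": [{"frequency": "1s", "command": ["/bin/push_metrics.sh"]}]}`
	var cfg config
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	// frequency strings parse directly with Go's duration syntax ("1s", "10s", ...)
	freq, err := time.ParseDuration(cfg.OnScheduled[0].Frequency)
	if err != nil {
		panic(err)
	}
	fmt.Println("run", cfg.OnScheduled[0].Command, "every", freq)
}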

Some other things to consider:

  • Should we support a more flexible duration syntax, for example ISO 8601 durations? For the metrics use case, the interval between events will be very short; backups could potentially be longer (hours). I certainly don't think there would be multi-day pauses between executions.
  • Should we build in exponential back-off to deal with back-pressure? If a command exits with a specific non-zero exit code, we reduce the frequency to give the target system time to recover (see the sketch after this list). Note: this may be useless or harmful for longer intervals (hours), so we might also need to specify a separate retry frequency.
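
A sketch of the back-off idea from the second bullet, assuming the semantics described there (double the interval on a non-zero exit, reset on success, cap at some maximum); the function and parameter names are made up for illustration:

package tasks

import (
	"os/exec"
	"time"
)

// runWithBackoff runs command forever, stretching the interval after each
// failure to relieve back-pressure and snapping back to frequency on success.
func runWithBackoff(frequency, maxInterval time.Duration, command []string) {
	interval := frequency
	for {
		time.Sleep(interval)
		if err := exec.Command(command[0], command[1:]...).Run(); err != nil {
			interval *= 2 // back off to give the target system time to recover
			if interval > maxInterval {
				interval = maxInterval
			}
			continue
		}
		interval = frequency // healthy again: return to the configured frequency
	}
}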
justenwalker changed the title from "Support scheduled tasks" to "Proposal: Periodic tasks" on Feb 26, 2016

tgross commented Feb 26, 2016

Should we support a more flexible duration syntax, for example ISO 8601 durations? For the metrics use case, the interval between events will be very short; backups could potentially be longer (hours). I certainly don't think there would be multi-day pauses between executions.

The architecture we have for Containerbuddy hasn't been particularly optimized for high performance, and forking off lots of processes at subsecond intervals might prove costly. I would suggest that if we want to allow subsecond intervals, we do some performance testing of that design.
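
As a starting point, a benchmark along those lines might be as simple as timing a fork/exec of a trivial command with Go's testing package (the command path and package name here are placeholders):

package tasks

import (
	"os/exec"
	"testing"
)

// BenchmarkForkExec measures the raw cost of forking a short-lived process,
// which bounds how cheap a subsecond scheduled task could ever be.
func BenchmarkForkExec(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if err := exec.Command("/bin/true").Run(); err != nil {
			b.Fatal(err)
		}
	}
}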

Should we build in exponential back-off to deal with back-pressure? If a command exits with a specific non-zero exit code, we reduce the frequency to give the target system time to recover. Note: this may be useless or harmful for longer intervals (hours), so we might also need to specify a separate retry frequency.

This also brings up the question of the semantics of the frequency. Under the current design of poll, the polling goroutine is blocked by the execution of the pollingFunc we give it. This is by design: we don't want (for example) multiple health checks running simultaneously -- if a health check can't return within the TTL, then the node should be marked unhealthy anyway.

For tasks that run long, is this still the correct behavior?
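
To make the current semantics concrete, here is a rough illustration of the blocking behavior described above (not the actual poll code): the goroutine draining the ticker calls the function synchronously, so a run that exceeds the interval delays the next run instead of overlapping it.

package main

import (
	"fmt"
	"time"
)

func poll(interval time.Duration, pollingFunc func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		pollingFunc() // blocks; ticks that fire meanwhile are coalesced, not run concurrently
	}
}

func main() {
	poll(time.Second, func() {
		fmt.Println("tick", time.Now().Format(time.RFC3339))
		time.Sleep(1500 * time.Millisecond) // a "long" run pushes the next one back
	})
}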

@justenwalker
Contributor Author

Under the current design of poll, the polling goroutine is blocked by the execution of the pollingFunc we give it. This is by design: we don't want (for example) multiple health checks running simultaneously -- if a health check can't return within the TTL, then the node should be marked unhealthy anyway.

For tasks that run long, is this still the correct behavior?

I think so. It usually doesn't make sense for periodic tasks to overlap; if they do, it's probably a mistake. Either:

  1. The new task will block until the old one is finished, or
  2. The new task will kill the old one before running.

Perhaps the semantics could be configurable.

Consider the backup use case: if the first backup hasn't finished, should we wait for it to complete, or kill it and start another? Probably the former.

However, in the case of pushing metrics, if the last one didn't complete we should probably just kill it, since it is stale now anyway.

We don't want to fork too many processes, so I don't think we should support scheduled tasks that overlap and continue to spawn more and more processes that never exit. Perhaps both can be accomplished with a timeout on the scheduled task (which may default to the frequency?)

{
  "onScheduled": [
    { "frequency": "1s", "command": [ "/bin/push_metrics.sh" ] },
    { "frequency": "10s", "command": [ "/bin/push_other_metrics.sh" ], "timeout": "5s" }
  ]
}
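
A minimal sketch of how that timeout could be enforced, assuming it means killing a run that exceeds its budget; exec.CommandContext kills the process when the context deadline expires, and the names here are illustrative rather than the eventual API:

package main

import (
	"context"
	"log"
	"os/exec"
	"time"
)

// runScheduled runs one invocation of a scheduled task, killing it if it
// exceeds timeout (which would default to the task's frequency).
func runScheduled(timeout time.Duration, name string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return exec.CommandContext(ctx, name, args...).Run()
}

func main() {
	// e.g. the push_other_metrics task above: frequency 10s, timeout 5s
	if err := runScheduled(5*time.Second, "/bin/push_other_metrics.sh"); err != nil {
		log.Printf("scheduled task failed or timed out: %v", err)
	}
}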


tgross commented Mar 10, 2016

Perhaps both can be accomplished with a timeout on the scheduled task (which may default to the frequency?)

I like this idea. To support this, we may need to update (or replace) executeAndWait to support a communication channel for stopping the process.
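
A rough sketch of what that could look like (the name and signature are hypothetical, not the existing executeAndWait): start the command, wait for it in a goroutine, and kill it if something arrives on the quit channel first.

package tasks

import "os/exec"

func executeAndWaitStoppable(cmd *exec.Cmd, quit <-chan struct{}) error {
	if err := cmd.Start(); err != nil {
		return err
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()
	select {
	case <-quit:
		cmd.Process.Kill() // best-effort stop; Wait still reaps the process
		return <-done
	case err := <-done:
		return err
	}
}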

@justenwalker I'm planning on tackling #27 as my next major task for Containerbuddy. Do you want to take ownership of this project?

@justenwalker
Contributor Author

Sure 😋 🍪


tgross commented Mar 16, 2016

When we do this, let's try to split out the functionality into its own library that main calls (per #83). This will reduce the scope of refactoring when we do #83 and give us some guidance on how to do it.
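
One way to read "its own library that main calls" (package and identifier names below are placeholders, not the layout #83 settles on) is to keep the scheduling work behind a small public surface so main stays a thin wrapper:

package tasks

import "time"

// Task describes one scheduled command from the config.
type Task struct {
	Frequency time.Duration
	Timeout   time.Duration
	Command   []string
}

// Run is the single entry point main would call; the per-task scheduling
// loops would live behind this package boundary. This stub just blocks
// until stop is closed.
func Run(tasks []Task, stop <-chan struct{}) {
	<-stop
}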

@justenwalker
Contributor Author

Just an update, @tgross: I started a WIP branch if you want to follow it. It's not ready for a PR yet, but perhaps we can discuss the implementation I'm going with.

Also, I haven't split out the module yet, but I'll work on that too.


tgross commented Mar 25, 2016

Cool, I've got https://github.com/tgross/containerbuddy/tree/gh27_metrics in the works myself, and I started with splitting the module out just as a "let's make sure that I can call it correctly" exercise.

I realize you're early in the process, but you may find what you've done with ScheduledTaskConfig tricky in terms of avoiding a circular import with config.go. I suspect you'll want to move that into the containerbuddy (and, later, config) package, but overall this is looking like a good approach.
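
To illustrate the circular-import concern (the type and fields below are placeholders for whatever is on the WIP branch): if the task code lives in its own package that imports containerbuddy, then config.go in the containerbuddy package can't also import the task package to reach ScheduledTaskConfig. Keeping the config struct next to config.go breaks the cycle:

package containerbuddy

// ScheduledTaskConfig is the parsed form of one "onScheduled" entry; it lives
// alongside config.go so the task-running package can import it without
// config.go having to import the task package back.
type ScheduledTaskConfig struct {
	Frequency string   `json:"frequency"`
	Command   []string `json:"command"`
	Timeout   string   `json:"timeout,omitempty"`
}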

tgross removed the proposal label on Mar 25, 2016

tgross commented Mar 25, 2016

More thoughts on module split-up here: #83 (comment)


tgross commented Mar 25, 2016

@justenwalker just because the timing is inconvenient on that first-stage refactor, I'm going to try to push it out early Monday morning so that we can base our new packages on it. This way it doesn't get delayed to the point where we both end up having to rework sections of metrics and tasks to suit. And, as noted in #83, it'll give us a chance to make sure it's the right abstraction before refactoring the rest of the modules.


tgross commented Mar 25, 2016

Actually, that turned out to be a much smaller intervention than I'd thought, so I've opened #118.


tgross commented Apr 22, 2016

Once we get a green build on master back from TravisCI, I'll cut release 2.1.0 with this in it.
