Proposal: Periodic tasks #87

Closed
justenwalker opened this issue Feb 26, 2016 · 12 comments

Comments

@justenwalker
Contributor

Currently, we have preStart, preStop, and postStart events. While the container is running, though, it may be useful to have periodic tasks that execute to report status to external systems, separate from the health checks that report to the service discovery backend.

The primary use case would be a logical extension point for push-style metrics, without having to build any backends into Containerbuddy directly. (See #27 for discussion.)

Configuration may look something like:

{
  "onScheduled": [
    { "frequency": "1s", "command": [ "/bin/push_metrics.sh" ] },
    { "frequency": "10s", "command": [ "/bin/push_other_metrics.sh" ] }
  ]
}
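
For reference, here is a minimal sketch of how a block like that could be parsed, assuming Go's encoding/json and time.ParseDuration; the struct and field names are illustrative, not a settled Containerbuddy schema:

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type scheduledTask struct {
	Frequency string   `json:"frequency"`
	Command   []string `json:"command"`
}

type config struct {
	OnScheduled []scheduledTask `json:"onScheduled"`
}

func main() {
	raw := `{"onScheduled": [{"frequency": "1s", "command": ["/bin/push_metrics.sh"]}]}`
	var cfg config
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	// frequency strings parse directly with Go's duration syntax ("1s", "10s", ...)
	freq, err := time.ParseDuration(cfg.OnScheduled[0].Frequency)
	if err != nil {
		panic(err)
	}
	fmt.Println("run", cfg.OnScheduled[0].Command, "every", freq)
}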

Some other things to consider:

  • Should we support a more flexible duration syntax, for example ISO 8601 durations? For the metrics use case, the interval between events will be very short; backups could potentially be longer (hours). I certainly don't think there would be multi-day pauses between executions.
  • Should we build in exponential back-off to deal with back-pressure? If a command exits with a specific non-zero exit code, we reduce the frequency to give the target system time to recover (see the sketch after this list). Note: this may be useless or harmful for longer intervals (hours), so we might also need to specify a separate retry frequency.
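
A sketch of the back-off idea from the second bullet, assuming the semantics described there (double the interval on a non-zero exit, reset on success, cap at some maximum); the function and parameter names are made up for illustration:

package tasks

import (
	"os/exec"
	"time"
)

// runWithBackoff runs command forever, stretching the interval after each
// failure to relieve back-pressure and snapping back to frequency on success.
func runWithBackoff(frequency, maxInterval time.Duration, command []string) {
	interval := frequency
	for {
		time.Sleep(interval)
		if err := exec.Command(command[0], command[1:]...).Run(); err != nil {
			interval *= 2 // back off to give the target system time to recover
			if interval > maxInterval {
				interval = maxInterval
			}
			continue
		}
		interval = frequency // healthy again: return to the configured frequency
	}
}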
justenwalker changed the title from "Support scheduled tasks" to "Proposal: Periodic tasks" on Feb 26, 2016

tgross commented Feb 26, 2016

Should we support a more flexible duration syntax, for example ISO 8601 durations? For the metrics use case, the interval between events will be very short; backups could potentially be longer (hours). I certainly don't think there would be multi-day pauses between executions.

The architecture we have for Containerbuddy hasn't been particularly optimized for high performance, and forking off lots of processes at subsecond intervals might prove costly. I would suggest that if we want to allow subsecond intervals, we do some performance testing of that design.
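
As a starting point, a benchmark along those lines might be as simple as timing a fork/exec of a trivial command with Go's testing package (the command path and package name here are placeholders):

package tasks

import (
	"os/exec"
	"testing"
)

// BenchmarkForkExec measures the raw cost of forking a short-lived process,
// which bounds how cheap a subsecond scheduled task could ever be.
func BenchmarkForkExec(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if err := exec.Command("/bin/true").Run(); err != nil {
			b.Fatal(err)
		}
	}
}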

Should we build in exponential back-off to deal with back-pressure? If a command exits with a specific non-zero exit code, we reduce the frequency to give the target system time to recover. Note: this may be useless or harmful for longer intervals (hours), so we might also need to specify a separate retry frequency.

This also brings up the question of the semantics of the frequency. Under the current design of poll, the polling goroutine is blocked by the execution of the pollingFunc we give it. This is by design: we don't want (for example) multiple health checks running simultaneously -- if a health check can't return within the TTL, then the node should be marked unhealthy anyway.

For tasks that run long, is this still the correct behavior?
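
To make the current semantics concrete, here is a rough illustration of the blocking behavior described above (not the actual poll code): the goroutine draining the ticker calls the function synchronously, so a run that exceeds the interval delays the next run instead of overlapping it.

package main

import (
	"fmt"
	"time"
)

func poll(interval time.Duration, pollingFunc func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		pollingFunc() // blocks; ticks that fire meanwhile are coalesced, not run concurrently
	}
}

func main() {
	poll(time.Second, func() {
		fmt.Println("tick", time.Now().Format(time.RFC3339))
		time.Sleep(1500 * time.Millisecond) // a "long" run pushes the next one back
	})
}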

@justenwalker
Contributor Author

Under the current design of poll, the polling goroutine is blocked by the execution of the pollingFunc we give it. This is by design: we don't want (for example) multiple health checks running simultaneously -- if a health check can't return within the TTL, then the node should be marked unhealthy anyway.

For tasks that run long, is this still the correct behavior?

I think so. It usually doesn't make sense for periodic tasks to overlap; if they do, it's probably a mistake. Either:

  1. The new task will block until the old one is finished, or
  2. The new task will kill the old one before running.

Perhaps the semantics could be configurable.

Consider the backup use case: if the first backup hasn't finished, should we wait for it to complete, or kill it and start another? Probably the former.

However, in the case of pushing metrics, if the last one didn't complete we should probably just kill it, since it is stale now anyway.

We don't want to fork too many processes, so I don't think we should support scheduled tasks that overlap and continue to spawn more and more processes that never exit. Perhaps both can be accomplished with a timeout on the scheduled task (which may default to the frequency?)

{
  "onScheduled": [
    { "frequency": "1s", "command": [ "/bin/push_metrics.sh" ] },
    { "frequency": "10s", "command": [ "/bin/push_other_metrics.sh" ], "timeout": "5s" }
  ]
}
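
A minimal sketch of how that timeout could be enforced, assuming it means killing a run that exceeds its budget; exec.CommandContext kills the process when the context deadline expires, and the names here are illustrative rather than the eventual API:

package main

import (
	"context"
	"log"
	"os/exec"
	"time"
)

// runScheduled runs one invocation of a scheduled task, killing it if it
// exceeds timeout (which would default to the task's frequency).
func runScheduled(timeout time.Duration, name string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return exec.CommandContext(ctx, name, args...).Run()
}

func main() {
	// e.g. the push_other_metrics task above: frequency 10s, timeout 5s
	if err := runScheduled(5*time.Second, "/bin/push_other_metrics.sh"); err != nil {
		log.Printf("scheduled task failed or timed out: %v", err)
	}
}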


tgross commented Mar 10, 2016

Perhaps both can be accomplished with a timeout on the scheduled task (which may default to the frequency?)

I like this idea. To support this, we may need to update (or replace) executeAndWait to support a communication channel for stopping the process.
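
A rough sketch of what that could look like (the name and signature are hypothetical, not the existing executeAndWait): start the command, wait for it in a goroutine, and kill it if something arrives on the quit channel first.

package tasks

import "os/exec"

func executeAndWaitStoppable(cmd *exec.Cmd, quit <-chan struct{}) error {
	if err := cmd.Start(); err != nil {
		return err
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()
	select {
	case <-quit:
		cmd.Process.Kill() // best-effort stop; Wait still reaps the process
		return <-done
	case err := <-done:
		return err
	}
}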

@justenwalker I'm planning on tackling #27 as my next major task for Containerbuddy. Do you want to take ownership of this project?

@justenwalker
Contributor Author

Sure 😋 🍪


tgross commented Mar 16, 2016

When we do this, let's try to split out the functionality into its own library that main calls (per #83). This will reduce the scope of refactoring when we do #83 and give us some guidance on how to do it.
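
One way to read "its own library that main calls" (package and identifier names below are placeholders, not the layout #83 settles on) is to keep the scheduling work behind a small public surface so main stays a thin wrapper:

package tasks

import "time"

// Task describes one scheduled command from the config.
type Task struct {
	Frequency time.Duration
	Timeout   time.Duration
	Command   []string
}

// Run is the single entry point main would call; the per-task scheduling
// loops would live behind this package boundary. This stub just blocks
// until stop is closed.
func Run(tasks []Task, stop <-chan struct{}) {
	<-stop
}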

@justenwalker
Contributor Author

Just an update, @tgross: I started a WIP branch if you want to follow it. It's not ready for a PR yet, but perhaps we can discuss the implementation I'm going with.

Also, I haven't split out the module yet, but I'll work on that too.


tgross commented Mar 25, 2016

Cool, I've got https://github.com/tgross/containerbuddy/tree/gh27_metrics in the works myself, and I started with splitting the module out just as a "let's make sure that I can call it correctly" exercise.

I realize you're early in the process, but you may find what you've done with ScheduledTaskConfig tricky in terms of avoiding a circular import with config.go. I suspect you'll want to move that into the containerbuddy (and, later, config) package, but overall this is looking like a good approach.
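
To illustrate the circular-import concern (the type and fields below are placeholders for whatever is on the WIP branch): if the task code lives in its own package that imports containerbuddy, then config.go in the containerbuddy package can't also import the task package to reach ScheduledTaskConfig. Keeping the config struct next to config.go breaks the cycle:

package containerbuddy

// ScheduledTaskConfig is the parsed form of one "onScheduled" entry; it lives
// alongside config.go so the task-running package can import it without
// config.go having to import the task package back.
type ScheduledTaskConfig struct {
	Frequency string   `json:"frequency"`
	Command   []string `json:"command"`
	Timeout   string   `json:"timeout,omitempty"`
}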

tgross removed the proposal label on Mar 25, 2016

tgross commented Mar 25, 2016

More thoughts on module split-up here: #83 (comment)


tgross commented Mar 25, 2016

@justenwalker just because the timing is inconvenient on that first-stage refactor, I'm going to try to push it out early Monday morning so that we can base our new packages on it. This way it doesn't get delayed to the point where we both end up having to rework sections of metrics and tasks to suit. And, as noted in #83, it'll give us a chance to make sure it's the right abstraction before refactoring the rest of the modules.


tgross commented Mar 25, 2016

Actually, that turned out to be a much smaller intervention than I'd thought, so I've opened #118.


tgross commented Apr 22, 2016

Once we get a green build on master back from TravisCI, I'll cut release 2.1.0 with this in it.
