Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple health checks per service #591

Merged
merged 7 commits into from
Jan 21, 2015
Merged

Support multiple health checks per service #591

merged 7 commits into from
Jan 21, 2015

Conversation

ryanuber
Copy link
Member

@ryanuber ryanuber commented Jan 9, 2015

Allows multiple health checks to be registered for a single service. Preserves backwards compatibility of configs by using a new checks field. Registering multiple health checks can be done in a few different ways:

Embedded in service definitions in the checks key. If there is only a single check, the check ID will be service:<service name>. If there are multiple checks, the service IDs will be service:<service name>:<n>.

{
  "service": {
    ...
    "checks": [
      {
        "script": "/usr/bin/true",
        "interval": "10s"
      },
      ...
    ]
  }
}

In check configuration files using the service_id key. This allows additive configuration of service checks in a consul.d-style directory:

{
  "checks": [
    {
      "name": "redis check",
      "script": "/usr/bin/true",
      "interval": "10s",
      "service_id": "redis"
    },
    ...
  ]
}

From the HTTP API by specifying service_id in the check definition. This allows adding additional health checks at runtime, and utilizes the new service/check persistence logic to restore the checks on later agent starts:

PUT /v1/agent/check/register
{
  "name": "redis check",
  "script": "/usr/bin/true",
  "interval": "10s",
  "service_id": "redis"
}

Fixes #230.

@ryanuber
Copy link
Member Author

ryanuber commented Jan 9, 2015

One thing that has also changed is that check registrations will be rejected if the service ID they are associated with does not exist. Shouldn't be a big deal and prevents a weird case where the agent tries to anti-entropy the check in to consul and the fsm rejects it.

@armon
Copy link
Member

armon commented Jan 10, 2015

Awesome! This LGTM! I think this also fixes the issue of registering a check after the service is already created too?

@@ -137,6 +137,7 @@ type RegisterRequest struct {
Address string
Service *NodeService
Check *HealthCheck
Checks HealthChecks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the FSM / State Store handle this, so I think you need to make changes to the RegisterRequest path too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you are right. I added this in 97c5d53, is that what you meant?

@armon
Copy link
Member

armon commented Jan 12, 2015

LGTM! The only thing I wonder about is a newer client (0.5.0) with an older server (0.4.1) will have issues potentially. I think we document in the upgrade notes that servers must be upgraded first (typically the case with our updates anyways).

@ryanuber
Copy link
Member Author

@armon Hm, yeah I think that is the case. We could modify the code path to continue passing the single .Check field through to preserve compatibility, but there would still be this weird gap if a client restarted using the newer .Checks field instead. I think advising in the upgrade documentation is probably the most reasonable thing to do. Should we do this in the release notes when 0.5 is cut?

@ryanuber
Copy link
Member Author

I've made a few adjustments in here after some experimentation. What we have now should be compatible with older versions of Consul by using Check instead of Checks when there is only a single check for a service (which would be the case on older Consuls).

We also solve the issues @armon mentioned earlier by batching service sync operations with check syncs where appropriate. Basically, if a check is to be synced, and is associated with a service, we attach the service to the RegisterRequest if it is out of sync or doesn't exist. This requires scoping the RegisterRequest to a single service and its associated checks, which is handled in syncChecks().

One final change is that we batch all checks which are not associated with any service together to reduce the number of RPC requests required to perform anti-entropy.

for _, check := range checks {
l.checkStatus[check.CheckID] = syncStatus{inSync: true}
l.logger.Printf(
"[WARN] agent: Check '%s' registration blocked by ACLs",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should warn about the service registration (once) not the checks. Since Checks cannot be blocked currently.

@ryanuber ryanuber force-pushed the f-multicheck branch 2 times, most recently from 1470c86 to 1c64970 Compare January 15, 2015 06:29
@ryanuber
Copy link
Member Author

@armon I ended up reverting syncChecks back to its original syncCheck, and instead modifying syncService so that it searches for any out-of-sync checks and piggybacks them along with it. Reduced complexity quite a lot. I think this is ready for a once-over.

@armon
Copy link
Member

armon commented Jan 20, 2015

LGTM! Killing it. 👍

ryanuber added a commit that referenced this pull request Jan 21, 2015
Support multiple health checks per service
@ryanuber ryanuber merged commit d89af51 into master Jan 21, 2015
@ryanuber ryanuber deleted the f-multicheck branch January 21, 2015 06:15
@ryanuber
Copy link
Member Author

I couldn't get Travis to pass with this PR due to some Makefile changes recently, even after brute-forcing the rebuild button. They do pass locally. I'm going to address this outside of this PR.

@kdlan
Copy link

kdlan commented Feb 25, 2015

This feature seems not documented on https://www.consul.io/docs/agent/http/agent.html#agent_check_register

But request is worked

PUT /v1/agent/check/register
{
  "Name": "redis check",
  "Script": "/usr/bin/true",
  "Interval": "10s",
  "ServiceID": "redis"
}

duckhan pushed a commit to duckhan/consul that referenced this pull request Oct 24, 2021
Also bring Dockerfile from releases repo here so the Dockerfiles are
together.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow multiple checks attached to a service
3 participants