Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support consul service health checks #102

Closed
wants to merge 3 commits into from
Closed

support consul service health checks #102

wants to merge 3 commits into from

Conversation

johnrengelman
Copy link
Contributor

Loads Consul service data into /_consul/service namespace. (#100)

Format for keys is:

/_consul/service/<serviceName>[/<tagName>]/<nodeIndex>

Consul datacenters are not supported yet. tagName is optional. Both sets of keys are generated.
Consul data is loaded as a JSON marshalled String as the Value in the keystore.
Using data in template requires unmarshalling the data using the new json or jsonArray functions bounds to the template.

Additionally, added some convenience methods parent and sibling to the template for easier navigation around the KV store.

Note: this is my 1st attempt at Go, so feel free to criticize style/naming/etc. I tried to follow standard where it was apparent.

Supporting change at: kelseyhightower/memkv#1

@@ -0,0 +1 @@
confd

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk how the maintainers feel about this, but it isn't really necessary if you use go install instead of go build. That will build the binary and then place it in $GOPATH/bin, which should be in your $PATH.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this to a different commit.

@kelseyhightower
Copy link
Owner

Thanks for the PR, I'm be speaking at events related to OSCON, and I'll get a chance to review this later in the week.

@carlosdp
Copy link

👍 pretty damn good first attempt 😄

One comment though: You're going to want to run go fmt ./... in your repo directory so that the code is properly formatted. Other than that, it looks good to me from an outsider perspective! Very excited to have this support in confd.

@johnrengelman
Copy link
Contributor Author

thanks for the comments. I've pushed the format changes.

@mohitarora
Copy link

@kelseyhightower - Any update on this PR. It will be great if we have this functionality.

@jetadmin
Copy link

jetadmin commented Aug 4, 2014

@johnrengelman - This is a pretty important feature that we need. We don't want to do the work again. I guess author is not merging it anytime soon. Can you please publish the binary somewhere and share the link?

service_vars, _ := getServicesHealth(*c, services)
for k, v := range service_vars {
vars[k] = v
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there every be conflicts between service names and key names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially. I don't think there is a way to protect against that unless we move all the KV entries into it's own namespace internal to confd.

That's why I selected to prefix the service data with _consul.

@kelseyhightower
Copy link
Owner

@johnrengelman Ok, I did a review and everything looks good so far. Please do your best to address the concerns I've highlighted. Don't worry if you can't fix everything, I'm happy to help out as well.

For those that want to see this get merged please test this out. My plan is to merge this feature into 0.6.x today if the changes are made and I'll cut a release for testing.

@johnrengelman
Copy link
Contributor Author

Added function comments describing behavior and cleaned up a couple lines that were not doing anything.

@kelseyhightower
Copy link
Owner

@johnrengelman Ok, lets just move the out the json functions then I'll get this merged. Then I'll turn my attention to getting the json functions merged in, which will provide the complete solution.

@johnrengelman
Copy link
Contributor Author

Will do. Can you also take a look at kelseyhightower/memkv#1 and see what you think?
I needed this change because I ran into issues if there were no healthy servers in Consul then my templating would error because there were no keys. So instead, I changed it to return an empty list, which I think makes more sense.

@kelseyhightower
Copy link
Owner

I'm a bit torn about this feature. Mainly because we are trying to turn services into K/V pairs. I have a few questions that others can answer to help me understand.

Why not have this type of functionality live outside of confd?

It seems we are performing health checks on services, and if they pass using the service info as a value. I've seen something like this done for etcd as well. In that case the result of health checks updated K/V pairs in etcd. The nice thing about that approach it keeps confd simple and focused.

@kelseyhightower
Copy link
Owner

@johnrengelman Do you want to jump on IRC and discuss this a bit further?

@johnrengelman
Copy link
Contributor Author

Sure. Where?

@kelseyhightower
Copy link
Owner

@johnrengelman #confd freenode

@carlosdp
Copy link

carlosdp commented Aug 4, 2014

Consul Services are basically health-check aware / smart K/V pairs. Essentially, Consul is updating the K/V for us while etcd needs an outside component. Implementing something outside confd would just add extra complexity or end up re-writing confd, but just for Consul.

@mohitarora
Copy link

+1 on comment by @carlosdp

Even though services concept is only present in consul not etcd but confd should have it for sure.

@johnrengelman
Copy link
Contributor Author

After discussing with @kelseyhightower and @carlosdp the plan forward is to keep confd itself small and support 4 core backends:

  • etcd
  • consul K/V
  • env vars
  • custom (tcp/unix socket)

The custom backend will use either tcp or unix file sockets to connect confd to another process that can supply the backend implementation of your choosing. The Consul Service API backend will be developed here: https://github.com/johnrengelman/confd-consul-sd

While this does make the implementation slightly more complicated (i.e. it requires an intermediate process instead of talking directly to consul), it will allow confd to say thinner and more abstract to the various concepts from the various application a user may want to integrate with. Additionally, the custom backend scheme will allow new protocols or implementations with specific business logic to be develop and utilized without changes to confd core.

@kelseyhightower
Copy link
Owner

@johnrengelman I'm going to close this out and start thinking about that custom backend.

@johnrengelman
Copy link
Contributor Author

Sounds good.

@jhmartin
Copy link
Contributor

jhmartin commented Aug 6, 2014

Is there any provision here to look for services that both tags 'xyz' and 'qa'?

@carlosdp
Copy link

carlosdp commented Aug 6, 2014

@jhmartin no, but I think this should be taken one step at a time. Figure out multiple tags later.

@johnrengelman
Copy link
Contributor Author

I think datacenters are a slightly different beast than tags though. Reading the Consul API doc, when dealing with datacenters, you either query your local by default (by not specifying one) or you specify the datacenter to query information about...there isn't a mechanism for querying all datacenters at once.

Is there a use case for wanting confd to have access to all service information across all datacenters during a single template run?

I guess we can support that query in the template by doing something like:

{{range getvs "/_consul/services/*/httpd/*}}

@kelseyhightower does that work in the templates?

But perhaps we should structure it a bit differently:

{{range getvs "/_consul/dc/*/services/httpd/*"}}

Then if you wanted the results from all datacenters, we could drop the dc part of the query so we'd have the original format:

{{range getvs "/_consul/services/httpd/*"}}

Would that be more clear?

@carlosdp
Copy link

carlosdp commented Aug 6, 2014

@johnrengelman I don't think we want to have all DCs be the default, the node's DC should be the default. I also don't really know of a use case where we want something from all DCs in a template. It's scope-creep imo.

@johnrengelman
Copy link
Contributor Author

I agree. After typing all the up, I think it would be best to not support DCs at the moment and add support for them later by using a different key path (/_consul/dc/<dcName>/services/<serviceName>[/<tag>]/<index>), then a user could combine the data using nested ranges.

{{range getvs "/_consul/datacenters/*"}}
  {{range getvs fmt.Sprintf("/_consul/dc/%s/services/httpd/*", .Key)}}
  {{end}}
{{end}}

@armon
Copy link
Contributor

armon commented Aug 6, 2014

I agree. Most use cases are around the local DC.

@mohitarora
Copy link

Any update on this? What's the final key path?

@kelseyhightower
Copy link
Owner

@mohitarora We really need to sort out the key layout here, then we should be ready to go. I'm leaning on the consul users here to help work this out.

@johnrengelman
Copy link
Contributor Author

So this is from the Consul documentation:
When querying for services using the DNS interface, the following format is used:

<tag>.<service>.service.<datacenter>.<domain>

where <tag> and <datacenter> are optional, if <tag> is omitted, than all instances of the service are returned, if <datacenter> is omitted, then the local datacenter is queried.

For DNS results, only healthy nodes (nodes not failing the service healthcheck) are returned.

When querying for services using the REST API, the endpoint /v1/catalog/service/<service>
returns all instances of a service (without health information)

[
    {
        "Node": "foobar",
        "Address": "10.1.10.12",
        "ServiceID": "redis",
        "ServiceName": "redis",
        "ServiceTags": null,
        "ServicePort": 8000
    }
]

while the endpoint /v1/health/service/<service> returns the service instances with health information

[
    {
        "Node": {
            "Node": "foobar",
            "Address": "10.1.10.12"
        },
        "Service": {
            "ID": "redis",
            "Service": "redis",
            "Tags": null,
            "Port": 8000
        },
        "Checks": [
            {
                "Node": "foobar",
                "CheckID": "service:redis",
                "Name": "Service 'redis' check",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "redis",
                "ServiceName": "redis"
            },
            {
                "Node": "foobar",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "",
                "ServiceID": "",
                "ServiceName": ""
            }
        ]
    }
]

Both rest endpoints will filter for tags by appending ?tag= and by datacenter by appending ?dc= query parameters. If no ?tag= is provided, then no filtering occurs, if no ?dc= is provided, then the local datacenter is queried.

The /v1/health/service/<service> additionally takes a ?passing query parameter to return only healthy nodes.

Based on this, I think we should try to replicate the DNS interface behavior in returning only healthy services using the same DNS name but in reverse:

<domain>[/<datacenter>]/service/<service>[/<tag>]

Internally, we would use the consul client to query the health endpoint and only get healthy nodes (same as the DNS interface).

If the /<datacenter> portion is omitted, then the local datacenter is queried:

<domain>/service/<service>[/<tag>]

In Consul, it appears the is always consul, so we can just use that (or _consul to try to indicate that it is generated and not K/V data).

This basically gets us to exactly what this PR implements and is the logic I used when creating this PR.

So the basic endpoint would be:

/_consul/service/<service>[/<tag>]

and then we can add support later for datacenters using:

/_consul/<datacenter>/service/<service>[/tag]

If people feel we need to support getting all nodes instead of just healthy nodes, I think we should support that via a backend configuration option instead of through the key structure (default to only healthy nodes, but allow configuration to return all nodes). Then the user could do their own filtering because the health information would be in the JSON stored at the key.

@kelseyhightower
Copy link
Owner

@johnrengelman Now we need everyone to agree about support for datacenters and tags. This is where I'm going to draw the line, if we really plan on ever adding datacenter and tag support for services then we should just avoid adding support for services at all.

Based on the conversation it seems people want something more robust than confd, and something that supports generic data sources and not just K/V. Maybe confd's run has come to an end and it maybe better to fork confd and add all of these features. This was the number one reason for rejecting this PR in the first place. While the idea and code are solid, the cost of adding it opens the door to a whole new world of features confd was never designed for.

Is it time for a "better" tool? I think this PR speaks to how broken all of this service discover stuff is. We all want data in a different formats to handle creating load balancer configuration files. To be honest we are just making a mess.

For the record I really support people taking the ideas of confd and creating a bunch of tools that integrate deeply with other tools in a way confd doesn't.

@jhmartin
Copy link
Contributor

jhmartin commented Aug 8, 2014

I think the PR as it stands is good enough, which is all this needs to do. It doesn't need to solve every possible use-case, but having a tool that solves the 80% use case available will draw much more attention to the overall problem -- and hopefully spawn the alternate tools that can handle the 20% / 'advanced' cases. Let confd handle the basic service-discovery cases that the PR already handles, get that out, and see where it goes from there.

@kelseyhightower
Copy link
Owner

@jhmartin The problem is once I hit the merge button, it's going to be hard to take it back. This does not integrate well with prefixes and will cause confusion, which I don't have the bandwidth to fully support at the moment.

I know I will not be able to make everyone happy, but I can at-least be sure to only merge things that we can actually support.

@armon
Copy link
Contributor

armon commented Aug 8, 2014

I like the key structure proposed by @johnrengelman. I think even without datacenter support it solves the 80% case as mentioned.

@johnrengelman
Copy link
Contributor Author

@kelseyhightower I'll take a look today at better supporting the prefix key structure. I'm thinking I can just parse the prefixes and use them to determine which services to look up data on a such.

I'll update this PR with my changes.

@jhmartin
Copy link
Contributor

jhmartin commented Aug 8, 2014

@johnrengelman That'll improve performance significantly in large environment as well, very nice.

@carlosdp
Copy link

carlosdp commented Aug 8, 2014

@kelseyhightower I think this is ready to go as is. Datacenter support can be added later without issue.

@johnrengelman
Copy link
Contributor Author

Ok, I updated the PR to handle confd prefixes...Now it will only query for specific data in consul.
Examples:
Prefix = /_consul : Return all services and all tags using the above format
Prefix = /_consul/service : Same as /_consul currently
Prefix = /_consul/service/web : Get only data about the 'web' service, get data about all of the know s tags for 'web'
Prefix = /_consul/service/web/master : Get only data about the 'web' service with the 'master' tag.

Each prefix configure for confd is processed, so depending on the ordering there, you could resolve the same data a couple times...it will just overwrite itself in the internal keystore. This all happens before the the template is written out (same as the K/V).

@kelseyhightower
Copy link
Owner

@johnrengelman Ok, I've taken a moment to play with this feature, and while I understand why people want this feature it's does not fit the confd model.

Don't get me wrong, it totally works, it just takes confd out of scope. I'm going to update the confd docs to express more clearly the goals for the project. confd is not a good solution for service discovery, even though it will work in some limited cases where data used by service discovery tools is made available via K/V pairs, it will not work for the uses cases outlined in this PR.

The hard line going froward for confd will be only to fetch data from K/V stores, I'm also leaning towards limiting those K/V stores to consul, etcd, and env vars.

I know this decision will mean confd losing some users, but I think the project still adds value in the form of simplicity and focus.

To quickly recap, this PR is being rejected based on the following:

  • querying non k/v endpoints is out of scope of confd. That includes consul services, catalog, and health checks.

This was a touch decision and I did go back and forth on this one. But I feel it's the right choice for confd long term.

@kshep
Copy link

kshep commented Aug 9, 2014

@kelseyhightower I'm a little confused... consul isn't simply a k/v store, it's primarily a service registry. I think the core use case for consul+confd would be to register services with consul, then use confd to populate config files that point to those service. If I understand consul correctly, k/v storage is a secondary feature. If consul users need to use some other app to get service data into config files, I can't really see them also using confd.

@kelseyhightower
Copy link
Owner

@kshep confd was built around etcd, before consul existed. The idea was simple, provide a bridge for people to get started with etcd until more apps supported it natively. etcd does not support additional APIs for service discovery, it's all done via K/V pairs.

Enter consul.

The idea was simple, provide the same support for consul that we do for etcd. We did. Consul supports a lot more features than etcd, including DNS and a services API for service discovery. Consul can be considered more feature rich and may benefit from a tool that supports all these features. confd is not that tool.

@stevenjack
Copy link

Just a quick one if anyone stumbles across this PR, Consul has support for it's own templating:

https://github.com/hashicorp/consul-template#templating-language

tomdee pushed a commit to tomdee/confd that referenced this pull request Jun 12, 2018
…ed-libcalico-update

Automatic update of libcalico-go to 88291609bbefd0fe22982a7a78ec51d63583a651
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants