Allow re-join after leave #110

Closed
armon opened this issue May 2, 2014 · 18 comments
Labels
type/enhancement Proposed improvement or new feature

Comments

@armon
Member

armon commented May 2, 2014

We should have a flag to allow a node to re-join even though it left.

@bscott

bscott commented May 2, 2014

+1

blalor added a commit to blalor/docker-centos-repobuilder that referenced this issue May 3, 2014
Preserve the cluster server members on stop and use them to join on start.

Work-around for hashicorp/consul#110
@XavM

XavM commented May 3, 2014

+1

Maybe better: allow a new configuration key to specify an existing DNS server (and an optional DNS port if you want to use Consul as the DNS), fetch the SRV records corresponding to the consul service from that DNS, and join that Consul cluster.

This key could be set to an array like ["primary_dns_ip:port", "secondary_dns_ip:port"].
If not set, the consul agent would fall back to the DNS specified in resolv.conf (or the platform equivalent on Windows, etc.).

By default an agent won't join any nodes when it starts up.
Specifying "-join" or "start_join" with NO address would make the agent join on startup using the consul SRV records from DNS.
Specifying "-join" or "start_join" WITH an address would make the agent join on startup using that particular address (or set of addresses), bypassing the DNS fetch.

This would be backward compatible with consul v0.2.0- and would eliminate the need to maintain a hard-coded list of addresses to join on startup.

Credits: confd's "-srv-domain" -> https://github.com/kelseyhightower/confd/blob/master/docs/dns-srv-records.md
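
For illustration only, here is roughly how the proposal above might look in practice. Neither the "join_dns_servers" key nor a bare -join (with no address) exists in Consul; everything below is a hypothetical sketch of this suggestion, not current behaviour:

# Hypothetical config key listing existing DNS servers to query for SRV records.
cat > /etc/consul.d/join.json <<'EOF'
{
  "join_dns_servers": ["10.0.0.2:53", "10.0.0.3:8600"]
}
EOF

# Hypothetical: a bare -join would fetch the consul SRV records from the DNS
# servers above (falling back to resolv.conf) and join whatever they point to.
consul agent -config-dir /etc/consul.d -join

# Existing behaviour: -join with explicit addresses bypasses any DNS fetch.
consul agent -config-dir /etc/consul.d -join 10.0.1.10 -join 10.0.1.11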

@blalor
Contributor

blalor commented May 4, 2014

I like that very much. But I don't see how a default SRV domain would work for everyone, unless you expect that a search domain is set in resolv.conf (or its equivalent).

@armon
Member Author

armon commented May 4, 2014

I don't think we will be relying on DNS for this. The initial join will still be required, but we can re-join the cluster on further reboots using the old cached member information. If you want DNS for bootstrapping, that can be composed more elegantly by just using the consul join command with a DNS host name.

@XavM

XavM commented May 4, 2014

@blalor: Not sure I got your question right; what I meant:

On startup, the consul agent queries the DNS server (the one specified with the new suggested configuration key, or the one set in resolv.conf) for the SRV record "consul.service.consul"; the DNS then answers with the list of available servers (including IPs and associated ports) through which you can join an existing Consul cluster.

The DNS you query could be an existing Consul cluster, or your main DNS server (bind, dnsmasq, etc.) that you have previously set up to forward queries to Consul as appropriate (when the zone is "consul").

My point was: Consul is designed for service discovery, so why not use Consul to discover existing and healthy Consul cluster members?

PS: of course, the SRV lookup should use the "domain" configuration key when specified, and only fall back to the default "consul" domain when it is missing.
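
For reference, a lookup like the one described here can be tried against any running agent's DNS interface; 8600 is Consul's default DNS port and 127.0.0.1 is just the local agent:

# Ask a Consul agent for the SRV records of the built-in "consul" service;
# the answer lists the server nodes with their addresses and ports.
dig @127.0.0.1 -p 8600 consul.service.consul SRV +short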

@armon
Member Author

armon commented May 4, 2014

@XavM I agree, but I think there is no special integration required. If the node is using Consul for DNS already, then "consul join consul.service.consul" should just work!
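
Concretely, and assuming the node already resolves the .consul domain (for example via dnsmasq forwarding to the local agent's DNS port), that is simply:

# The agent resolves the service name and joins the addresses it gets back.
consul join consul.service.consul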

@XavM

XavM commented May 4, 2014

@armon: Thank you for your answer.

I think I must have missed your point (sorry about that).

Regarding the consul join command with a DNS host name: you have to know which host is up, is running Consul (and on which port), and is a member of the desired cluster, meaning you still have to maintain a hard-coded list of hosts:ports to join on startup.

A rejoin using old cached member information could fail due to a stale cache (depending on how long the node has been down, the topology could have changed).

Anyway, thank you for this promising solution; I am sure you will make the best choice.


Edit:
@armon: After seeing your previous response, I tried joining with "consul join consul.service.consul" and it works brilliantly! (The docs don't mention it.)

@blalor
Contributor

blalor commented May 4, 2014

[Trying to bring threads in two mediums together]

On May 4, 2014, at 4:57 PM, XavM notifications@github.com wrote:

Regarding the consul join command with a DNS host name: you have to know which host is up, is running Consul (and on which port), and is a member of the desired cluster, meaning you still have to maintain a hard-coded list of hosts:ports to join on startup.

This is it exactly. It's a chicken-and-egg problem. I think a -srv-domain option would be ideal. "-srv-domain" with no domain could default to the host's DNS domain name, or one could be provided. The actual record that would be queried would be _consul._tcp.$DOMAIN. Then Consul would just join using all of the entries in that record. Consul itself can't be queried because it hasn't joined the cluster yet! I've got a work-around for not having -srv-domain; I'm using Chef to run the following:

if ! curl -f -s 'http://localhost:8500/v1/status/leader' &> /dev/null ; then
    dig +search +noall +answer _consul._tcp SRV | awk '{print $NF}' | xargs --no-run-if-empty consul join
fi

/v1/status/leader returns 500 if there is no leader (and therefore the agent's not part of the cluster). If that query fails, then dig is used to search for the SRV record, and all entries in that record are passed to "consul join".

That just leaves the problem of how to update that record in DNS, which is part of the discussion of agent bootstrapping. If you’ve got a cluster with 3 servers, you don’t want all three updating Route53 (for example) when a server is added or removed; that should only happen from the leader. So the remaining issue is that I need a clean way to identify if a particular node is the leader, and ideally there would be a blocking query that would return the current set of servers.

A rejoin using old cached member information could fail due to a stale cache (depending on how long the node has been down, the topology could have changed).

With a definitive way to determine the servers for a datacenter (_consul._tcp.$DOMAIN SRV record), re-joining just becomes a performance optimization. I think the real solution is having a well-defined way to join a cluster on initial boot.

@armon
Member Author

armon commented May 4, 2014

@blalor This actually does not solve the chicken-and-egg issue. This solution assumes that the DNS servers exist at a well-known address to begin with (otherwise the SRV lookup would fail, having no DNS server to query). At some point there must be a well-known address, whether that is Consul or DNS.

My suggestion, since this is inescapable, is to simply run Consul on the DNS servers. You need a well-known address for at least one of them anyway, so this kills two birds with one stone. Doing this allows you to use DNS to join the cluster without making any changes to Consul.

If you have three well-known DNS addresses, then the DNS lookup for "consul.service.consul" will work unless all the DNS nodes are down, which is unlikely. Hope that helps.

@blalor
Contributor

blalor commented May 4, 2014

But I'm not running my own DNS server. I'm using Route53.

@armon
Member Author

armon commented May 4, 2014

@blalor Can you have a cron job on the consul servers that writes their IPs to Route53 every few minutes? That way you can just join a well-known address that is relatively up to date. It is only the initial join that is an issue, since we will be adding re-join support going forward.

@blalor
Contributor

blalor commented May 4, 2014

If I just run a cron job on each server to ensure that its own IP is in the SRV record, I'll have to manually update the SRV record when servers are decommissioned. I'll also have to ensure that server-a doesn't overwrite the changes just made by server-b, since there's no atomic add/update operation in the Route53 API for a single record.

Consul already knows which servers are in the cluster, but if I write a script to use the output of consul members -role=consul -status=alive, I'll want to do it only on the leader so that I'm only updating the record once. The problem with this is that there's no way to query Consul to determine whether a given node is the leader. I can use /v1/status/leader, but that returns IP:port; I'd have to either match that IP against all addresses bound to all interfaces on the host, or use the Consul config file to determine the IPs that the process is binding to.
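
As a rough sketch only, a script along these lines could publish the live server set as an SRV record. The hosted zone ID, record name, and TTL are placeholders, the 8301 Serf LAN port and resolvable node names are assumptions, and the member filters are simply the ones mentioned above:

#!/bin/sh
# Sketch: publish the currently-alive Consul servers as a Route53 SRV record.
# Zone ID and record name are placeholders; 8301 is the default Serf LAN port
# that "consul join" speaks to, and node names are assumed to resolve in DNS.
ZONE_ID="Z_EXAMPLE"
RECORD="_consul._tcp.example.com"

# Build one "priority weight port target" SRV value per live server node.
VALUES=$(consul members -role=consul -status=alive |
         awk 'NR > 1 {printf "{\"Value\": \"1 1 8301 %s\"},", $1}' |
         sed 's/,$//')

aws route53 change-resource-record-sets --hosted-zone-id "$ZONE_ID" \
  --change-batch "{
    \"Changes\": [{
      \"Action\": \"UPSERT\",
      \"ResourceRecordSet\": {
        \"Name\": \"$RECORD\", \"Type\": \"SRV\", \"TTL\": 60,
        \"ResourceRecords\": [$VALUES]
      }
    }]
  }"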

@armon
Member Author

armon commented May 4, 2014

@blalor Why do you need the leader node specifically? You only need to do a join with any node, so it doesn't need to be the leader. It is totally fine if server-a overwrites the record of server-b, since a join to any of them will succeed.

Trying to ensure only a single write from the cluster leader seems like an optimization that isn't necessary, given the nature of the gossip and the join. If you just have a cron job on the servers, allowing overwrites, then even when you decommission a server, one of the remaining live nodes will update the record by overwriting it.

@blalor
Contributor

blalor commented May 5, 2014

I understand it isn't strictly necessary, but it feels sloppy to make the same API call with the same data from 3-5 different hosts at once.

Having a list of all servers in the SRV record increases the chance of a successful join in the event of a network partition or temporary unavailability of one server.

@armon armon closed this as completed in de30905 May 21, 2014
@dennybaa

Hi,

it seems like rejoin after leave doesn't work. Namely, if I interrupt consul with INT it leaves the cluster. It also wipes out peers.json with a null value. Even if -rejoin is used, consul can't get back online since there's no info about the peers.
However, the docs say:
-rejoin When provided Consul will ignore a previous leave and attempt to rejoin the cluster when starting.

So is Consul supposed to rejoin after a leave, or am I misunderstanding something?
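
For context, the flag in question is passed when the agent starts; the data directory below is just an example path:

# Start the agent with -rejoin so that a previous graceful leave is ignored and
# the agent tries to reconnect using its cached peer information.
consul agent -server -data-dir /var/lib/consul -rejoin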

@armon
Member Author

armon commented Jun 18, 2014

@dennybaa That is embarrassing! Fixed in a05e1ae.

@dennybaa

Awesome. Cheers!

@blalor
Contributor

blalor commented Dec 16, 2014

Wow, I was just thinking about this problem.

Brian, could this same technique be used as part of the startup of an individual consul server?

I am currently using Hashicorp's consul-join.conf upstart script found here:
https://github.com/hashicorp/consul/blob/master/terraform/aws/scripts/upstart-join.conf

I was thinking of replacing the hardcoded host IP used to join the cluster with a query for the SRV record. Do you see any immediate problems with this?

I am using Ansible to start up the cluster. As part of the startup, I will check for the existence of the SRV record. Should it not exist, I will write the value of the first provisioned consul server.
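
Something along the lines of the earlier dig work-around would do it; the domain here is a placeholder, and this is not part of the linked upstart script:

# Resolve the SRV record and join every target it returns, instead of joining a
# hardcoded IP. Replace _consul._tcp.example.com with your own SRV record name.
dig +short _consul._tcp.example.com SRV | awk '{print $4}' | xargs --no-run-if-empty consul join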

