Consul join broken when containerized servers run on the same node #2877
Comments
@iamlittle If you're concerned about this happening, then you need to pass in a unique -node-id for each agent.
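For example (a minimal sketch, not from the original comment; the data dir and join address are illustrative), each container can generate its own UUID at startup and pass it via -node-id:

    # give every containerized agent its own node ID instead of the host's boot_id
    consul agent -server \
      -data-dir=/consul/data \
      -node-id="$(cat /proc/sys/kernel/random/uuid)" \
      -retry-join=X.X.X.X

Note that a fresh UUID on every start means the node ID changes across restarts, which is what the later comments about persisting it to the data dir address.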
@sean- Thanks! I was looking for something like that in the docs. Guess I missed it.
We will add a note to the docs and maybe even the error message to help people find the -node-id workaround.
@slackpad That sounds good, but I believe an option in consul to force the generation of the node id from another source would be very useful.
@mgiaccone
@iamlittle Thanks, I just solved it with the same command
@mgiaccone that's fair - depending on how many people bump into this we may need to add an option to generate a uuid internally - we've got the code in there, it's just a tradeoff on adding more config complexity.
Is it me or does the ...
This doesn't work for me. [update]
@slackpad These are long-running LXD containers which can stop and start over time. If I were to pass the node ID on the command line, it would change across restarts. For now, I have reverted to v0.7.5. Thanks and Regards,
... answering my own question ... 😄 As expected, the changing node ID was not a problem. For testing, if I restart the nodes (lxc containers) within a short span of time, I do see a message about it, but the node joins in successfully after the health checks, so for me things are working fine with v0.8.0 for now. Regards,
A "better" IMO way to set the node-id is with something like this:

    cat /proc/sys/kernel/random/uuid > "$CONSUL_DATA_DIR"/node-id

and then start your consul agent/server as per usual (pre 0.8) practice.
@mterron thanks! I will have to come up with a startup logic of "execute only once, if node-id file doesn't exist" in the init script and the systemctl equivalent, so that the node-id file gets generated only once! It's straightforward for the 14.04 upstart script, will check up on how to easily achieve this for the systemctl equivalent 😦 Thanks and Regards,
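A minimal sketch of that guard (CONSUL_DATA_DIR is assumed to point at the agent's data directory):

    # generate a node-id only once; subsequent restarts reuse the persisted file
    if [ ! -f "$CONSUL_DATA_DIR/node-id" ]; then
        cat /proc/sys/kernel/random/uuid > "$CONSUL_DATA_DIR/node-id"
    fi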
Changing this to enhancement - I think we should add a configuration to disable the host-based ID, which will make a random one if needed inside of Consul itself, and then save that to the data dir for persistence. This will make life easier for people trying to do this in Docker.
Thanks @slackpad
What's the scenario where you want consul to use the boot_id as node id? Generating a random node id by default seems more intuitive but I'm sure I'm missing something here. I mean, instead of having the -disable-host-node-id flag, I'd just add an -enable-host-node-id for the people that specifically need that behaviour.
@mterron Nomad uses the same host-based IDs so it's nice to have the two sync by default (you can see where a job is running and go to <node>.node.consul via Consul DNS kind of thing). It makes for some cool magic integration for applications like that, and in Consul you really don't want to be running two agents in the same cluster on the same host (unless you are testing or experimenting) so we made it opt-out for now.
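As an illustration of that lookup (not from the original comment; the node name and local agent address are made up), Consul's DNS interface listens on port 8600 by default:

    # resolve the address of the node named "web-1" via Consul DNS
    dig @127.0.0.1 -p 8600 web-1.node.consul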
I've never used Nomad so boot_id seemed like an arbitrary choice for a random identifier but it sort of makes sense from a Hashicorp ecosystem point of view. 2 lines on the documentation should be enough to explain the default behaviour so that users are not surprised. Something like: "By default Consul will use the machine boot_id (/proc/sys/kernel/random/boot_id) as the node-id. You can override this behaviour with the -disable-host-node-id flag or pass your own node-id using the -node-id flag." or something like that. Thanks for replying to a closed issue!
Hi @mterron we ended up adding something like that to the docs - https://www.consul.io/docs/agent/options.html#_node_id.
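A brief sketch of the opt-out discussed above (flag name per the linked docs; the data-dir path is illustrative):

    # skip the host-based (boot_id-derived) node ID; Consul generates a random one
    # and persists it in the data dir
    consul agent -server -data-dir=/consul/data -disable-host-node-id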
consul version for both Client and Server

Server:
Consul v0.8.0

consul info for both Client and Server

Server:
Operating system and Environment details
Ubuntu 16.04.1 LTS
Kubernetes 1.5
Description of the Issue (and unexpected/desired result)
Trying to join containerized Consul servers on the same machine will throw an error due to /proc/sys/kernel/random/boot_id being identical across all containers on a host.

Reproduction steps
Running Consul 0.8.0 in a 3-pod replica set on a single-node Kubernetes cluster (development machine).
Deployment definition
consul agent join X.X.X.X

throws the node ID error described above. I believe this to be a result of #2700. In any case, 0.8.0 could cause some serious problems in Kubernetes clusters if two Consul pods were to be scheduled on the same machine. This may not occur immediately.
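One way to confirm the boot_id collision described above (pod names are illustrative) is to compare the value across two pods on the same node:

    # both commands print the same UUID, because boot_id is per-host, not per-container
    kubectl exec consul-0 -- cat /proc/sys/kernel/random/boot_id
    kubectl exec consul-1 -- cat /proc/sys/kernel/random/boot_id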