Advertise Ports #550

Closed
hanshasselberg opened this issue Dec 19, 2014 · 20 comments

Comments

@hanshasselberg
Member

I want to suggest another configuration option and I am curious what you think. Similar to AdvertiseAddr, I want to introduce AdvertisePorts. It would be very similar to the existing Ports configuration option. AdvertisePorts would allow specifying the ports a Consul client is reachable on.

Use case: I want to run a Consul client in a container. I have multiple containers per host and multiple hosts, and a separate host runs the Consul server. My problem is that I can use AdvertiseAddr to communicate the container host IP, but I cannot configure different ports. Without that I can only run one Consul agent per host, because the ports are already claimed.
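As a rough sketch only (the advertise_ports name and shape here are assumptions, not an existing option), it could mirror the existing ports block in the agent config: the agent would bind the ports under ports but gossip the ones under advertise_ports:

{
  "ports": {
    "serf_lan": 8301
  },
  "advertise_ports": {
    "serf_lan": 18301
  }
}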

@hanshasselberg
Member Author

Apparently AdvertiseAddr accepts a port already: https://github.com/hashicorp/serf/blob/master/command/agent/command.go#L241.

@hanshasselberg
Member Author

When setting AdvertiseAddr to something like ip:port where port is forwarded on the container host to the container port 8301, this happens:

root@5c335b6c39f9:~# start-consul
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
    Join completed. Synced with 1 initial agents
==> Consul agent running!
         Node name: 'ip-10-77-16-60-1'
        Datacenter: 'dc1'
            Server: false (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, DNS: 8600, RPC: 8400)
      Cluster Addr: 172.17.0.37 (LAN: 8301, WAN: 8302)
    Gossip encrypt: true, RPC-TLS: true, TLS-Incoming: true

==> Log data will now stream in as it occurs:

    2014/12/19 11:56:39 [WARN] memberlist: Refuting a suspect message (from: ip-10-77-16-60-1)
    2014/12/19 11:59:43 [WARN] memberlist: Refuting a suspect message (from: ip-10-77-16-60-1)

and

$  consul monitor --log-level=trace | grep ip-10-77-16-60-1
2014/12/19 11:54:39 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2014/12/19 11:54:39 [INFO] serf: EventMemberJoin: ip-10-77-16-60-1 172.17.0.37
2014/12/19 11:54:39 [INFO] consul: member 'ip-10-77-16-60-1' joined, marking health alive
2014/12/19 11:54:39 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2014/12/19 11:54:40 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2014/12/19 11:55:07 [INFO] serf: EventMemberFailed: ip-10-77-16-60-1 172.17.0.37
2014/12/19 11:55:07 [INFO] consul: member 'ip-10-77-16-60-1' failed, marking health critical

Interestingly enough it can join the cluster.

@hanshasselberg hanshasselberg changed the title [RFC] Advertise Ports Advertise Ports Dec 22, 2014
@rotatingJazz

+1, this is a very critical feature for running multiple clients on the same host (with Docker), and also for when port 8301 cannot be opened in the host's firewall.

What happens now is that the client continually flaps.

"sssssooooo cclooooosseeeeee, aaaaaarg" 😄

@amiorin

amiorin commented Dec 31, 2014

Same problem here with marathon, docker and consul.
d2iq-archive/marathon#929
👍

@hanshasselberg
Member Author

@armon @ryanuber do you have any thoughts about that?

@ryanuber
Member

ryanuber commented Jan 5, 2015

It makes sense that the node is able to join, since the join operation is being performed from the new node and is an outbound UDP message. Sounds like the problem starts once the new node's address is gossiped, and other nodes in the cluster begin pinging it for health status.

What advertise option are you using in the config file? Consul in its current code base should actually error if you specify ip:port. The option should be advertise_addr, not advertise or AdvertiseAddr. I believe that Serf allows ip:port, so this might just be an oversight on our part. I'll take a look at this today.

@armon
Member

armon commented Jan 5, 2015

Sorry about the delay, I'm just getting back from the holidays. I agree this makes sense, but I think the only port you can really advertise is the RPC port. The other ports are not gossiped around and generally do not need to be accessible over the network. Since the RPC port is included in the gossip messages, this shouldn't be too hard. I'm not sure why the advertise address is causing the node to suspect itself. Must be a bug.

@hanshasselberg
Member Author

@ryanuber I have a config file for the container stuff:

root@5c335b6c39f9:~# cat /opt/consul.d/container.json
{
  "advertise": "10.77.16.60:8304",
  "node_name": "ip-10-77-16-60-1"
}

Consul is properly starting with it.

@armon I understand. I will look into how to advertise the RPC port. Maybe you have some pointers on what needs to be considered.

Is there anything I can provide to help identify the bug regarding the node suspecting itself?

@rotatingJazz

@armon Thank you so much for looking into this! 🍺 Happy new Year! 🍻

@armon
Member

armon commented Jan 5, 2015

@i0rek Take a look at setupSerf() in consul/server.go. Basically, the "port" value we gossip out is the RPC port, which is currently set to the real port. I guess there could be a new AdvertisePorts configuration that would allow a different port value to be gossiped out; it would be set there.
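For illustration only, a minimal sketch of that idea with made-up config types (the real fields live in the agent config and consul/server.go, where the chosen value would be gossiped out by Serf):

package main

import "fmt"

// Hypothetical config shapes, for illustration only.
type Ports struct{ RPC int }
type AdvertisePorts struct{ RPC int }
type Config struct {
	Ports          Ports
	AdvertisePorts AdvertisePorts
}

// advertisedRPCPort returns the port to gossip: the advertised one if set,
// otherwise the real bound RPC port.
func advertisedRPCPort(c Config) int {
	if c.AdvertisePorts.RPC != 0 {
		return c.AdvertisePorts.RPC
	}
	return c.Ports.RPC
}

func main() {
	c := Config{Ports: Ports{RPC: 8400}, AdvertisePorts: AdvertisePorts{RPC: 18400}}
	// In setupSerf() this is the value that would end up in the gossiped metadata.
	fmt.Println(advertisedRPCPort(c)) // 18400
}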

@rotatingJazz

An AdvertisePorts configuration option sounds like the cleanest solution. 👍

@armon
Member

armon commented Jan 5, 2015

@i0rek With respect to the self-suspect issue, a dump of "consul members" and "consul members -detailed" would be useful. Looking through the memberlist code, I'm not sure how this is possible.

@hanshasselberg
Member Author

@armon this is what I see when I start consul in the container:

hans@ip-10-74-2-162:~$ consul members| grep ip-10-77-16-60-1
ip-10-77-16-60-1       172.17.0.37:8301     alive   client  0.4.1  2
hans@ip-10-74-2-162:~$ consul members -detailed| grep ip-10-77-16-60-1
ip-10-77-16-60-1       172.17.0.37:8301     alive   vsn=2,vsn_min=1,vsn_max=2,build=0.4.1:3b3f0822,role=node,dc=dc1

and after a minute or so:

hans@ip-10-74-2-162:~$ consul members| grep ip-10-77-16-60-1
ip-10-77-16-60-1       172.17.0.37:8301     failed  client  0.4.1  2
hans@ip-10-74-2-162:~$ consul members -detailed| grep ip-10-77-16-60-1
ip-10-77-16-60-1       172.17.0.37:8301     failed  dc=dc1,vsn=2,vsn_min=1,vsn_max=2,build=0.4.1:3b3f0822,role=node

@hanshasselberg
Member Author

this is what happens on the cluster leader:

2015/01/05 22:48:25 [INFO] serf: EventMemberJoin: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:48:25 [INFO] consul: member 'ip-10-77-16-60-1' joined, marking health alive
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:26 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:27 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:27 [DEBUG] serf: messageJoinType: ip-10-77-16-60-1
2015/01/05 22:48:49 [INFO] serf: EventMemberFailed: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:48:49 [INFO] consul: member 'ip-10-77-16-60-1' failed, marking health critical
2015/01/05 22:48:58 [INFO] serf: EventMemberJoin: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:48:58 [INFO] consul: member 'ip-10-77-16-60-1' joined, marking health alive
2015/01/05 22:49:30 [INFO] memberlist: Suspect ip-10-77-16-60-1 has failed, no acks received
2015/01/05 22:49:45 [INFO] memberlist: Marking ip-10-77-16-60-1 as failed, suspect timeout reached
2015/01/05 22:49:45 [INFO] serf: EventMemberFailed: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:49:45 [INFO] consul: member 'ip-10-77-16-60-1' failed, marking health critical
2015/01/05 22:50:34 [INFO] serf: EventMemberJoin: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:50:34 [INFO] consul: member 'ip-10-77-16-60-1' joined, marking health alive
2015/01/05 22:50:54 [INFO] memberlist: Marking ip-10-77-16-60-1 as failed, suspect timeout reached
2015/01/05 22:50:54 [INFO] serf: EventMemberFailed: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:50:54 [INFO] consul: member 'ip-10-77-16-60-1' failed, marking health critical
2015/01/05 22:51:54 [INFO] serf: EventMemberJoin: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:51:54 [INFO] consul: member 'ip-10-77-16-60-1' joined, marking health alive
2015/01/05 22:52:11 [INFO] serf: EventMemberFailed: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:52:11 [INFO] consul: member 'ip-10-77-16-60-1' failed, marking health critical
2015/01/05 22:52:12 [INFO] serf: EventMemberJoin: ip-10-77-16-60-1 172.17.0.37
2015/01/05 22:52:12 [INFO] consul: member 'ip-10-77-16-60-1' joined, marking health alive

EDIT1: I updated the logs to make sure it is a single start.
EDIT2: more logs.

@hanshasselberg
Member Author

@ryanuber you were right and I was wrong: I had the configuration option name wrong. Now that I am using the right one, Consul doesn't start at all.

root@5c335b6c39f9:~# start-consul
==> Starting Consul agent...
==> Error starting agent: Failed to parse advertise address: 10.77.16.60:8304
root@5c335b6c39f9:~# cat /opt/consul.d/container.json
{
  "advertise_addr": "10.77.16.60:8304",
  "node_name": "ip-10-77-16-60-1"
}
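That error would be consistent with the value being parsed as a bare IP address. As a rough illustration (an assumption, not a reading of the Consul source), Go's net.ParseIP rejects an ip:port string:

package main

import (
	"fmt"
	"net"
)

func main() {
	// A bare IP parses fine, but an ip:port string does not.
	fmt.Println(net.ParseIP("10.77.16.60"))      // 10.77.16.60
	fmt.Println(net.ParseIP("10.77.16.60:8304")) // <nil>
}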

@hanshasselberg
Member Author

I would vote for accepting the port in advertise_addr, like Serf does.

@ryanuber
Member

ryanuber commented Jan 6, 2015

@i0rek I pushed #576 yesterday, which should help us avoid this confusion in the future. As for advertise_addr, I don't think we can just accept a port number on the end of the address in ip:port format, since we use it to set up two separate gossip agents and the RPC advertise address. I think, as @armon said, the way to go would be a new config option, maybe like:

"advertise_addrs": {
    "serf_lan": "1.2.3.4:8301",
    "serf_wan": "1.2.3.4:8302",
    "rpc": "1.2.3.4:8400"
}

@ryanuber
Member

ryanuber commented Jan 6, 2015

Alternatively, you might want to try tweaking the ports config, specifically serf_lan and/or serf_wan. That combined with the advertise_addr might get you past this.
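For example (ports and addresses are illustrative), a second agent on the same host could bind different Serf ports and still advertise the host address:

{
  "advertise_addr": "10.77.16.60",
  "node_name": "ip-10-77-16-60-1",
  "ports": {
    "serf_lan": 8304,
    "serf_wan": 8305
  }
}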

@rotatingJazz

@ryanuber I've tried that; it only works when all the nodes use the same ports, which defeats the purpose. We need to be able to advertise the port along with the address.

@hanshasselberg
Member Author

@ryanuber thanks for #576!
Re ports: that's not enough, as @rotatingJazz said.
Re advertise_addrs: I will give it a shot and let you know once I have something to look at.

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021
* Fix bug where upstream env vars not set

Fixes issue where we weren't setting the upstream environment variables
when the upstream annotation was set:
* <NAME>_CONNECT_SERVICE_HOST
* <NAME>_CONNECT_SERVICE_PORT

Since we're modifying a slice's values during iteration, we must access
each element by index so we update the slice in place rather than a copy.
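A generic sketch of that Go pattern (not the actual consul-k8s code; the names are made up):

package main

import "fmt"

type envVar struct{ Name, Value string }

func main() {
	vars := []envVar{{Name: "FOO_CONNECT_SERVICE_HOST"}}

	// Wrong: v is a copy of the element, so the slice is never updated.
	for _, v := range vars {
		v.Value = "10.0.0.1"
	}
	fmt.Println(vars[0].Value) // ""

	// Right: index into the slice to modify the element in place.
	for i := range vars {
		vars[i].Value = "10.0.0.1"
	}
	fmt.Println(vars[0].Value) // "10.0.0.1"
}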