Supporting setting `LocalAddr` in peer communication #17068

ahrtr · 2023-12-05T11:11:44Z

What would you like to be added?

Currently in peer communication, the local address is automatically chosen by the golang standard lib,

Lines 33 to 43 in d8f0771

    
           t := &http.Transport{ 
        
           	Proxy: http.ProxyFromEnvironment, 
        
           	DialContext: (&net.Dialer{ 
        
           		Timeout: dialtimeoutd, 
        
           		// value taken from http.DefaultTransport 
        
           		KeepAlive: 30 * time.Second, 
        
           	}).DialContext, 
        
           	// value taken from http.DefaultTransport 
        
           	TLSHandshakeTimeout: 10 * time.Second, 
        
           	TLSClientConfig:     cfg, 
        
           }

We should support setting a LocalAddr, which defaults to the advertise Peer address if enabled. It should be disabled by default to keep the existing behavior unless users intentionally enable it.

Why is this needed?

When accepting a connection from a peer, each etcd member checks the peer's remote address against the Subject Alternative Name (SAN) field in the peer's certificate, and reject the connection if they don't match. Please read https://etcd.io/docs/v3.5/op-guide/security/#notes-for-tls-authentication.

See also

etcd/client/pkg/transport/listener_tls.go

Lines 192 to 221 in d8f0771

    
           func checkCertSAN(ctx context.Context, cert *x509.Certificate, remoteAddr string) error { 
        
           	if len(cert.IPAddresses) == 0 && len(cert.DNSNames) == 0 { 
        
           		return nil 
        
           	} 
        
           	h, _, herr := net.SplitHostPort(remoteAddr) 
        
           	if herr != nil { 
        
           		return herr 
        
           	} 
        
           	if len(cert.IPAddresses) > 0 { 
        
           		cerr := cert.VerifyHostname(h) 
        
           		if cerr == nil { 
        
           			return nil 
        
           		} 
        
           		if len(cert.DNSNames) == 0 { 
        
           			return cerr 
        
           		} 
        
           	} 
        
           	if len(cert.DNSNames) > 0 { 
        
           		ok, err := isHostInDNS(ctx, h, cert.DNSNames) 
        
           		if ok { 
        
           			return nil 
        
           		} 
        
           		errStr := "" 
        
           		if err != nil { 
        
           			errStr = " (" + err.Error() + ")" 
        
           		} 
        
           		return fmt.Errorf("tls: %q does not match any of DNSNames %q"+errStr, h, cert.DNSNames) 
        
           	} 
        
           	return nil 
        
           }

Sometimes the peer may has multiple IP addresses, even including dynamic (i.e. floating) IP addresses. Probably only part of the IP addresses are valid against the SAN field in the certificate. But if an IP address which doesn't match the SAN in certificate is chosen by the golang standard lib, then it will fail to communicate with other etcd members.

highpon · 2023-12-05T23:45:56Z

@ahrtr
Would like to work on this!

ahrtr · 2023-12-06T07:54:56Z

Thanks @highpon .

ping @serathius @jmhbnz @wenjiaswe @fuweid for feedback.

ahrtr · 2023-12-18T11:09:57Z

cc @aojea for opinion as well

aojea · 2023-12-18T21:05:02Z

cc @aojea for opinion as well

I can see that there are multiple flags that declare addresses identifying the client, should you not use one of those since those are known to be reachable?

ahrtr · 2023-12-18T21:13:46Z

cc @aojea for opinion as well

I can see that there are multiple flags that declare addresses identifying the client, should you not use one of those since those are known to be reachable?

Thanks for the feedback. Yes, I was planning to reuse the --initial-advertise-peer-urls if users enable this feature (disabled by default to keep the existing behaviour). I am not sure whether it's a best practice to set LocalAddr, and whether there are any potential issue. Any comment?

$ etcd \
  --name m1 \
  --listen-client-urls http://host1:2379 \
  --advertise-client-urls http://host1:2379 \
  --listen-peer-urls http://host1:2380 \
  --initial-advertise-peer-urls http://host1:2380

Another problem is that it's hard to verify it in workflow CI. For example, how to create multiple IP addresses on the environment.

aojea · 2023-12-18T22:27:53Z

I am not sure whether it's a best practice to set LocalAddr, and whether there are any potential issue. Any comment?

In an etcd cluster there are some strong requirements or network connectivity between the peers, my experience is that every advanced networking knob like this can have also an undesired effect of providing a footgun for users that starts to implement complex scenarios just because they can or because they came with that idea and then they hit a problem and they start to ask for new knobs.
I always prefer to have the use cases and document the scenarios with how the new knobs are supposed to be used before adding them.

ahrtr · 2023-12-19T10:11:57Z

I always prefer to have the use cases and document the scenarios with how the new knobs are supposed to be used before adding them.

Yes, agreed on this. It's a real problem discovered in our testing when etcd and kube-vip coexist. Just as I mentioned in #17068 (comment), When an etcd member accepts a connection from a peer, it checks the peer's remote address against the Subject Alternative Name (SAN) field in the peer's certificate, and reject the connection if they don't match. It's exactly the reason why the LocalAddr matters here.

But when kube-vip coexist with etcd on the same host, kube-vip may dynamically add a floating IP to the host. Especially for the case of IPv6, it adds the IP to the beginning of the interface's IP list. Since etcd doesn't specify the LocalAddr when communicating with a peer, the floating IP is automatically chosen as the LocalAddr by the golang standard lib, eventually the connection is rejected by the peer.

I believe this is also a common issue. It applies the cases where the hosts on which the etcd runs have multiple IP addresses, especially dynamical floating IP.

Explicitly, it would be great if we can get feedback from network experts on:

In the beginning, we will disable the new feature by default, and etcd will not set a LocalAddr when communicating with a peer unless users intentionally enable it. But If there is no any concern or side effect, eventually we may want to enable it by default.
Another problem is how to automatically verify it in workflow CI using an e2e test case to prevent any regression. I don't see an easy way to mimic the case of the host having multiple IP addresses in workflow CI environment. Probably we can try unit test?

aojea · 2023-12-19T14:41:49Z

that makes sense, then add just an option to define the localAddress for connecting to the apiserver, if is empty just don't set the localAddr field, this will make the change completely backwards compatible and you don't have to worry about the future.

Another problem is how to automatically verify it in workflow CI using an e2e test case to prevent any regressio

some ideas, if you are in the same node you can play with the localhost address 127.0.0.1 and the one in the node

create a TCP listener that receives the connection and assert on the address you want
test the negative case, verify that fails to connect if you set a wrong address

ahrtr · 2023-12-19T15:37:20Z

you can play with the localhost address 127.0.0.1 and the one in the node

Thanks for the suggestion.

ahrtr · 2023-12-19T16:46:14Z

@highpon Please follow comments below to update #17085. Thanks

Let's add an option --set-member-localaddr (bool), which defaults to false. If users enable it, then let's use the host in --initial-advertise-peer-urls as the LocalAddr when etcd communicates with a peer.
Follow comment above Supporting setting LocalAddr in peer communication #17068 (comment) to add an unit test. **play with the localhost address 127.0.0.1 and the one in the node**. We need to test both positive and negative cases.
- Please also add an e2e test. We just need to verify positive case in e2e case to ensure it doesn't break anything. Specifically, we just need to set --set-member-localaddr to true in both 1-member and 3-members clusters, and ensure nothing is broken.

highpon · 2023-12-19T20:29:00Z

@ahrtr
Sure! Thanks for your help!
I would like to update #17085 for your comment.

flawedmatrix · 2024-03-09T02:51:08Z

I've tried to pick up the ball to implement these changes here: https://github.com/flawedmatrix/etcd/tree/fix/17068

One question I had was whether we only wanted to set LocalAddr to an IP address, or should we try to resolve the DNS name to an address from --initial-advertise-peer-urls?

The major problem I ran into was that I've been unable to get the e2e test to fail Mutual TLS when playing around with using the host IP address or 127.0.0.1. It seems the etcd nodes would only start up properly if all the IP addresses provided in --listen-peer-urls, --initial-advertise-peer-urls, and --initial-cluster matched. I've been unable to get the Golang library to infer LocalAddr to be 127.0.0.1 if I used the host IP for the three flags I mentioned.

~~@ahrtr Any thoughts as to how one could properly test this feature in an e2e test?~~

After revisiting this issue, I've resolved the issue of how to e2e test this feature.

ahrtr · 2024-03-29T10:53:11Z

One question I had was whether we only wanted to set LocalAddr to an IP address, or should we try to resolve the DNS name to an address from --initial-advertise-peer-urls?

The answer should be the latter based on #17068 (comment)

ahrtr added the type/feature label Dec 5, 2023

jmhbnz assigned highpon Dec 5, 2023

jmhbnz added the area/tls label Dec 5, 2023

ahrtr added area/tls and removed area/tls labels Dec 6, 2023

highpon mentioned this issue Dec 9, 2023

Supporting setting LocalAddr in peer communication #17085

Closed

flawedmatrix mentioned this issue Mar 27, 2024

Support setting LocalAddr in peer communication - with e2e tests #17661

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Supporting setting `LocalAddr` in peer communication #17068

Supporting setting `LocalAddr` in peer communication #17068

ahrtr commented Dec 5, 2023 •

edited

Loading

highpon commented Dec 5, 2023

ahrtr commented Dec 6, 2023

ahrtr commented Dec 18, 2023

aojea commented Dec 18, 2023

ahrtr commented Dec 18, 2023 •

edited

Loading

aojea commented Dec 18, 2023

ahrtr commented Dec 19, 2023

aojea commented Dec 19, 2023

ahrtr commented Dec 19, 2023

ahrtr commented Dec 19, 2023 •

edited

Loading

highpon commented Dec 19, 2023

flawedmatrix commented Mar 9, 2024 •

edited

Loading

ahrtr commented Mar 29, 2024

Supporting setting LocalAddr in peer communication #17068

Supporting setting LocalAddr in peer communication #17068

Comments

ahrtr commented Dec 5, 2023 • edited Loading

What would you like to be added?

Why is this needed?

highpon commented Dec 5, 2023

ahrtr commented Dec 6, 2023

ahrtr commented Dec 18, 2023

aojea commented Dec 18, 2023

ahrtr commented Dec 18, 2023 • edited Loading

aojea commented Dec 18, 2023

ahrtr commented Dec 19, 2023

aojea commented Dec 19, 2023

ahrtr commented Dec 19, 2023

ahrtr commented Dec 19, 2023 • edited Loading

highpon commented Dec 19, 2023

flawedmatrix commented Mar 9, 2024 • edited Loading

ahrtr commented Mar 29, 2024

Supporting setting `LocalAddr` in peer communication #17068

Supporting setting `LocalAddr` in peer communication #17068

ahrtr commented Dec 5, 2023 •

edited

Loading

ahrtr commented Dec 18, 2023 •

edited

Loading

ahrtr commented Dec 19, 2023 •

edited

Loading

flawedmatrix commented Mar 9, 2024 •

edited

Loading