Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gRPC] clientconn: set authority with the latest dial target #10489

Closed
wants to merge 1 commit into from

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Feb 21, 2019

When user dials with "grpc.WithDialer", "grpc.DialContext" "cc.parsedTarget"
update only happpens once. This is problematic, because when TLS is enabled,
retries happen through "grpc.WithDialer" with static "cc.parsedTarget" from
the initial dial call.

If the server authenticates by IP addresses, we want to set a new endpoint as
a new authority. Otherwise

transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.168.154.254, not 192.168.208.149

Fix #9949.
Fix kubernetes/kubernetes#72102.

When user dials with "grpc.WithDialer", "grpc.DialContext" "cc.parsedTarget"
update only happpens once. This is problematic, because when TLS is enabled,
retries happen through "grpc.WithDialer" with static "cc.parsedTarget" from
the initial dial call.
If the server authenticates by IP addresses, we want to set a new endpoint as
a new authority. Otherwise
"transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.168.154.254, not 192.168.208.149"

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
@gyuho
Copy link
Contributor Author

gyuho commented Feb 21, 2019

/cc @jpbetz @xiang90 @hexfusion

I am also creating an issue upstream.

With this workaround, balancer failover works and #9949 goes away.

@gyuho gyuho added the WIP label Feb 21, 2019
@hexfusion
Copy link
Contributor

@gyuho great sleuthing, thank you!

@jingyih
Copy link
Contributor

jingyih commented Feb 23, 2019

Thanks for debugging and fixing this.

@@ -950,6 +950,17 @@ func (ac *addrConn) createTransport(connectRetryNum, ridx int, backoffDeadline,
Metadata: addr.Metadata,
Authority: ac.cc.authority,
}
if target.Addr != target.Authority {
target.Authority = target.Addr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to follow the same heuristic that was used to determine the Authority when initially creating the clientconn rather than just assuming the IP address will do?

i.e. this block:

creds := cc.dopts.copts.TransportCredentials
if creds != nil && creds.Info().ServerName != "" {
cc.authority = creds.Info().ServerName
} else if cc.dopts.insecure && cc.dopts.authority != "" {
cc.authority = cc.dopts.authority
} else {
// Use endpoint from "scheme://authority/endpoint" as the default
// authority for ClientConn.
cc.authority = cc.parsedTarget.Endpoint
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we can use cc.parsedTarget.Endpoint since it was set static initially in DialContext. Pretty sure gRPC team can help us find a better way.

I've created an upstream PR grpc/grpc-go#2650, and am still waiting for their feedback.

@jingyih
Copy link
Contributor

jingyih commented Feb 25, 2019

/cc @wenjiaswe

@xiang90
Copy link
Contributor

xiang90 commented Feb 28, 2019

@jingyih @wenjiaswe @jpbetz can you poke the grpc team a little bit about this one?

@hexfusion
Copy link
Contributor

@gyuho this is on my radar would it help if I took over and worked on fix in grpc-go? Not sure where you are with things now?

@vaishali2693
Copy link

@gyuho It seems that balancer supports multiple subconnections with different creds. Why can't we dial against each subconnection whenever a subconnection is chosen by balancer? Also at what point does the balancer gets all the IPs of etcd cluster if we are dialing using one IP?

@gyuho
Copy link
Contributor Author

gyuho commented Apr 17, 2019

@vaishali2693 Our recent refactoring on client balancer was just migrating over to the new gRPC balancer interface. In that process, we made as minimum changes as possible to keep the old behavior. We will revisit this issue again in May, and look into your suggestions. /cc @jpbetz

@vaishali2693
Copy link

Thanks @gyuho . Also if you could help me understand the following so that we can find a workaround for ourselves -
When we connect to one endpoint which is an IP address -
conn, err := client.dial(dialEndpoint, grpc.WithBalancerName(roundRobinBalancerName))
how are the other addresses made available to the balancer? I am not sure at what point “HandleResolvedAddrs” is called and how does it get the IPs of the other nodes in etcd cluster.
Also if some new node is added to cluster or IP changes, how is that refreshed?
Can you please help me understand this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
6 participants