Support Proxy Protocol #1065
This would likely have to live as a module that plugs into go-libp2p-swarm and translates addresses. A general-purpose "connection transformer" (takes a connection, returns a wrapped connection) may do the trick. However, this seems pretty terrible. I'm surprised that, e.g., AWS doesn't just use a NAT. Do you know the motivation for this approach?
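(For illustration, a minimal sketch of the "connection transformer" idea in Go; `ConnTransformer` and `transformingListener` are hypothetical names, not go-libp2p-swarm API.)

```go
// Package swarmproxy sketches the "connection transformer" idea:
// a hook applied to each accepted connection before the swarm
// sees it. All names here are illustrative.
package swarmproxy

import "net"

// ConnTransformer takes an accepted connection and returns a
// (possibly wrapped) connection, e.g. one whose RemoteAddr
// reports the real client rather than the proxy.
type ConnTransformer func(net.Conn) (net.Conn, error)

// transformingListener applies a ConnTransformer to every
// connection it accepts.
type transformingListener struct {
	net.Listener
	transform ConnTransformer
}

func (l *transformingListener) Accept() (net.Conn, error) {
	c, err := l.Listener.Accept()
	if err != nil {
		return nil, err
	}
	return l.transform(c)
}
```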
It comes out of http://www.haproxy.org/, where you have an edge that terminates TLS and passes the unwrapped connection back to the backend application.
I see... this seems like a bad fit for libp2p:
1. Our crypto transports have libp2p-specific requirements/features so we can't really offload this work without a custom proxy.
2. We have UDP-based transports. Does this work?
3. Load balancing doesn't really work because every libp2p node has a different peer ID.
Is there no way to just use a NAT? And/or is there anything HAProxy provides us that's actually useful?
Note: Going down the "connection transform function" route is pretty simple and non-invasive so I'm not really against providing this feature, but I want to make sure it's really worth solving first.
An example use case is when a peer needs to be discoverable via DNS. Having DNS point at a load balancer or reverse proxy allows for instantaneous adaptive routing in the event of node maintenance, disaster recovery, or machine upgrades. Having DNS point directly at a node is several orders of magnitude slower to update in these scenarios. Some libp2p node roles this applies to include bootstrap nodes, hosted APIs, etc.
Also note that in some cases hot-standby nodes behind a load balancer with a fixed-weighting LB algorithm would be desirable.
Ok, so it looks like I misunderstood how this protocol worked. I assumed every proxied connection would get a new source port and you'd ask the proxy "what's the real IP/port behind this source port". Apparently, the proxy will just prepend a header to the connection.

This is doable in libp2p, just more invasive. We'd need to pass a "proxy" (that supports both UDP and TCP) to all transports on construction, and these transports would need to invoke this proxy on new inbound connections. We can do it, but there will need to be significant motivation. For now, I'd consider something like https://blog.cloudflare.com/mmproxy-creative-way-of-preserving-client-ips-in-spectrum/ or, even better, a packet-based routing system.

In terms of motivation, it sounds like HA proxies were invented to:
From this I conclude:
I'm frustrated because it sounds like someone came up with a solution targeting HTTP proxying/balancing, then this solution became the "standard" (ish, because there are several "standards") for general-purpose load balancing even though it operates on the wrong OSI layer. I'd like to avoid infecting libp2p with this if at all possible.
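(For reference: the v1 header really is just one text line prepended to the TCP stream, e.g. `PROXY TCP4 203.0.113.7 10.0.0.5 54321 4001\r\n`. A minimal sketch of consuming it from a `net.Conn` in Go — `readProxyV1` is an illustrative helper, not an existing libp2p function.)

```go
// Package proxyutil is an illustrative sketch of reading a
// Proxy Protocol v1 header off an accepted TCP connection.
package proxyutil

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// readProxyV1 consumes the single "PROXY ..." line that a
// v1-speaking load balancer prepends to the stream and returns
// the real client address plus a reader holding any buffered
// application bytes.
func readProxyV1(c net.Conn) (net.Addr, *bufio.Reader, error) {
	br := bufio.NewReader(c)
	line, err := br.ReadString('\n')
	if err != nil {
		return nil, nil, err
	}
	// Expected form: PROXY TCP4 <srcIP> <dstIP> <srcPort> <dstPort>
	parts := strings.Fields(strings.TrimSpace(line))
	if len(parts) != 6 || parts[0] != "PROXY" {
		return nil, nil, fmt.Errorf("not a PROXY v1 header: %q", line)
	}
	addr, err := net.ResolveTCPAddr("tcp", net.JoinHostPort(parts[2], parts[4]))
	if err != nil {
		return nil, nil, err
	}
	// Callers must keep reading from br, not c: br may have
	// buffered application data past the header.
	return addr, br, nil
}
```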
Proxy Protocol is not tied to HAProxy in any way, nor is it exclusively HTTP. For better or worse it has become a standard, gaining adoption in all reverse proxies... even databases and other applications now support it. It can absolutely be used with plain TCP. The particular use case in question is with AWS Elastic Load Balancers and Kubernetes, but it would affect any libp2p node on a conventional EC2 machine behind an ELB as well. To pass upstream client IPs, enabling Proxy Protocol (v1 in ELB Classic, v2 in NLB) is required.

@Stebalien I'm going to push back on your last comment; I don't think it's accurate. Rejecting this feature pushes the complexities of using very standard ops tools back onto operators. As mentioned in the original issue, configuring a transparent proxy is a possible workaround, but not a great one. Running libp2p nodes on a cloud-native platform such as Kubernetes is something that would greatly benefit this project, IMHO.
If we decide "it's what people expect so we'll support it", I'll live with that. There's no point in fighting that particular fight. But I'd still like to know why people seem to like this protocol so much. It seems to fall directly into the "worse is better" category. That way we can at least warn users.
I think the main reason it's being done at the application layer is that there are plenty of cloud cases where the people dealing with the application/fronting don't own or aren't managing the routing layer (or don't have root privileges on the edge/load balancers).
@willscott hits it on the head, and it's why I say this is a case for Cloud Native support. Using AWS EKS Kubernetes as an example, a Kubernetes user (with access to the Kubernetes control plane only) can enable Proxy Protocol in a …
It's my understanding that Proxy Protocol v2 has UDP support.
Also, AWS Network Load Balancers support UDP and Proxy Protocol v2.
Strongly agree with @stongo. In our case, we'd love to build an easy-to-scale Substrate node cluster in a GKE stateful set. The only problem we are facing is the p2p port. In the past we allocated a VM for each node, and everything went pretty well. However, after we decided to switch to a Kubernetes stack, we suddenly found it problematic to expose the p2p port of the nodes in a cluster, where there's no good way to allocate public IP addresses to individual nodes. The de facto solution is to set up an LB for the service. However, this violates the model of libp2p. A very typical use case is this: a peer wants to connect to a target peer, so it looks up the endpoint by its peer ID from the DHT and then establishes a TCP/UDP connection to that endpoint. If we put an LB in front of a cluster, the LB has no choice but to randomly select a node in the cluster to connect to, and it has only a 1/n chance of reaching the correct peer (see the toy simulation below).
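(To make the 1/n point concrete, a toy Go simulation — not libp2p code; the peer IDs are made up. The dialer wants one specific peer, the LB picks any backend, and the secured handshake only matches when they coincide.)

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	// Hypothetical peer IDs of four identical backends behind one LB.
	backends := []string{"QmPeerA", "QmPeerB", "QmPeerC", "QmPeerD"}
	want := "QmPeerC" // the peer ID the dialer resolved from the DHT

	trials, hits := 10000, 0
	for i := 0; i < trials; i++ {
		// The LB knows nothing about peer IDs, so it picks at random.
		got := backends[rand.Intn(len(backends))]
		if got == want {
			hits++ // only here would the security handshake succeed
		}
	}
	fmt.Printf("handshake succeeds %d/%d times (expected ~1/%d)\n",
		hits, trials, len(backends))}
```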
Did anyone find a solution to this issue?
@jacobhjkim https://github.com/mcamou/go-libp2p-kitsune is the closest thing I can think of.
Feature
Include Proxy Protocol support to enable the use of load balancers and reverse proxies.
Background
The current behaviour of a libp2p node (e.g. lotus) behind a load balancer or reverse proxy is to see the server's private IP or loopback interface as the source IP address of upstream peer connections, rather than the peers' real addresses.
This is a well-known TCP load-balancing issue. The conventional but complex workaround is a transparent proxy, which requires kernel and iptables configuration and creates a high barrier to running libp2p nodes successfully in this setup.
For Proxy Protocol to be fully supported, the downstream endpoint (the libp2p node) should support Proxy Protocol versions 1 and 2 to establish the client's IP address (the upstream libp2p peer); a sketch of one possible approach follows.
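(A minimal sketch of what accepting Proxy Protocol could look like on the listener side, using the community library github.com/pires/go-proxyproto; this is an assumption about one possible integration point, not an existing go-libp2p feature, and the port is arbitrary.)

```go
package main

import (
	"log"
	"net"

	"github.com/pires/go-proxyproto"
)

func main() {
	// Plain TCP listener sitting behind the load balancer.
	inner, err := net.Listen("tcp", ":4001")
	if err != nil {
		log.Fatal(err)
	}

	// Wrap it: Accept() strips the PROXY v1/v2 header, and the
	// returned connection's RemoteAddr() reports the real
	// upstream client instead of the load balancer.
	ln := &proxyproto.Listener{Listener: inner}
	defer ln.Close()

	conn, err := ln.Accept()
	if err != nil {
		log.Fatal(err)
	}
	log.Println("real client address:", conn.RemoteAddr())
	conn.Close()
}
```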
Current Behaviour
Behind an AWS Elastic Load Balancer:
Behind a reverse proxy, e.g. NGINX or HAProxy:
Desired Behaviour
Current Pitfalls without Proxy Protocol support