SSL proxying accidentally stops working #2354

Closed
r7vme opened this issue Apr 16, 2018 · 5 comments

r7vme commented Apr 16, 2018

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

NGINX Ingress controller version: 0.12.0

Kubernetes version (use kubectl version): 1.9.5

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): CoreOS 1632.3.0
  • Kernel (e.g. uname -a): 4.14.19-coreos
  • Install tools: aws-operator
  • Others:

What happened:
On two clusters with different nginx-ingress-controller versions (0.12.0 and 0.9.0-beta-11), SSL proxying suddenly stopped working, so all connections to port 443 just got stuck. Inside the pod I see that the TCP receive queue is at 32 KB and there are about 26K CLOSE_WAIT TCP connections to port 443.

Restart of the pod helps.

What you expected to happen:
SSL proxying works reliably and can handle intermittent issues (if any). At the very least, the whole ingress controller should fail if the SSL proxy is broken.

How to reproduce it (as minimally and precisely as possible):
To be honest, I have no reliable recipe for reproducing it. In our case this happened on two customer clusters at almost the same time, so I suspect it can be related to some reconfiguration of ingress resources or services (endpoints). The clusters are in two AWS regions, so I'm pretty sure this is not an environment-specific issue (e.g. network loss).

Anything else we need to know:
I see many connections like the ones below (CLOSE_WAIT means the ingress controller received a FIN from the client but does not close the socket for some reason):

tcp6     247      0 172.18.190.193:443      172.18.190.128:8058     CLOSE_WAIT  -                   
tcp6     245      0 172.18.190.193:443      172.18.190.128:28589    CLOSE_WAIT  -                   
tcp6     246      0 172.18.190.193:443      172.18.188.128:30577    CLOSE_WAIT  -                   
tcp6     199      0 172.18.190.193:443      172.18.172.0:40706      CLOSE_WAIT  -                   
tcp6     199      0 172.18.190.193:443      172.18.133.0:6497       CLOSE_WAIT  -                   
tcp6     246      0 172.18.190.193:443      172.18.188.128:14640    CLOSE_WAIT  -                   
tcp6     199      0 172.18.190.193:443      172.18.148.128:46571    CLOSE_WAIT  -

32 KB in the receive queue:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:442             0.0.0.0:*               LISTEN      32/nginx: master pr 
tcp        0      0 0.0.0.0:18080           0.0.0.0:*               LISTEN      32/nginx: master pr 
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      32/nginx: master pr 
tcp6       0      0 :::442                  :::*                    LISTEN      32/nginx: master pr 
tcp6   32769      0 :::443                  :::*                    LISTEN      7/nginx-ingress-con 
tcp6       0      0 :::18080                :::*                    LISTEN      32/nginx: master pr 
tcp6       0      0 :::10254                :::*                    LISTEN      7/nginx-ingress-con 
tcp6       0      0 :::80                   :::*                    LISTEN      32/nginx: master pr 

SSL proxy setup happens here and finally it calls the Handle function. I suspect something can go wrong in these functions.
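
Roughly (and this is only a hypothetical sketch to illustrate the responsibilities involved, not the actual controller code), a passthrough handler has to dial the backend, copy bytes in both directions, and close both sockets when either side finishes. If the copy goroutines stall, or the connections are never closed after the client's FIN, sockets pile up in CLOSE_WAIT exactly like in the netstat output above; a blocking accept loop would likewise let the listen socket's queue grow. The port and backend address below are placeholders.

package main

import (
	"io"
	"log"
	"net"
	"time"
)

// handle pipes a client connection to a backend. Sketch only: the real
// controller code differs; this just shows where a missing Close or a
// stuck copy would leave the client socket in CLOSE_WAIT.
func handle(client net.Conn, backendAddr string) {
	defer client.Close()

	backend, err := net.DialTimeout("tcp", backendAddr, 5*time.Second)
	if err != nil {
		return // even a failed dial must still close the client socket
	}
	defer backend.Close()

	done := make(chan struct{}, 2)
	pipe := func(dst, src net.Conn) {
		// io.Copy returns when src reaches EOF (e.g. the client sent FIN)
		// or when either side errors out.
		io.Copy(dst, src)
		// Closing both ends unblocks the opposite copy; forgetting this is
		// the kind of bug that leaves sockets stuck in CLOSE_WAIT.
		dst.Close()
		src.Close()
		done <- struct{}{}
	}
	go pipe(backend, client)
	go pipe(client, backend)
	<-done
	<-done
}

func main() {
	ln, err := net.Listen("tcp", ":8443") // placeholder port
	if err != nil {
		log.Fatal(err)
	}
	for {
		c, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		// Handling the connection synchronously here would stall Accept and
		// let the listen socket's backlog grow, similar to the Recv-Q seen
		// on :443 above.
		go handle(c, "127.0.0.1:9443") // placeholder backend
	}
}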


aledbf commented Apr 16, 2018

SSL proxy setup happens here and finally it calls the Handle function. I suspect something can go wrong in these functions.

Do you have the flag --enable-ssl-passthrough present in the deployment?


r7vme commented Apr 16, 2018

Yes,

      - args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
        - --configmap=$(POD_NAMESPACE)/ingress-nginx
        - --enable-ssl-passthrough
        - --annotations-prefix=nginx.ingress.kubernetes.io


r7vme commented Apr 16, 2018

@aledbf Also a question: do we need --enable-ssl-passthrough if we already have the PROXY protocol enabled on the AWS ELB?


r7vme commented Apr 19, 2018

Some clarifications; correct me if I'm wrong.

First, I was able to do SSL passthrough without the --enable-ssl-passthrough flag, because we already have the PROXY protocol configured on the AWS ELB. Here is the ingress manifest. Behind it I run nginx with SSL from this example.

Once more, correct me if I'm wrong.

--enable-ssl-passthrough only sets up an SSL proxy (Go-based) that extracts the client/server IPs and ports from the TCP packet and adds a PROXY protocol header with this info.

As an outcome: if you're running the ingress controller behind an AWS ELB or HAProxy, you should consider enabling the PROXY protocol on the load balancer instead of passing the --enable-ssl-passthrough flag to nginx-ingress-controller.

Enabling --enable-ssl-passthrough in the ingress controller only makes sense if you're not running any load balancers and clients access the ingress controller directly (e.g. via NodePort).
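
For reference, a PROXY protocol v1 header is just one plain-text line the proxy writes to the backend before any payload bytes. A minimal Go sketch of building it from an accepted connection (illustrative only; not the controller's actual code, and the listen port is a placeholder):

package main

import (
	"fmt"
	"log"
	"net"
)

// proxyV1Header builds a PROXY protocol v1 line such as
// "PROXY TCP4 192.0.2.1 192.0.2.10 56324 443\r\n" from a client connection.
func proxyV1Header(c net.Conn) string {
	src := c.RemoteAddr().(*net.TCPAddr)
	dst := c.LocalAddr().(*net.TCPAddr)
	family := "TCP4"
	if src.IP.To4() == nil {
		family = "TCP6"
	}
	return fmt.Sprintf("PROXY %s %s %s %d %d\r\n",
		family, src.IP, dst.IP, src.Port, dst.Port)
}

func main() {
	ln, err := net.Listen("tcp", ":8443") // placeholder port
	if err != nil {
		log.Fatal(err)
	}
	for {
		c, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		// A real proxy would write this header to the backend and then pipe
		// the rest of the stream; here it is only logged.
		log.Print(proxyV1Header(c))
		c.Close()
	}
}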


afirth commented May 12, 2023

Just for future searchers, clarifying that the above conclusion is incorrect, as the TCP proxy also does SNI-based routing; see linked #2540.
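
Roughly, that SNI routing means the Go proxy has to read the TLS ClientHello before it can pick a backend, and then replay those bytes to it. A hypothetical sketch of that peek using the standard crypto/tls callback (not the controller's implementation; the port is a placeholder):

package main

import (
	"bytes"
	"crypto/tls"
	"errors"
	"fmt"
	"log"
	"net"
)

// recordingConn records everything read from the client so the bytes can be
// replayed to the chosen backend, and swallows the alert the TLS stack would
// otherwise try to send when the peek aborts the handshake.
type recordingConn struct {
	net.Conn
	buf bytes.Buffer
}

func (c *recordingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	c.buf.Write(p[:n])
	return n, err
}

func (c *recordingConn) Write(p []byte) (int, error) { return len(p), nil }

// peekSNI parses the ClientHello just far enough to learn the server name,
// then aborts the handshake; the recorded bytes are returned for replay.
func peekSNI(conn net.Conn) (string, []byte, error) {
	rc := &recordingConn{Conn: conn}
	var sni string
	_ = tls.Server(rc, &tls.Config{
		GetConfigForClient: func(hi *tls.ClientHelloInfo) (*tls.Config, error) {
			sni = hi.ServerName
			return nil, errors.New("peeked SNI, aborting handshake")
		},
	}).Handshake()
	if sni == "" {
		return "", nil, fmt.Errorf("no SNI in ClientHello")
	}
	return sni, rc.buf.Bytes(), nil
}

func main() {
	ln, err := net.Listen("tcp", ":8443") // placeholder port
	if err != nil {
		log.Fatal(err)
	}
	for {
		c, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(c net.Conn) {
			defer c.Close()
			sni, _, err := peekSNI(c)
			if err != nil {
				log.Print(err)
				return
			}
			// A real passthrough proxy would now pick a backend by SNI,
			// replay the recorded bytes, and pipe the rest of the stream.
			log.Printf("client requested %q", sni)
		}(c)
	}
}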
