
Network policy blocks established connections to RDS #236

Closed
Mohilpalav opened this issue Mar 19, 2024 · 8 comments
Labels
bug: Something isn't working
strict mode: Issues blocked on strict mode implementation

Comments

@Mohilpalav

What happened:

We have a workload running in an EKS cluster that makes a request to an RDS cluster on startup. This request is blocked by the network policy even though the workload has an egress rule allowing traffic to the RDS cluster subnet. We suspect the outbound connection is made before the network policy node agent starts tracking connections, so when the response arrives the agent has no matching allowed connection and the return traffic is denied.

This is what we can see in the network policy flow logs:

Node: ip-10-51-21-121.us-east-1.compute.internal;SIP: 10.47.53.151;SPORT: 5432;DIP: 10.27.36.181;DPORT: 45182;PROTOCOL: TCP;PolicyVerdict: DENY
Node: ip-10-51-21-121.us-east-1.compute.internal;SIP: 10.47.53.151;SPORT: 5432;DIP: 10.27.36.181;DPORT: 45174;PROTOCOL: TCP;PolicyVerdict: DENY

10.47.53.151:5432 -> RDS
10.27.36.181 -> EKS workload

Unfortunately, the node agent logs only show the following at the moment (#103):

2024-03-19 21:31:19.049604118 +0000 UTC Logger.check error: failed to get caller
2024-03-19 21:31:19.858783024 +0000 UTC Logger.check error: failed to get caller
2024-03-19 21:31:19.923276681 +0000 UTC Logger.check error: failed to get caller

What you expected to happen:
The connection to RDS should be allowed.

How to reproduce it (as minimally and precisely as possible):

  • create a network policy that allows all egress but no ingress traffic for a simple application (see the example policy after this list)
  • on startup, the application makes several outbound connections to some external service (e.g. example.com)
  • deploy the application as a multi-replica deployment to make this behavior more consistent
  • check whether any return traffic / responses are denied by the network policy agent when they should not be
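
A minimal sketch of such a policy, assuming the test application runs in the `default` namespace with the label `app: my-app` (both are illustrative placeholders, not values from this issue):

```yaml
# Sketch only: an all-egress / no-ingress policy matching the reproduction steps.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress-deny-ingress
  namespace: default            # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: my-app               # assumed label on the test application
  policyTypes:
    - Ingress                   # listed with no ingress rules, so all ingress is denied
    - Egress
  egress:
    - {}                        # empty rule allows all egress
```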

Anything else we need to know?:
Similar issues:
#73
#186

Environment:

  • Kubernetes version (use kubectl version): v1.28
  • CNI Version: v1.16.4
  • Network Policy Agent Version: v1.0.8
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): 5.10.210-201.852.amzn2.x86_64
Mohilpalav added the bug label on Mar 19, 2024
jayanthvn added the strict mode label on May 9, 2024
@jayanthvn
Contributor

Here the pod attempted to start a connection before network policy enforcement was in place, so the response packet is dropped. Please refer to #189 (comment) for a detailed explanation.

Our recommended solution for this is strict mode, which gates pod launch until policies are configured for the newly launched pod - https://github.com/aws/amazon-vpc-cni-k8s?tab=readme-ov-file#network_policy_enforcing_mode-v1171
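
For reference, strict mode is controlled by the NETWORK_POLICY_ENFORCING_MODE environment variable on the aws-node DaemonSet (available from VPC CNI v1.17.1, per the link above). A sketch of a strategic-merge patch that sets it; in practice this is usually configured through the VPC CNI Helm chart or EKS add-on configuration rather than a manual patch:

```yaml
# Sketch: strategic-merge patch for the aws-node DaemonSet in kube-system,
# e.g. applied with: kubectl -n kube-system patch ds aws-node --patch-file strict-mode.yaml
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: NETWORK_POLICY_ENFORCING_MODE
              value: "strict"
```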

Another option, if you don't want to enable this mode, is to allow the Service CIDRs in your policy, given that your pods communicate via Service VIPs; this will allow the return traffic.
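
One way to express that (a sketch, not taken from this thread; the Service CIDR 172.20.0.0/16 and the namespace are assumptions, substitute your cluster's actual values):

```yaml
# Sketch: allow egress to the cluster Service CIDR so traffic to Service VIPs
# (and its return traffic) is permitted.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-service-cidr
  namespace: default              # assumed namespace
spec:
  podSelector: {}                 # all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 172.20.0.0/16   # assumed Service CIDR; replace with yours
```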

@achevuru
Contributor

achevuru commented Jun 3, 2024

@Mohilpalav Did Strict mode help with your use case/issue?

@FabrizioCafolla

@Mohilpalav Is there any solution for this issue?

@Monska85

Hello there,

we have the same problem when connecting to the RDS service from a pod, and also when contacting the S3 service.
We have tried to reproduce the error, but it is not predictable. We see some errors when we deploy a lot of pods at the same time that try to connect to the RDS or S3 service, but it is not always the case.

Did you find any solution to this problem?

@Monska85

Monska85 commented Aug 2, 2024

Hello there,

we found a workaround.

Setting the ANNOTATE_POD_IP environment variable speeds up pod IP discovery, and for now the pod startup issues are no longer present.
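
For reference, a sketch of enabling that variable on the aws-node DaemonSet; this is normally done through the VPC CNI Helm chart or EKS add-on configuration, and the patch format here is just an illustration:

```yaml
# Sketch: enable ANNOTATE_POD_IP on the aws-node DaemonSet so the pod IP is
# written to a pod annotation, reducing the delay before the network policy
# controller learns the pod's IP.
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: ANNOTATE_POD_IP
              value: "true"
```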

@albertschwarzkopf

In my case, ANNOTATE_POD_IP has not really helped. Pods randomly have issues establishing network connections (e.g. after restarts), even though everything worked before the restart.

@Pavani-Panakanti
Contributor

@Mohilpalav Were you able to try strict mode, and did it help with your issue?

@Pavani-Panakanti
Contributor

We are actively working on a fix for this issue. The fix can be tracked in #345.

Closing this issue. Please follow the issue above for updates on the fix.
