[ci.jenkins.io] Define virtual networking for AWS #4320

Closed
Tracked by #4313
dduportal opened this issue Sep 28, 2024 · 6 comments

Comments

@dduportal
Contributor

dduportal commented Sep 28, 2024

We need to define virtual networking for ci.jenkins.io in AWS.

  • We need a public dual-stack (IPv4 and IPv6) network to allow inbound SSH, HTTP, and HTTPS traffic to the controller
  • We need 3 IPv4-only subnets. Check https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html to be sure this maps to something doable in EC2
  • Outbound requests only need to be IPv4, but we will need at least 2 public egress IPv4 addresses (and the ability to add more) for the whole VPC (see Internet Gateway/Elastic IPs). We chose to use 1 gateway per subnet for now
  • Network restrictions for the VPC:
    • Inbound SSH will be restricted to the VPN IPs
    • HTTP/HTTPS/JNLP will be ok for inbound from everywhere
    • Outbound HTTP/HTTPS/HKP to everywhere
    • Outbound SSH only to GitHub IPs
  • Private (internal) communications, restricted to the following:
    • The controller should be able to reach the VM agents privately through SSH, should be reachable by both VM and container agents over HTTP/HTTPS and JNLP, and should be able to access the internet (see egress VPC rules above)
      • Inbound from the internet to the controller: HTTP/HTTPS from everywhere and SSH only from the VPN.
    • VM agents live in their own subnets. They are allowed to reach ci.jenkins.io (inbound JNLP/HTTP/HTTPS on the controller), to be reached from the ci.jenkins.io VM over SSH (outbound from the controller to the agents), and to access the internet (see egress VPC rules above)
    • The EKS API should be private (ref. https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html) with a private Route53 DNS zone
    • EKS containers should be able to access the ci.jenkins.io controller (inbound HTTP/HTTPS/JNLP) and the internet (see egress VPC rules above)
    • The ECR registry used as a pull-through cache should be reachable only privately by any VM or container (including ci.jenkins.io itself)
    • The ACP instance to be hosted in EKS needs to be reachable only privately, and only by the VM agents and container agents (access from ci.jio is not needed).
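
The allowed flows listed above can be summarized as a small lookup table. This is only an illustrative sketch: the zone names (`internet`, `vpn`, `github`, `controller`, `vm_agents`, `eks_pods`, `ecr_cache`, `acp`) and the `is_allowed` helper are hypothetical labels invented for this example, and the HTTPS protocol for the ECR cache and the ACP is an assumption, since the issue does not state it.

```python
# Illustrative model of the flows described in this issue.
# Zone names are hypothetical labels, not real resource names.
ALLOWED_FLOWS = {
    # (source, destination): set of allowed protocols
    ("internet", "controller"): {"HTTP", "HTTPS", "JNLP"},
    ("vpn", "controller"): {"SSH"},
    ("controller", "vm_agents"): {"SSH"},
    ("vm_agents", "controller"): {"HTTP", "HTTPS", "JNLP"},
    ("eks_pods", "controller"): {"HTTP", "HTTPS", "JNLP"},
    # ACP: agents only, not the controller (protocol is an assumption).
    ("vm_agents", "acp"): {"HTTPS"},
    ("eks_pods", "acp"): {"HTTPS"},
}

# Egress rules shared by every zone inside the VPC.
INTERNAL_ZONES = ("controller", "vm_agents", "eks_pods")
for zone in INTERNAL_ZONES:
    ALLOWED_FLOWS[(zone, "internet")] = {"HTTP", "HTTPS", "HKP"}
    ALLOWED_FLOWS[(zone, "github")] = {"SSH"}
    # ECR pull-through cache: private access only (protocol is an assumption).
    ALLOWED_FLOWS[(zone, "ecr_cache")] = {"HTTPS"}


def is_allowed(source: str, destination: str, protocol: str) -> bool:
    """Return True if the flow is listed above; everything else is denied."""
    return protocol in ALLOWED_FLOWS.get((source, destination), set())
```

For example, `is_allowed("vpn", "controller", "SSH")` is true while `is_allowed("internet", "controller", "SSH")` is false, which matches the "SSH only from the VPN" rule.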
dduportal changed the title from "Define ci.jenkins.io virtual network" to "[ci.jenkins.io] Define virtual networking for AWS" on Sep 28, 2024
dduportal added the triage, ci.jenkins.io, and aws labels on Sep 28, 2024
dduportal added this to the infra-team-sync-2024-10-01 milestone on Sep 28, 2024
smerle33 self-assigned this on Sep 30, 2024
smerle33 removed the triage label on Sep 30, 2024
@smerle33
Contributor

smerle33 commented Oct 14, 2024

To provide multiple IPs for the gateway:

https://aws.amazon.com/blogs/networking-and-content-delivery/attach-multiple-ips-to-a-nat-gateway-to-scale-your-egress-traffic-pattern/

We could use https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/nat_gateway#secondary_allocation_ids, but it is not available yet in the module: terraform-aws-modules/terraform-aws-vpc#1109

We decided (with @dduportal) to use one gateway per subnet with one IP per gateway for now: since this feature is not yet available in the module, using it would mean changing the module or defining all the resources by hand.

@dduportal
Contributor Author

Update regarding network restrictions:

  • https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-best-practices.html tells us we should use Network ACLs (similar to "Network Security Groups" in Azure) to restrict traffic going from and to subnets, while using Security Groups (SG) for intra-subnet traffic. It means:
    • The ci.jenkins.io controller subnet will mostly be protected by Network ACLs, as we should only have a network interface for the controller itself. The associated subnet should only need an SG if we add private endpoints or other VMs.
    • The ephemeral agents subnet needs both:
      • Network ACLs to forbid inbound access except SSH from ci.jio or from the VPN outbound IPs (so we can debug VM access), and to restrict outbound access to the internet (HTTP/HTTPS/HKP, etc.) or to ci.jio (HTTPS/JNLP).
      • An SG to forbid network access between the agents themselves (to limit lateral attack risks)
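
To illustrate how the inbound Network ACL for the agents subnet would behave, here is a minimal sketch of NACL evaluation: rules are checked in ascending rule-number order, the first match decides, and anything unmatched is denied. The `NaclRule` class, the `evaluate` helper, the rule numbers, and all IP ranges (controller at `10.0.0.10`, VPN at `203.0.113.0/24`) are hypothetical values chosen for this example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int      # lower numbers are evaluated first
    cidr: str        # source range the rule applies to
    port_from: int
    port_to: int
    allow: bool


def evaluate(rules, source_ip: str, port: int) -> bool:
    """Return True if traffic is allowed; the first matching rule wins."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_range = addr in ipaddress.ip_network(rule.cidr)
        if in_range and rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # implicit final deny when nothing matches


# Hypothetical inbound rules for the ephemeral agents subnet.
AGENTS_INBOUND = [
    NaclRule(100, "10.0.0.10/32", 22, 22, True),    # SSH from the controller (assumed IP)
    NaclRule(110, "203.0.113.0/24", 22, 22, True),  # SSH from the VPN (assumed range)
]
```

Note that Network ACLs are stateless, so a real rule set would also need rules for return traffic, which this sketch leaves out.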

@smerle33
Contributor

smerle33 commented Nov 8, 2024

Update regarding "No need for multiple availability zones (when possible)":

It is mandatory to have at least 2 Availability Zones, as per: https://docs.aws.amazon.com/eks/latest/userguide/network-reqs.html#network-requirements-subnets

But we should still be able to lock our nodes to a single AZ so that they can use our EBS volumes (which do not span multiple AZs), as per: terraform-aws-modules/terraform-aws-eks#1252

@dduportal
Contributor Author

Update: we need to adjust the initial hypothesis (as per our findings):

  • The ci.jenkins.io controller must be in a public subnet to ensure correct in/out routing without opening too many routes
  • The subnet for VM agents must allow up to 400 IPs, so we need to change it from /24 to /23
  • EKS requires 2 subnets in 2 distinct AZs, which requires mapping each private subnet to an AZ (even if it is always the same one) => syntactic sugar to make EKS creation easier
  • Automating the VPC module usage is needed to avoid human mistakes when changing subnet topologies. Terraform provides CIDR calculation for us!

=> This should be implemented in a single big PR: jenkins-infra/terraform-aws-sponsorship#36
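
As a quick check of the /24 to /23 change, the sketch below counts usable addresses per subnet (AWS reserves 5 addresses in every subnet) and mimics the arithmetic of Terraform's `cidrsubnet(prefix, newbits, netnum)` function in Python. The VPC range `10.0.0.0/16` and the helper names are assumptions made for this example; the real values live in the Terraform code.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses AWS leaves available in a subnet of this size."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Rough Python equivalent of Terraform's cidrsubnet() for IPv4."""
    network = ipaddress.ip_network(prefix)
    subnets = list(network.subnets(prefixlen_diff=newbits))
    return str(subnets[netnum])


# Hypothetical VPC range, used only for illustration.
VPC_CIDR = "10.0.0.0/16"

print(usable_ips("10.0.0.0/24"))    # 251: too small for 400 agents
print(usable_ips("10.0.0.0/23"))    # 507: enough for 400 agents
print(cidrsubnet(VPC_CIDR, 7, 1))   # 10.0.2.0/23
```

Deriving each subnet from the VPC range this way avoids hand-written CIDRs that overlap when a subnet is resized.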

@dduportal
Contributor Author

Closing this issue as there is no more foundational work to be done. If we have further "network" issues, they will be highly specific to one of the other topics (Controller VM, ephemeral agents or EKS).
