
feat: Adding CPU / RAM configurations to helm network deployments #8786

Merged: 6 commits merged into master from srp/helm-resource-limits on Sep 26, 2024

Conversation

@stevenplatt (Collaborator) commented Sep 25, 2024

Change 1: CPU/RAM Limits for node deployments

This PR assigns resource configurations to nodes that are part of helm network deployments.
Adding these resource configurations helps Kubernetes schedule and balance aztec nodes across the cluster.

These initial values are chosen based on historical usage of the currently deployed `devnet` environment in AWS ([Grafana Dashboard](https://grafana.aztec.network/d/cdtxao66xa1ogc/aztec-dashboard?orgId=1&refresh=1m&var-network=devnet&var-instance=All&var-protocol_circuit=All&var-min_block_build=20m&var-system_res_interval=$__auto_interval_system_res_interval&var-sequencer=All&var-prover=All&from=now-7d&to=now)).

**Definitions**
`requests:` the minimum resources that must be available on the underlying server before Kubernetes will schedule the component there.
`limits:` after deployment, the component's usage may flex up and down, but never above this cap. Using a limit keeps the shared infra stable when there are memory leaks or unexpected application behavior; components that exceed their assigned limit are terminated and redeployed.
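
For illustration, the resource block attached to each node's container spec has this shape; the CPU/memory numbers below are placeholders, not the values merged in this PR:

```yaml
# Hypothetical resource block for a node container; the figures are
# illustrative, not the values chosen in this PR.
resources:
  requests:
    cpu: "500m"    # schedule only on a node with at least 0.5 CPU unreserved
    memory: "1Gi"  # and at least 1 GiB of memory unreserved
  limits:
    cpu: "2"       # usage above 2 CPUs is throttled
    memory: "4Gi"  # usage above 4 GiB triggers an OOM kill and redeploy
```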

Change 2: Options for bots and public networks

Additionally, this PR adds configuration to turn bots, as well as public access, on or off at the time of the helm deployment. This can be used with the following helm syntax:

```
helm upgrade --install <installation name> . -n <kubernetes namespace> \
  --set network.public=true --set network.enableBots=true
```

By default, `network.public` is `false`, since enabling it deploys load balancers, which are not available when running a Kubernetes cluster on a local machine or in CI environments.
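
A minimal sketch of how such a toggle is typically wired into a chart's Service template; the helper, selector, and port names here are illustrative, not necessarily the ones used in this chart:

```yaml
# Sketch: values-driven Service type. LoadBalancer provisions an external
# (e.g. AWS) load balancer; ClusterIP keeps the service cluster-internal
# for local and CI environments.
apiVersion: v1
kind: Service
metadata:
  name: {{ include "aztec-network.fullname" . }}-boot-node
spec:
  type: {{ if .Values.network.public }}LoadBalancer{{ else }}ClusterIP{{ end }}
  selector:
    app: boot-node
  ports:
    - port: 8080
      targetPort: 8080
```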


These resource configurations have been tested by deploying the parent helm chart to the spartan Kubernetes cluster in AWS.

@stevenplatt stevenplatt enabled auto-merge (squash) September 25, 2024 15:16
@stevenplatt stevenplatt marked this pull request as draft September 25, 2024 18:33
@ludamad (Collaborator) left a comment:


LGTM, just the question if we want to set CPU limits vs just rely on scheduler. I can get this in without a CI pass if you ping me, too (no need to undraft)

@just-mitch (Contributor) left a comment:


Nice. Side note, when it is LoadBalancer, does EKS automatically set that up? How/Where do you find the public endpoints?

@stevenplatt stevenplatt marked this pull request as ready for review September 26, 2024 16:15
@stevenplatt stevenplatt enabled auto-merge (squash) September 26, 2024 16:15
@stevenplatt (Collaborator, Author) replied:

> Nice. Side note, when it is LoadBalancer, does EKS automatically set that up? How/Where do you find the public endpoints?

Yes, EKS automatically deploys a load balancer within AWS (outside of the cluster) when it is defined in the helm chart, and automatically deletes it when `helm uninstall` is used. The public endpoint then appears as the Service's external address (e.g., via `kubectl get svc`).

@stevenplatt stevenplatt merged commit 7790ede into master Sep 26, 2024
50 checks passed
@stevenplatt stevenplatt deleted the srp/helm-resource-limits branch September 26, 2024 16:49
TomAFrench added a commit that referenced this pull request Sep 26, 2024
* master:
  feat: make shplemini proof constant (#8826)
  feat: Adding CPU / RAM configurations to helm network deployments (#8786)
  chore: removing hack commitment from eccvm (#8825)
  feat: Handle epoch proofs on L1 (#8704)
Rumata888 pushed a commit that referenced this pull request Sep 27, 2024
Rumata888 pushed a commit that referenced this pull request Sep 27, 2024
Rumata888 pushed a commit that referenced this pull request Sep 27, 2024
stevenplatt added a commit that referenced this pull request Oct 2, 2024
…8923)

This PR includes two changes:
- Adds persistent storage for Aztec nodes running in the Spartan cluster
- Repairs previously merged load balancer configurations

# Persistent Storage

Nodes that were previously configured with mounted volumes now use
`volumeClaimTemplates`. Rather than directly configuring a
`PersistentVolumeClaim`, a `volumeClaimTemplate` automatically appends
an index suffix to each claim as replicas increase, so that replicas do
not conflict over storage.
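
For reference, a `volumeClaimTemplates` entry on a StatefulSet looks roughly like this; the claim name and size are illustrative, not the values used in the chart:

```yaml
# Sketch: one PVC is derived per replica from this template
# (e.g. node-data-<statefulset>-0, node-data-<statefulset>-1, ...).
volumeClaimTemplates:
  - metadata:
      name: node-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 8Gi
```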

## Persistent Storage for Grafana

The currently bundled Grafana instance uses a standard
`PersistentVolumeClaim`, since it is not expected to be deployed with
replicas. Grafana also has an OS-level user defined in its container,
which assumes ownership of the volume once it is mounted. To allow
remounting, this user has to be declared in the helm chart; this is done
using a `securityContext` in the Grafana YAML template.
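
A minimal sketch of the kind of `securityContext` involved, assuming the official Grafana image's default UID/GID of 472:

```yaml
# Sketch: pod-level securityContext so the mounted volume stays accessible
# to Grafana's container user across remounts. 472 is the UID/GID of the
# official Grafana image; adjust if a different image is used.
securityContext:
  runAsUser: 472
  runAsGroup: 472
  fsGroup: 472  # volume files are group-owned by this GID at mount time
```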

# Repaired Load Balancer Config

PR #8786 previously made network interfaces *either* internal or
external. This meant that when the network was set as public, certain
references to internal network interfaces were no longer reachable,
specifically items that address a node port
([bootNodeURL](https://github.com/AztecProtocol/aztec-packages/blob/master/spartan/aztec-network/templates/_helpers.tpl#L62),
for example).

This PR adds the load balancer as a second interface, without modifying
the original.
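
Conceptually, the fix renders a second, external Service alongside the unchanged internal one rather than switching its type. A rough sketch, with illustrative names:

```yaml
# Sketch: the internal Service is always rendered elsewhere in the chart;
# this extra LoadBalancer Service is added only for public networks.
{{ if .Values.network.public }}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "aztec-network.fullname" . }}-boot-node-lb
spec:
  type: LoadBalancer
  selector:
    app: boot-node
  ports:
    - port: 8080
      targetPort: 8080
{{ end }}
```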

# Testing

Code in this PR has been tested by deploying the updated helm
configurations to the Spartan cluster using the command:

`helm upgrade --install staging . -n staging --set network.public=true`

As part of this change, replica counts have also been validated to work
without causing conflicts for volume mounts, network interfaces, or
other resources.