feat: Adding CPU / RAM configurations to helm network deployments #8786
Conversation
LGTM, just one question: do we want to set CPU limits, or just rely on the scheduler? I can get this in without a CI pass if you ping me, too (no need to undraft).
Nice. Side note, when it is LoadBalancer, does EKS automatically set that up? How/Where do you find the public endpoints?
Yes, EKS automatically deploys a load balancer within AWS (outside of the cluster) when it is defined in the helm chart. It also automatically deletes it when |
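For context, what triggers EKS to provision that external load balancer is a Service of `type: LoadBalancer`. A minimal sketch, assuming nothing about this chart's actual templates (the name, selector, and port below are illustrative):

```yaml
# Illustrative only: a Service of type LoadBalancer. On EKS, the AWS cloud
# controller provisions a load balancer for it and writes the public hostname
# into status.loadBalancer.ingress; `kubectl get svc` shows it in the
# EXTERNAL-IP column. Deleting the Service deletes the AWS load balancer too.
apiVersion: v1
kind: Service
metadata:
  name: boot-node-lb        # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: boot-node          # hypothetical pod selector
  ports:
    - port: 8080
      targetPort: 8080
```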
# Change 1: CPU/RAM Limits for node deployments

This PR assigns resource configurations to nodes that are part of helm network deployments. Adding such resource configurations helps Kubernetes balance and deploy aztec nodes. These initial values are chosen based on historical usage of the currently deployed `devnet` environment in AWS ([Grafana Dashboard](https://grafana.aztec.network/d/cdtxao66xa1ogc/aztec-dashboard?orgId=1&refresh=1m&var-network=devnet&var-instance=All&var-protocol_circuit=All&var-min_block_build=20m&var-system_res_interval=$__auto_interval_system_res_interval&var-sequencer=All&var-prover=All&from=now-7d&to=now)).

**Definitions**

`requests:` The minimum resources that must be available on the underlying server before Kubernetes will schedule the component there.

`limits:` After deployment, the component is allowed to flex up and down, but never above this set limit. Using a limit keeps the shared infra stable when there are memory leaks or other unexpected application behavior. Components are terminated and redeployed if they exceed the assigned limit.

# Change 2: Options for bots and public networks

Additionally, this PR adds configuration to turn bots as well as public access on or off at the time of the helm deployment. This can be used with the following helm syntax:

```
helm upgrade --install <installation name> . -n <kubernetes namespace> \
  --set network.public=true --set network.enableBots=true
```

By default, `network.public` is `false`, since enabling it deploys load balancers, which are not available when running a Kubernetes cluster on a local machine or within CI environments.

---

These resource configurations have been tested by deploying the parent helm chart to the spartan Kubernetes cluster in AWS.
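For readers less familiar with Kubernetes, a hedged sketch of what such a resource block looks like in a container spec. The figures below are placeholders, not the values chosen in this PR (those come from the devnet Grafana dashboard):

```yaml
# Illustrative resource block only; the actual values in this PR are derived
# from historical devnet usage, not from this example.
resources:
  requests:
    cpu: "500m"       # scheduled only on a node with 0.5 CPU unreserved
    memory: "1Gi"     # ...and 1 GiB of allocatable memory
  limits:
    cpu: "1"          # CPU usage above this is throttled
    memory: "2Gi"     # exceeding this gets the container killed and restarted
```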
…8923) This PR includes two changes:

- Adds persistent storage for Aztec nodes running in the Spartan cluster
- Repairs previously merged load balancer configurations

# Persistent Storage

Nodes that were previously configured with mounted volumes are now configured to use `volumeClaimTemplates`. Rather than directly configuring a `PersistentVolumeClaim`, a `volumeClaimTemplate` automatically appends index suffixes when replicas increase, so that there is no storage conflict.

## Persistent Storage for Grafana

The currently bundled Grafana instance uses a standard `PersistentVolumeClaim`, since it is not expected to be deployed with replicas. Grafana also has an OS-level user defined in its container, which assumes ownership of the volume once it is mounted. To allow remounting, the user has to be defined in the helm chart. This is done using a `securityContext` in the Grafana yaml template.

# Repaired Load Balancer Config

PR #8786 previously made network interfaces *either* internal or external. This meant that when the network was set as public, certain references to internal network interfaces were no longer reachable, specifically items that address a node port ([bootNodeURL](https://github.com/AztecProtocol/aztec-packages/blob/master/spartan/aztec-network/templates/_helpers.tpl#L62) for example). This PR adds the load balancer as a second interface, without modifying the original.

# Testing

Code in this PR has been tested by deploying the updated helm configurations to the Spartan cluster using the command:

`helm upgrade --install staging . -n staging --set network.public=true`

As part of this change, replica counts have also been validated to work without causing conflicts for volume mounts, network interfaces, or other resources.
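A minimal sketch of the `volumeClaimTemplates` mechanism described above, assuming a StatefulSet (all names, images, and sizes below are illustrative, not taken from this chart):

```yaml
# Illustrative StatefulSet fragment: each replica gets its own PVC, named
# <template>-<statefulset>-<ordinal> (e.g. data-node-0, data-node-1), so
# scaling up replicas never causes two pods to contend for one volume.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: node                         # hypothetical name
spec:
  serviceName: node
  replicas: 2
  selector:
    matchLabels:
      app: node
  template:
    metadata:
      labels:
        app: node
    spec:
      containers:
        - name: node
          image: example/node:latest # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi            # placeholder size
```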