
fix: add generated target for all node IPs #1119

Open · wants to merge 30 commits into base: main

Conversation


@catsby catsby commented Dec 12, 2024

Description

Adds a new generator / target called KubeNodes that contains the internal IP addresses of nodes in the cluster.

NOTE: I have no idea (yet) where the docs/reference/ file changes came from. They appear to be missing on main.
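To illustrate the idea, here is a minimal sketch of how internal IPs can be collected from node objects and turned into /32 CIDR entries for a generated policy target. The `KubernetesNode` shape mirrors the core/v1 Node `status.addresses` field; `nodeCIDRs` is a hypothetical helper, not the actual operator code.

```typescript
// Minimal shapes mirroring the core/v1 Node status.addresses field.
interface NodeAddress {
  type: string;
  address: string;
}

interface KubernetesNode {
  status?: { addresses?: NodeAddress[] };
}

// Return the InternalIP of a node, if present.
function getNodeInternalIP(node: KubernetesNode): string | undefined {
  return node.status?.addresses?.find(a => a.type === "InternalIP")?.address;
}

// Collect one /32 CIDR per node, deduplicated, for the generated
// KubeNodes network policy target. Name is illustrative.
function nodeCIDRs(nodes: KubernetesNode[]): string[] {
  const ips = new Set<string>();
  for (const node of nodes) {
    const ip = getNodeInternalIP(node);
    if (ip) ips.add(ip);
  }
  return Array.from(ips).map(ip => `${ip}/32`);
}
```

Each resulting CIDR would map to one `IPBlock` entry in the generated NetworkPolicy egress spec shown in the validation steps below.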

Related Issue

Relates to #970. The Steps to Validate below include steps to verify that #970 is fixed.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Other (security config, docs update, etc)

Steps to Validate

Set up and verify the behavior of the target

Create a k3d cluster named uds (we use names later for adding nodes):

k3d cluster create uds

Deploy slim-dev:

uds run slim-dev

Create and deploy monitoring layer:

uds run -f ./tasks/create.yaml single-layer-callable --set LAYER=monitoring

uds run -f ./tasks/deploy.yaml single-layer-callable --set LAYER=monitoring

Create and deploy metrics-server layer:

uds run -f ./tasks/create.yaml single-layer-callable --set LAYER=metrics-server

uds run -f ./tasks/deploy.yaml single-layer-callable --set LAYER=metrics-server

Inspect the network policy for scraping of kube nodes:

kubectl describe networkpolicy allow-prometheus-stack-egress-metrics-scraping-of-kube-nodes -n monitoring

The spec: section is the relevant part and should contain the node IPs:

Spec:
  PodSelector:     app.kubernetes.io/name=prometheus
  Not affecting ingress traffic
  Allowing egress traffic:
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 172.28.0.2/32
        Except:
  Policy Types: Egress

Add a node:

k3d node create extra1 --cluster uds --wait --memory 500M

Verify the internal IP of the new node:

kubectl get nodes -o custom-columns="NAME:.metadata.name,INTERNAL-IP:.status.addresses[?(@.type=='InternalIP')].address"

Fetch the netpol again to verify the new IP is in the spec: block:

kubectl describe networkpolicy allow-prometheus-stack-egress-metrics-scraping-of-kube-nodes -n monitoring

Should now be something like this:

Spec:
  PodSelector:     app.kubernetes.io/name=prometheus
  Not affecting ingress traffic
  Allowing egress traffic:
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 172.28.0.2/32
        Except:
    To:
      IPBlock:
        CIDR: 172.28.0.4/32
        Except:
  Policy Types: Egress

Verify Prometheus can read node metrics

Connect directly to prometheus:

kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

Visit http://localhost:9090/

Execute this expression to see all node/cpu data:

node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate

To see only data from the extra1 node:

node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{node=~"^k3d-extra.*"}

Add a new node:

k3d node create extra2 --cluster uds --wait --memory 500M

Verify the netpol updates:

kubectl describe networkpolicy allow-prometheus-stack-egress-metrics-scraping-of-kube-nodes -n monitoring

Re-execute the Prometheus query from above. It may take a few minutes for extra2 to show up, though; not sure why.

Delete a node and verify the spec updates again:

kubectl delete node k3d-extra1-0 && k3d node delete k3d-extra1-0

Re-reading the netpol should show the removal of that IP.

Checklist before merging

Signed-off-by: catsby <clint@defenseunicorns.com>
@catsby catsby requested a review from a team as a code owner December 12, 2024 22:13
@catsby catsby changed the title fix: Add generated target for all node IPs fix: add generated target for all node IPs Dec 12, 2024
Comment on lines 58 to 61
if (isReady) {
  const ip = getNodeInternalIP(node);
  if (ip) nodeSet.add(ip);
}
Contributor
I think there's an edge case we might be missing here where a node has an Update that results in it being not-ready, but we have it in our set and need to remove it?

Maybe that's unnecessary (and could cause some churn if there's a broken node) but I think if we want to strictly track Ready nodes it might make sense to add?

Contributor Author
🤔 I didn't consider that. It might cause churn but I added a delete() call to the set if the node is not ready. We can adjust it later if needed.

Thinking about it though, the target is KubeNodes, and it's not guaranteed to mean ready nodes, I suppose? Even if right now we're trying to only include the ready ones.
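The behavior described above can be sketched roughly like this: track node IPs in a set, adding on ready and deleting on not-ready, so a node that transitions to not-ready via an Update is dropped from the generated target. This is a hedged sketch with illustrative names (`updateNodeSet`), not the actual operator code.

```typescript
// Set of internal node IPs backing the generated KubeNodes target.
const nodeSet = new Set<string>();

// Reconcile one node event: add the IP when the node is ready,
// delete it when the node is not ready (the edge case discussed above).
function updateNodeSet(ip: string | undefined, isReady: boolean): void {
  if (!ip) return;
  if (isReady) {
    nodeSet.add(ip);
  } else {
    // May cause some churn for a flapping node, but keeps the set
    // strictly limited to Ready nodes.
    nodeSet.delete(ip);
  }
}
```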

Contributor
Yeah, especially in this case of metrics I suppose there could be an argument that a slightly unhealthy node is the exact type of node that you'd want to get metrics from (if they were reachable). Now I'm wondering if we should just ignore health altogether...worst case a pod has an extra egress permission to a node that is unhealthy, or partially unhealthy 🤔

Comment on lines 121 to 124
if (UDSConfig.kubeApiCidr) {
  message +=
    ", ensure that the KUBEAPI_CIDR override configured for the operator is correct.";
}
Contributor
This error message addition is irrelevant in the context of the KubeNodes policies, although it does bring up a potential question around this - we may want to add a manual "override" similar to the UDSConfig.kubeApiCidr override to allow people to opt out of auto-updates. The usage/explanation of that is documented here. It doesn't require too many code changes to support if we wanted to add it (should be able to copy most of what was done for KubeAPI).

Contributor
(If we do end up adding this manual config override let's make sure we also add to that same doc)

Contributor Author

Support for this was added in 15e6235, I believe.
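As a rough illustration of the override pattern discussed above (modeled on `UDSConfig.kubeApiCidr`): when a manual CIDR override is configured, use it verbatim and skip the auto-generated values from node watches. The `kubeNodeCidrs` name and comma-separated format here are assumptions for the sketch, not the actual config key.

```typescript
// Hypothetical operator config; kubeNodeCidrs is an illustrative name
// modeled on the existing kubeApiCidr override.
interface Config {
  kubeNodeCidrs?: string;
}

// Prefer a manually configured CIDR list over the auto-generated one,
// letting users opt out of auto-updates from node watches.
function effectiveNodeCidrs(config: Config, generated: string[]): string[] {
  if (config.kubeNodeCidrs) {
    return config.kubeNodeCidrs.split(",").map(s => s.trim());
  }
  return generated;
}
```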

Comment on lines +1 to +4
/**
* Copyright 2024 Defense Unicorns
* SPDX-License-Identifier: AGPL-3.0-or-later OR LicenseRef-Defense-Unicorns-Commercial
*/
Contributor
General comment on this added file - would be great to add jest testing around these functions. We encountered a number of issues with the kubeAPI policy and added testing to cover those functions as a result, would be great to do similar here. You should be able to model after what was done for KubeAPI with Jest mocks to avoid any k8s calls, and most of that should be relatively easy to generate as well.

Contributor Author
Tests were added in 6921d23

catsby and others added 3 commits December 13, 2024 13:05
Co-authored-by: Micah Nagel <micah.nagel@defenseunicorns.com>
Signed-off-by: catsby <clint@defenseunicorns.com>
@catsby catsby mentioned this pull request Dec 13, 2024
* update-docs-gen-script:
  update doc-gen output_dir
  chore: update arch diagrams (#1120)
  chore: bump aks sku from free to standard to address API server perfo… (#1121)
Signed-off-by: catsby <clint@defenseunicorns.com>
mjnagel added a commit that referenced this pull request Dec 13, 2024
## Description

The generated docs were moved, but it seems the script to generate them
was not updated to reflect the new location. This PR fixes that.

## Related Issue

Relates to #1119 because at the moment it has docs that shouldn't be
generated in it.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)
## Checklist before merging

- [ ] Test, docs, adr added or updated as needed
- [x] [Contributor
Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md)
followed

Signed-off-by: catsby <clint@defenseunicorns.com>
Co-authored-by: Micah Nagel <micah.nagel@defenseunicorns.com>
catsby and others added 5 commits December 13, 2024 17:14
Signed-off-by: catsby <clint@defenseunicorns.com>
* main:
  chore(deps): update support-deps (#1117)
  chore(deps): update grafana to 11.4.0 (#1053)
  chore: update doc-gen output_dir (#1123)
  feat: configurable authentication flows (#1102)
* origin/main:
  chore: allow separate configuration of admin domain name (#1114)