
[Feature] Add Kubernetes manifest validation in pre-commit. #2380

Merged: 13 commits into ray-project:master on Sep 17, 2024

Conversation

LeoLiao123 (Contributor)

Why are these changes needed?

This PR resolves issue #2254: the absence of a Kubernetes manifest validation step allowed the errors listed in that issue to surface when deploying the Helm chart. During validation, I also discovered that running helm template ./ in helm-chart/ray-cluster rendered .template.spec.containers.ports as null (it should either be omitted or set to an empty list []). I have fixed this issue as well.
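For context, the usual Helm pattern for avoiding a literal null in rendered output is to guard the field with a with block, along these lines (a minimal sketch; the value path .Values.head.ports is illustrative and may not match the chart's actual key):

{{- /* Omit the ports key entirely when no ports are configured,
       instead of rendering "ports: null". */ -}}
{{- with .Values.head.ports }}
ports: {{- toYaml . | nindent 12 }}
{{- end }}

Since with skips its block when the value is empty or unset, the ports key is omitted rather than rendered as null, which satisfies the CRD schema.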

Related issue number

Closes #2254
Related: #2174, #2239

Issue Reproduction and Manual Validation Results

I reproduced the error from Issue #2174. Running helm template ./ within the helm-chart/ray-cluster directory generates the following output:

ray-cluster $ helm template ./
---
# Source: ray-cluster/templates/raycluster-cluster.yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  labels:
    helm.sh/chart: ray-cluster-1.1.0
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/managed-by: Helm
  name: release-name-kuberay

spec:
  headGroupSpec:
    serviceType: ClusterIP
    rayStartParams:
        dashboard-host: "0.0.0.0"
    template:
      spec:
        imagePullSecrets:
          []
        containers:
          -
            volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-head
            image: rayproject/ray:2.9.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1"
                memory: 2G
              requests:
                cpu: "1"
                memory: 2G
            securityContext:
              {}
------>     env:

        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}


        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels:
          helm.sh/chart: ray-cluster-1.1.0
          app.kubernetes.io/instance: release-name
          app.kubernetes.io/managed-by: Helm

  workerGroupSpecs:
  - rayStartParams:
      {}
    replicas: 1
    minReplicas: 1
    maxReplicas: 3
    numOfHosts: 1
    groupName: workergroup
    template:
      spec:
        imagePullSecrets:
          []
        containers:
          -
            volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-worker
            image: rayproject/ray:2.9.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1"
                memory: 1G
              requests:
                cpu: "1"
                memory: 1G
            securityContext:
              {}
------>     env:

        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}


        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels:
          helm.sh/chart: ray-cluster-1.1.0
          app.kubernetes.io/instance: release-name
          app.kubernetes.io/managed-by: Helm

Validating this rendered output with kubeconform produces:

stdin - RayCluster release-name-kuberay is invalid: problem validating schema. Check JSON formatting: jsonschema: '/spec/workerGroupSpecs/0/template/spec/containers/0/env' does not validate with file:///tmp/tmp.QsaaVEezYn/raycluster_v1.json#/properties/spec/properties/workerGroupSpecs/items/properties/template/properties/spec/properties/containers/items/properties/env/type: expected array, but got null
Summary: 1 resource found parsing stdin - Valid: 0, Invalid: 1, Errors: 0, Skipped: 0
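For reference, a pipeline along these lines reproduces this validation (a sketch: the exact flags and schema directory used by scripts/validate-helm.sh are assumptions):

# Render the chart and validate it against the RayCluster CRD schema.
# $SCHEMA_DIR is assumed to hold JSON schemas generated from the CRDs,
# with filenames like raycluster_v1.json.
helm template ./ | kubeconform -strict -summary \
  -schema-location "$SCHEMA_DIR/{{ .ResourceKind }}_{{ .ResourceAPIVersion }}.json"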

For the case mentioned earlier, where .template.spec.containers.ports renders as null:

ray-cluster $ helm template ./
---
# Source: ray-cluster/templates/raycluster-cluster.yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  labels:
    helm.sh/chart: ray-cluster-1.1.0
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/managed-by: Helm
  name: release-name-kuberay

spec:
  headGroupSpec:
    serviceType: ClusterIP
    rayStartParams:
        dashboard-host: "0.0.0.0"
    template:
      spec:
        imagePullSecrets:
          []
        containers:
          -
            volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-head
            image: rayproject/ray:2.9.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1"
                memory: 2G
              requests:
                cpu: "1"
                memory: 2G
            securityContext:
              {}

        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}


        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels:
          helm.sh/chart: ray-cluster-1.1.0
          app.kubernetes.io/instance: release-name
          app.kubernetes.io/managed-by: Helm

  workerGroupSpecs:
  - rayStartParams:
      {}
    replicas: 1
    minReplicas: 1
    maxReplicas: 3
    numOfHosts: 1
    groupName: workergroup
    template:
      spec:
        imagePullSecrets:
          []
        containers:
          -
            volumeMounts:
            - mountPath: /tmp/ray
              name: log-volume
            name: ray-worker
            image: rayproject/ray:2.9.0
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: "1"
                memory: 1G
              requests:
                cpu: "1"
                memory: 1G
            securityContext:
              {}
------>     ports:
              null

        volumes:
          - emptyDir: {}
            name: log-volume
        affinity:
          {}


        tolerations:
          []
        nodeSelector:
          {}
      metadata:
        annotations:
          {}
        labels:
          helm.sh/chart: ray-cluster-1.1.0
          app.kubernetes.io/instance: release-name
          app.kubernetes.io/managed-by: Helm

Validating this rendered output produces:

stdin - RayCluster release-name-kuberay is invalid: problem validating schema. Check JSON formatting: jsonschema: '/spec/workerGroupSpecs/0/template/spec/containers/0/ports' does not validate with file:///tmp/tmp.CnlCCHOuhb/raycluster_v1.json#/properties/spec/properties/workerGroupSpecs/items/properties/template/properties/spec/properties/containers/items/properties/ports/type: expected array, but got null
Summary: 1 resource found parsing stdin - Valid: 0, Invalid: 1, Errors: 0, Skipped: 0

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

LeoLiao123 changed the title from "Add strict parser" to "[Feature] Add Kubernetes manifest validation in pre-commit." on Sep 14, 2024.
@LeoLiao123 (Contributor, Author) commented:

cc: @MortalHappiness, @kevin85421

Comment on scripts/validate-helm.sh, lines 15 to 16:
curl -o "${KUBECONFORM_INSTALL}/crd2schema.py" https://raw.githubusercontent.com/yannh/kubeconform/master/scripts/openapi2jsonschema.py
SCRIPT_PATH="${KUBECONFORM_INSTALL}/crd2schema.py"
Member commented:

We can consider storing this script in our repo so that we don't need to download it every time.

curl -L https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz -C "${KUBECONFORM_INSTALL}"
curl -o "${KUBECONFORM_INSTALL}/crd2schema.py" https://raw.githubusercontent.com/yannh/kubeconform/master/scripts/openapi2jsonschema.py
SCRIPT_PATH="${KUBECONFORM_INSTALL}/crd2schema.py"
RAYCLUSTER_CRD_PATH="$KUBERAY_HOME/ray-operator/config/crd/bases/ray.io_rayclusters.yaml"
@MortalHappiness (Member) commented, Sep 15, 2024:
Why validate only RayCluster? How about also validating RayJob and RayService?

I think this can be done in either this PR or follow-up PRs.
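For context, the referenced script converts a CRD manifest into per-version JSON schemas that kubeconform can consume. The invocation would look roughly like this (a sketch reusing the variables quoted above; the output filename is inferred from the validation logs):

# Generates raycluster_v1.json (named kind_version.json, lowercased)
# from the RayCluster CRD, for kubeconform to validate against.
python3 "$SCRIPT_PATH" "$RAYCLUSTER_CRD_PATH"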

Comment on .pre-commit-config.yaml, lines 59 to 66:

- repo: local
  hooks:
  - id: validate-helm-charts
    name: validate helm charts with kubeconform
    entry: bash -c 'version="0.6.7"; [ "$(kubeconform --v | grep -oP "(?<=v)[\d\.]+")" = "$version" ] || echo "kubeconform version is not $version"; bash scripts/validate-helm.sh'
    language: system
    pass_filenames: false
@MortalHappiness (Member) commented, Sep 17, 2024:

This is wrong. You must separate these into two hooks; otherwise it won't catch the error.

See the screenshot below: I set the version to "invalid", but the hook still passed.

[screenshot: pre-commit run passing despite an invalid kubeconform version]

hooks:
- id: check-kubeconform-version
  name: kubeconform version check
  entry: bash -c 'version="0.6.7"; [ "$(kubeconform -v | grep -oP "(?<=v)[\d\.]+")" = "$version" ] || { echo "kubeconform version is not $version"; }'
Member commented:
Can you replace this line with: bash -c 'version="0.6.7"; [ "$(kubeconform -v | grep -oP "(?<=v)[\d\.]+")" = "$version" ] || { echo "kubeconform version is not $version"; exit 1; }'

Also, I made a mistake on line 34; can you replace it with: bash -c 'version="1.60.3"; [ "$(golangci-lint --version | grep -oP "(?<=version )[\d\.]+")" = "$version" ] || { echo "golangci-lint version is not $version"; exit 1; }'? Thanks.

Otherwise LGTM.
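Putting the suggestions together, the separated hooks would look roughly like this (a sketch assembled from the snippets in this thread). With both steps in one entry, the version check only echoed on a mismatch and the shell's exit status came from the final command, so the hook could never fail on a bad version:

- repo: local
  hooks:
  - id: check-kubeconform-version
    name: kubeconform version check
    # Exits non-zero on a version mismatch so pre-commit reports a failure.
    entry: bash -c 'version="0.6.7"; [ "$(kubeconform -v | grep -oP "(?<=v)[\d\.]+")" = "$version" ] || { echo "kubeconform version is not $version"; exit 1; }'
    language: system
    pass_filenames: false
  - id: validate-helm-charts
    name: validate helm charts with kubeconform
    entry: bash scripts/validate-helm.sh
    language: system
    pass_filenames: false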

@MortalHappiness (Member) left a review comment:
LGTM

Follow-up PR: Also validate RayJob and RayService

cc @kevin85421

@kevin85421 (Member) commented, Sep 17, 2024:

Follow-up PR: Also validate RayJob and RayService

@LeoLiao123 can you open an issue to track the progress?

@MortalHappiness we only have RayCluster Helm chart.

@kevin85421 kevin85421 merged commit 39d42fb into ray-project:master Sep 17, 2024
27 checks passed
@MortalHappiness (Member) commented:

@kevin85421 Oops, I didn't notice that. Thanks.

Successfully merging this pull request may close these issues.

[Feature] Add a strict parser for YAMLs in CI