Cherry pick commits for releasing v2.0.0 (#2156)
* Support gang scheduling with Yunikorn (#2107)

* Add Yunikorn scheduler and example

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Add test cases

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Add code comments

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Add license comment

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Inline mergeNodeSelector

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Fix initial number implementation

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

---------

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit 8fcda12)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Update Makefile for building sparkctl (#2119)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 4bc6e89)
Signed-off-by: Yi Chen <github@chenyicn.net>

* fix: Add default values for namespaces to match usage descriptions (#2128)

* fix: Add default values for namespaces to match usage descriptions

Signed-off-by: pengfei4.li <pengfei4.li@ly.com>

* fix: remove incorrect cache settings

Signed-off-by: pengfei4.li <pengfei4.li@ly.com>

---------

Signed-off-by: pengfei4.li <pengfei4.li@ly.com>
Co-authored-by: pengfei4.li <pengfei4.li@ly.com>
(cherry picked from commit 52f818d)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Fix: Spark role binding did not render properly when setting spark service account name (#2135)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit a1a38ea)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Reintroduce option webhook.enable (#2142)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 9e88049)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Add default batch scheduler argument (#2143)

* Add default batch scheduler argument

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

* Add helm unit test

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>

---------

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit 9cc1c02)
Signed-off-by: Yi Chen <github@chenyicn.net>

* fix: unable to set controller/webhook replicas to zero (#2147)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 1afa72e)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Adding support for setting spark job namespaces to all namespaces (#2123)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit c93b0ec)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Support extended kube-scheduler as batch scheduler (#2136)

* Support coscheduling with kube-scheduler plugins

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add example for using kube-scheduler coscheduling

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit e8d3de9)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Run e2e tests on Kind (#2148)

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit c810ece)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Set schedulerName to Yunikorn (#2153)

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit 62b4ca6)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Create role and rolebinding for controller/webhook in every spark job namespace if not watching all namespaces (#2129)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 592b649)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Fix: e2e test fails due to webhook not ready (#2149)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit dee91ba)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Upgrade to Go 1.23.1 (#2155)

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit 10fcb8e)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Upgrade to Spark 3.5.2 (#2154)

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit e1b7a27)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Bump sigs.k8s.io/scheduler-plugins from 0.29.7 to 0.29.8 (#2159)

Bumps [sigs.k8s.io/scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) from 0.29.7 to 0.29.8.
- [Release notes](https://github.com/kubernetes-sigs/scheduler-plugins/releases)
- [Changelog](https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/RELEASE.md)
- [Commits](kubernetes-sigs/scheduler-plugins@v0.29.7...v0.29.8)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/scheduler-plugins
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 95d202e)
Signed-off-by: Yi Chen <github@chenyicn.net>

* feat: support driver and executor pod use different priority (#2146)

* feat: support driver and executor pod use different priority

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>

* feat: if *app.Spec.Driver.PriorityClassName and *app.Spec.Executor.PriorityClassName are explicitly defined, they take precedence over spec.batchSchedulerOptions.priorityClassName

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>

* feat: merge the logic of setPodPriorityClassName into addPriorityClassName

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>

* feat: support driver and executor pod use different priority

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>

* feat: if *app.Spec.Driver.PriorityClassName and *app.Spec.Executor.PriorityClassName are explicitly defined, they take precedence over spec.batchSchedulerOptions.priorityClassName

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>

* feat: merge the logic of setPodPriorityClassName into addPriorityClassName

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>

* feat: add adjust pointer if is nil

Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>

* feat: remove spec.batchSchedulerOptions.priorityClassName define , split driver and executor pod priorityClass

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>

* feat: remove spec.batchSchedulerOptions.priorityClassName define , split driver and executor pod priorityClass

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>

* feat: Optimize code to avoid null pointer exceptions

Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>

* fix: remove backup crd files

Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>

* fix: remove BatchSchedulerOptions.PriorityClassName test code

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>

* fix: add driver and executor pod priorityClassName test code

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>

---------

Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
Co-authored-by: Kevin Wu <kevin.wu@momenta.ai>
(cherry picked from commit 6ae1b2f)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Bump gocloud.dev from 0.37.0 to 0.39.0 (#2160)

Bumps [gocloud.dev](https://github.com/google/go-cloud) from 0.37.0 to 0.39.0.
- [Release notes](https://github.com/google/go-cloud/releases)
- [Commits](google/go-cloud@v0.37.0...v0.39.0)

---
updated-dependencies:
- dependency-name: gocloud.dev
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit e58023b)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Update e2e tests (#2161)

* Add sleep buffer to ensure the webhooks are ready before running the e2e tests

Signed-off-by: Yi Chen <github@chenyicn.net>

* Remove duplicate operator image build tasks

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update e2e tests

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update examples

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit e6a7805)
Signed-off-by: Yi Chen <github@chenyicn.net>

* fix: webhook not working when setting spark job namespaces to empty (#2163)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 7785107)
Signed-off-by: Yi Chen <github@chenyicn.net>

* fix: The logger had an odd number of arguments, making it panic (#2166)

Signed-off-by: tcassaert <tcassaert@inuits.eu>
(cherry picked from commit eb48b34)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Upgrade to Spark 3.5.2(#2012) (#2157)

* Upgrade to Spark 3.5.2

Signed-off-by: HyukSangCho <a01045542949@gmail.com>

* Upgrade to Spark 3.5.2

Signed-off-by: HyukSangCho <a01045542949@gmail.com>

* Upgrade to Spark 3.5.2

Signed-off-by: HyukSangCho <a01045542949@gmail.com>

* Upgrade to Spark 3.5.2

Signed-off-by: HyukSangCho <a01045542949@gmail.com>

---------

Signed-off-by: HyukSangCho <a01045542949@gmail.com>
(cherry picked from commit 9f0c08a)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Feature: Add pprof endpoint (#2164)

* add pprof support to the operator Controller Manager

Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>

* add pprof support to helm chart

Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>

---------

Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
(cherry picked from commit 75b9266)
Signed-off-by: Yi Chen <github@chenyicn.net>

* fix the make kind-delete-cluster to avoid accidental kubeconfig deletion (#2172)

Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
(cherry picked from commit cbfefd5)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Bump github.com/aws/aws-sdk-go-v2/config from 1.27.27 to 1.27.33 (#2174)

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.27.27 to 1.27.33.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@config/v1.27.27...config/v1.27.33)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit b818332)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Bump helm.sh/helm/v3 from 3.15.3 to 3.16.1 (#2173)

Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.15.3 to 3.16.1.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](helm/helm@v3.15.3...v3.16.1)

---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit f3f80d4)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Add specific error in log line when failed to create web UI service (#2170)

* Add specific error in log line when failed to create web UI service

Signed-off-by: tcassaert <tcassaert@inuits.eu>

* Update log to reflect correct resource that could not be created

Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: tcassaert <tcassaert@protonmail.com>

---------

Signed-off-by: tcassaert <tcassaert@inuits.eu>
Signed-off-by: tcassaert <tcassaert@protonmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit ed3226e)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Account for spark.executor.pyspark.memory in Yunikorn gang scheduling (#2178)

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
(cherry picked from commit a2f71c6)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Fix: spark application does not respect time to live seconds (#2165)

* Add time to live seconds example spark application

Signed-off-by: Yi Chen <github@chenyicn.net>

* fix: spark application does not respect time to live seconds

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit c855ee4)
Signed-off-by: Yi Chen <github@chenyicn.net>

* Update release workflow and docs (#2121)

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit bca6aa8)
Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Jacob Salway <jacob.salway@gmail.com>
Signed-off-by: Yi Chen <github@chenyicn.net>
Signed-off-by: pengfei4.li <pengfei4.li@ly.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Kevin Wu <kevin.wu@momenta.ai>
Signed-off-by: Kevin.Wu <kevin.wu@momenta.ai>
Signed-off-by: tcassaert <tcassaert@inuits.eu>
Signed-off-by: HyukSangCho <a01045542949@gmail.com>
Signed-off-by: ImpSy <3097030+ImpSy@users.noreply.github.com>
Signed-off-by: tcassaert <tcassaert@protonmail.com>
Co-authored-by: Jacob Salway <jacob.salway@gmail.com>
Co-authored-by: Neo <56439757+snappyyouth@users.noreply.github.com>
Co-authored-by: pengfei4.li <pengfei4.li@ly.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevinz <ruoshuidba@gmail.com>
Co-authored-by: Kevin Wu <kevin.wu@momenta.ai>
Co-authored-by: tcassaert <tcassaert@protonmail.com>
Co-authored-by: ha2hi <56156892+ha2hi@users.noreply.github.com>
Co-authored-by: Sébastien Maintrot <3097030+ImpSy@users.noreply.github.com>
10 people authored Sep 23, 2024
1 parent 74b345a commit fab1c46
Showing 86 changed files with 3,336 additions and 993 deletions.
64 changes: 64 additions & 0 deletions .github/workflows/check-release.yaml
@@ -0,0 +1,64 @@
+name: Check Release
+
+on:
+  pull_request:
+    branches:
+      - release-*
+    paths:
+      - VERSION
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+env:
+  SEMVER_PATTERN: '^v([0-9]+)\.([0-9]+)\.([0-9]+)(-rc\.([0-9]+))?$'
+
+jobs:
+  check:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout source code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Check whether version matches semver pattern
+        run: |
+          VERSION=$(cat VERSION)
+          if [[ ${VERSION} =~ ${{ env.SEMVER_PATTERN }} ]]; then
+            echo "Version '${VERSION}' matches semver pattern."
+          else
+            echo "Version '${VERSION}' does not match semver pattern."
+            exit 1
+          fi
+          echo "VERSION=${VERSION}" >> $GITHUB_ENV
+
+      - name: Check whether chart version and appVersion matches version
+        run: |
+          VERSION=${VERSION#v}
+          CHART_VERSION=$(cat charts/spark-operator-chart/Chart.yaml | grep version | awk '{print $2}')
+          CHART_APP_VERSION=$(cat charts/spark-operator-chart/Chart.yaml | grep appVersion | awk '{print $2}')
+          if [[ ${CHART_VERSION} == ${VERSION} ]]; then
+            echo "Chart version '${CHART_VERSION}' matches version '${VERSION}'."
+          else
+            echo "Chart version '${CHART_VERSION}' does not match version '${VERSION}'."
+            exit 1
+          fi
+          if [[ ${CHART_APP_VERSION} == ${VERSION} ]]; then
+            echo "Chart appVersion '${CHART_APP_VERSION}' matches version '${VERSION}'."
+          else
+            echo "Chart appVersion '${CHART_APP_VERSION}' does not match version '${VERSION}'."
+            exit 1
+          fi
+
+      - name: Check if tag exists
+        run: |
+          git fetch --tags
+          if git tag -l | grep -q "^${VERSION}$"; then
+            echo "Tag '${VERSION}' already exists."
+            exit 1
+          else
+            echo "Tag '${VERSION}' does not exist."
+          fi
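
The version check in this workflow can be tried outside GitHub Actions. The following is a minimal sketch, not part of the commit: the pattern is the `SEMVER_PATTERN` value from the workflow, while the `matches_semver` helper name and the sample versions are hypothetical. It uses `grep -E` instead of the workflow's bash `[[ ... =~ ... ]]` test so that it runs in any POSIX shell; both interpret the pattern as an extended regular expression.

```shell
# Pattern copied from the workflow's SEMVER_PATTERN env value.
SEMVER_PATTERN='^v([0-9]+)\.([0-9]+)\.([0-9]+)(-rc\.([0-9]+))?$'

# Hypothetical helper (not in the commit): exits 0 when the argument
# matches the release version pattern, non-zero otherwise.
matches_semver() {
  printf '%s\n' "$1" | grep -Eq -- "${SEMVER_PATTERN}"
}

# Sample inputs chosen for illustration only.
for version in v2.0.0 v2.0.0-rc.0 2.0.0 v2.0; do
  if matches_semver "${version}"; then
    echo "Version '${version}' matches semver pattern."
  else
    echo "Version '${version}' does not match semver pattern."
  fi
done

# The chart comparison strips the leading "v" first, as the workflow does.
VERSION=v2.0.0
echo "${VERSION#v}"
```

Under these assumptions only `v2.0.0` and `v2.0.0-rc.0` are reported as matching, and the last line prints `2.0.0`, which is the form compared against the chart's `version` and `appVersion`.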
55 changes: 14 additions & 41 deletions .github/workflows/integration.yaml
@@ -72,26 +72,11 @@ jobs:
       - name: Run unit tests
         run: make unit-test

-      - name: Build Spark-Operator Docker Image
-        run: make docker-build IMAGE_TAG=latest
-
-      - name: Check changes in resources used in docker file
-        run: |
-          DOCKERFILE_RESOURCES=$(cat Dockerfile | grep -P -o "COPY [a-zA-Z0-9].*? " | cut -c6-)
-          for resource in $DOCKERFILE_RESOURCES; do
-            # If the resource is different
-            if ! git diff --quiet origin/master -- $resource; then
-              ## And the appVersion hasn't been updated
-              if ! git diff origin/master -- charts/spark-operator-chart/Chart.yaml | grep +appVersion; then
-                echo "resource used in docker.io/kubeflow/spark-operator has changed in $resource, need to update the appVersion in charts/spark-operator-chart/Chart.yaml"
-                git diff origin/master -- $resource;
-                echo "failing the build... " && false
-              fi
-            fi
-          done
+      - name: Build Spark operator
+        run: make build-operator

   build-helm-chart:
-    runs-on: ubuntu-20.04
+    runs-on: ubuntu-latest
     steps:
       - name: Determine branch name
         id: get_branch
@@ -131,7 +116,7 @@ jobs:
       - name: Run chart-testing (lint)
         if: steps.list-changed.outputs.changed == 'true'
         env:
-          BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
+          BRANCH: ${{ steps.get_branch.outputs.BRANCH }}
         run: ct lint --check-version-increment=false --target-branch $BRANCH

       - name: Detect CRDs drift between chart and manifest
Expand Down Expand Up @@ -163,37 +148,25 @@ jobs:
minikube image load docker.io/kubeflow/spark-operator:local
ct install
integration-test:
runs-on: ubuntu-22.04
e2e-test:
runs-on: ubuntu-latest
steps:
- name: Checkout source code
uses: actions/checkout@v4
with:
fetch-depth: "0"
fetch-depth: 0

- name: Set up Go
uses: actions/setup-go@v5
with:
go-version-file: "go.mod"
go-version-file: go.mod

- name: setup minikube
uses: manusa/actions-setup-minikube@v2.11.0
with:
minikube version: v1.33.0
kubernetes version: v1.30.0
start args: --memory 6g --cpus=2 --addons ingress
github token: ${{ inputs.github-token }}
- name: Create a Kind cluster
run: make kind-create-cluster

- name: Build local spark-operator docker image for minikube testing
- name: Build and load image to Kind cluster
run: |
docker build -t docker.io/kubeflow/spark-operator:local .
minikube image load docker.io/kubeflow/spark-operator:local
# The integration tests are currently broken see: https://github.com/kubeflow/spark-operator/issues/1416
# - name: Run chart-testing (integration test)
# run: make integration-test
make kind-load-image IMAGE_TAG=local
- name: Setup tmate session
if: failure()
uses: mxschmitt/action-tmate@v3
timeout-minutes: 15
- name: Run e2e tests
run: make e2e-test
44 changes: 0 additions & 44 deletions .github/workflows/push-tag.yaml

This file was deleted.

120 changes: 0 additions & 120 deletions .github/workflows/release-docker.yaml

This file was deleted.

@@ -2,17 +2,25 @@ name: Release Helm charts

 on:
   release:
-    types: [published]
+    types:
+      - published

 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true

+env:
+  HELM_REGISTRY: ghcr.io
+  HELM_REPOSITORY: ${{ github.repository_owner }}/helm-charts
+
 jobs:
-  build:
+  release_helm_charts:
     permissions:
       contents: write
+      packages: write
+
     runs-on: ubuntu-latest
+
     steps:
       - name: Checkout source code
         uses: actions/checkout@v4
@@ -27,10 +35,28 @@ jobs:
         with:
           version: v3.14.4

+      - name: Login to GHCR
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.HELM_REGISTRY }}
+          username: ${{ github.repository_owner }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Read version from VERSION file
+        run: |
+          VERSION=$(cat VERSION)
+          echo "VERSION=${VERSION}" >> $GITHUB_ENV
+
       - name: Package Helm charts
         run: |
           for chart in $(ls charts); do
-            helm package charts/$chart
+            helm package charts/${chart}
           done
+
+      - name: Upload charts to GHCR
+        run: |
+          for pkg in $(ls *.tgz); do
+            helm push ${pkg} oci://${{ env.HELM_REGISTRY }}/${{ env.HELM_REPOSITORY }}
+          done

       - name: Save packaged charts to temp directory
@@ -44,15 +70,15 @@ jobs:
           ref: gh-pages
           fetch-depth: 0

-      - name: Copy packages charts
+      - name: Copy packaged charts
         run: |
           cp /tmp/charts/*.tgz .

       - name: Update Helm charts repo index
         env:
           CHART_URL: https://github.com/${{ github.repository }}/releases/download/${{ github.ref_name }}
         run: |
-          helm repo index --merge index.yaml --url $CHART_URL .
+          helm repo index --merge index.yaml --url ${CHART_URL} .
           git add index.yaml
-          git commit -s -m "Update index.yaml" || exit 0
+          git commit -s -m "Add index for Spark operator chart ${VERSION}" || exit 0
           git push
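
The release workflow publishes each chart to two places: an OCI repository on GHCR and the `gh-pages` index, which points at the GitHub release assets. The following sketch only composes and prints the two locations so the expressions in the workflow are easier to read. It is not part of the commit. The `oci_target` and `chart_url` helper names are hypothetical, and `kubeflow`, `kubeflow/spark-operator` and `v2.0.0` are assumed stand-ins for the `github.repository_owner`, `github.repository` and `github.ref_name` context values.

```shell
# Hypothetical stand-ins for the GitHub Actions context values.
REPOSITORY_OWNER=kubeflow            # github.repository_owner (assumed)
REPOSITORY=kubeflow/spark-operator   # github.repository (assumed)
REF_NAME=v2.0.0                      # github.ref_name (assumed)

# Values as declared in the workflow's env block.
HELM_REGISTRY=ghcr.io
HELM_REPOSITORY="${REPOSITORY_OWNER}/helm-charts"

# Hypothetical helper: the destination passed to `helm push` for each package.
oci_target() {
  echo "oci://${HELM_REGISTRY}/${HELM_REPOSITORY}"
}

# Hypothetical helper: the base URL passed to `helm repo index --url`.
chart_url() {
  echo "https://github.com/${REPOSITORY}/releases/download/${REF_NAME}"
}

echo "push target: $(oci_target)"
echo "index URL:   $(chart_url)"
```

With these assumed values the push target is `oci://ghcr.io/kubeflow/helm-charts` and the index URL is `https://github.com/kubeflow/spark-operator/releases/download/v2.0.0`.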