Feature: Switch to official Spark Operator release #251

Closed
2 tasks done
ewilkins-csi opened this issue Jul 31, 2024 · 2 comments

@ewilkins-csi · Contributor

ewilkins-csi commented Jul 31, 2024

Description

There was a bug in Spark Operator that we had to fix ourselves with a custom fork and build of Spark Operator. That bug fix was finally merged upstream on May 17. We should upgrade to the version that includes our fix and remove the custom image if possible. It might also make sense to jump to the latest version of the Spark Operator chart while we're at it.

DOD

  • aiSSEMBLE no longer clones and compiles the Tech Brewery fork of Spark Operator
  • Dynamic allocation of executors still works as expected
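For context on the second DOD item, here is a rough illustration of how executor dynamic allocation can be requested from the kubeflow Spark Operator. This is a sketch only, assuming the generated pipeline enables it through the `spec.dynamicAllocation` block of the operator's `v1beta2` SparkApplication API (aiSSEMBLE may instead set `spark.dynamicAllocation.*` through `sparkConf`). The application name and executor counts below are hypothetical.

```yaml
# Hypothetical SparkApplication excerpt (kubeflow Spark Operator, v1beta2 API).
# Only the fields relevant to dynamic allocation are shown.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pyspark-pipeline     # assumed name, mirroring the Tilt resource in the test script
spec:
  dynamicAllocation:
    enabled: true            # let Spark scale executors up and down at runtime
    initialExecutors: 1      # hypothetical starting executor count
    minExecutors: 1          # hypothetical lower bound
    maxExecutors: 4          # hypothetical upper bound
```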

Test Script

  1. Generate a new project
     mvn archetype:generate '-DarchetypeGroupId=com.boozallen.aissemble' \
                            '-DarchetypeArtifactId=foundation-archetype' \
                            '-DarchetypeVersion=1.8.0-SNAPSHOT' \
                            '-DartifactId=test-251' \
                            '-DgroupId=org.test' \
                            '-Dpackage=org.test' \
                            '-DprojectGitUrl=test.org/test-251.git' \
                            '-DprojectName=Test Project' -B && cd test-251
  2. Add the attached PysparkPipeline.json to test-251-pipeline-models/src/main/resources/pipelines
  3. Build the project and follow all manual actions, repeating until no more actions are printed
     mvn clean install
  4. Do a full build of the project and complete any new manual actions
     mvn clean install -Dmaven.build.cache.skipCache
  5. OTS only: switch the spark-operator chart to use the locally modified chart
     1. Add upgrade-v2-chart-files-aissemble-version-migration to the disabled migrations in the root pom.xml
     2. Update test-251-deploy/src/main/resources/apps/spark-operator/Chart.yaml to point to the local chart version 1.0.0
     3. Update test-251-deploy/src/main/resources/apps/spark-operator/values.yaml to set aissemble-spark-operator-chart.spark-operator.image.pullPolicy to Never
     4. Rebuild the deploy module: mvn clean install -pl :test-251-deploy
  6. Deploy the project
     tilt up
  7. Execute the pyspark-pipeline resource in Tilt
  8. Verify that the pipeline completes successfully with no exceptions from the CoarseGrainedExecutor class
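For the OTS-only values.yaml change in the test script, the dotted key maps onto nested YAML. The excerpt below is a minimal sketch, assuming the spark-operator app's values.yaml nests the subchart overrides under the aissemble-spark-operator-chart dependency exactly as that dotted path suggests; any other keys already in the file are left out here.

```yaml
# test-251-deploy/src/main/resources/apps/spark-operator/values.yaml (excerpt)
# Assumes the dotted key aissemble-spark-operator-chart.spark-operator.image.pullPolicy
# maps directly onto this nesting; existing keys in the file stay as they are.
aissemble-spark-operator-chart:
  spark-operator:
    image:
      pullPolicy: Never   # never pull; use the image already present on the node
```

With `pullPolicy: Never`, Kubernetes does not attempt to pull the image, and the pod fails to start if the image is not already present on the node. The locally built Spark Operator image therefore has to be available to the cluster before running `tilt up`.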
ewilkins-csi added the enhancement (New feature or request) label Jul 31, 2024
ewilkins-csi added this to the 1.8.0 milestone Jul 31, 2024
ewilkins-csi changed the title from "Feature: Remove custom Spark Operator image" to "Feature: Switch to official Spark Operator release" Aug 1, 2024
@carter-cundiff · Contributor

OTS passed

ewilkins-csi added a commit that referenced this issue Aug 1, 2024
 - remove the custom compilation of Spark Operator
 - update the Spark Operator chart dependency to the latest version
 - modify the Spark Operator chart to work with the stock kubeflow image
   - remove custom RBAC
   - change ivy cache mount location

Note that the only thing preventing the complete removal of our custom
Spark Operator image is that kubeflow's Spark Operator is built on
Spark 3.5 while we're on Spark 3.4.  The upgrade itself is fairly easy,
with few breaking changes, but it started to require some dependency
untangling because Spark 3.5 and Quarkus 2.8 pull in different versions
of shared dependencies.  Since we know we'll be upgrading Quarkus soon,
I'm deferring the Spark 3.5 upgrade until after that.  This change
should still cut a significant amount of time from the image build, as
the custom Go compilation was taking at least half of the build time.
ewilkins-csi added a commit that referenced this issue Aug 1, 2024
ewilkins-csi self-assigned this Aug 1, 2024
@csun-cpointe · Contributor

Test passed!
The pipeline completes successfully with no exceptions from the CoarseGrainedExecutor class.
[Screenshots attached: 2024-08-01 at 10:16:35 AM and 10:16:43 AM]
