#251 remove custom spark operator compilation
 - remove the custom compilation of Spark Operator
 - update the Spark Operator chart dependency to the latest version
 - modify the Spark Operator chart to work with the stock kubeflow image
   - remove custom RBAC
   - change ivy cache mount location

Note that the only thing preventing the complete removal of our custom
Spark Operator image is that kubeflow's Spark Operator is on Spark 3.5,
while we're on Spark 3.4. The upgrade itself is fairly easy and has few
breaking changes; however, it started to require some dependency
untangling, because Spark 3.5 and Quarkus 2.8 pull in different versions
of shared dependencies. Since we know we'll be upgrading Quarkus soon,
I'm leaving the Spark 3.5 upgrade until after that. This should still cut
a significant amount of time off the image build, as the custom Go
compilation was taking at least half of the build time.
ewilkins-csi committed Aug 1, 2024
1 parent b97a37e commit 8ff8bc9
Showing 10 changed files with 13 additions and 395 deletions.
1 change: 0 additions & 1 deletion build-parent/pom.xml
@@ -31,7 +31,6 @@
 <version.assembly.plugin>3.6.0</version.assembly.plugin>
 <version.buildhelper.plugin>3.2.0</version.buildhelper.plugin>
 <version.buildnumber.plugin>3.1.0</version.buildnumber.plugin>
-<version.cucumber.reporting.plugin>5.7.5</version.cucumber.reporting.plugin>
 <version.failsafe.plugin>${version.maven.surefire.plugin}</version.failsafe.plugin>
 <version.fermenter>2.10.3</version.fermenter>
 <version.fermenter.legacy.tools>2.8.0</version.fermenter.legacy.tools>
1 change: 0 additions & 1 deletion build-support/aissemble-enforcer-extension/pom.xml
@@ -15,7 +15,6 @@
 <version.maven.core>3.8.6</version.maven.core>
 <version.junit>4.13.2</version.junit>
 <version.cucumber>6.10.4</version.cucumber>
-<version.cucumber.reporting.plugin>5.7.5</version.cucumber.reporting.plugin>
 <maven.compiler.source>11</maven.compiler.source>
 <maven.compiler.target>11</maven.compiler.target>
 </properties>
@@ -16,40 +16,21 @@
 ARG DOCKER_BASELINE_REPO_ID
 ARG VERSION_AISSEMBLE
 
-FROM golang:1.22.2-alpine as builder
-
-WORKDIR /workspace
-
-# Copy the Go Modules manifests
-COPY ./target/checkout/go* ./
-# Cache deps before building and copying source so that we don't need to re-download as much
-# and so that source changes don't invalidate our downloaded layer
-RUN go mod download
-
-# Copy the go source code
-COPY ./target/checkout/main.go main.go
-COPY ./target/checkout/pkg/ pkg/
-COPY ./target/checkout/hack/gencerts.sh /tmp/scripts/gencerts.sh
-COPY ./target/checkout/entrypoint.sh /tmp/scripts/entrypoint.sh
-
-RUN go mod tidy
-# Build
-RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -a -o /usr/bin/spark-operator main.go
+FROM kubeflow/spark-operator:v1beta2-1.6.2-3.5.0 AS builder
 
+# We would be able to use the kubeflow image directly, except that it is on Spark 3.5 instead of 3.4
 FROM ${DOCKER_BASELINE_REPO_ID}boozallen/aissemble-spark:${VERSION_AISSEMBLE}
 
 LABEL org.opencontainers.image.source="https://github.com/boozallen/aissemble"
 
 USER root
 
-COPY --from=builder /usr/bin/spark-operator /usr/bin/
-COPY --from=builder /tmp/scripts/* /usr/bin/
-RUN apt-get update --allow-releaseinfo-change \
-    && apt-get update \
-    && apt-get install -y openssl curl tini \
-    && rm -rf /var/lib/apt/lists/* \
-    && chmod +x /usr/bin/entrypoint.sh \
-    && chmod +x /usr/bin/gencerts.sh
+RUN apt-get update \
+    && apt-get install -y tini \
+    && rm -rf /var/lib/apt/lists/*
+
+COPY --from=builder --chmod=755 /usr/bin/spark-operator /usr/bin/
+COPY --from=builder --chmod=755 /usr/bin/entrypoint.sh /usr/bin/
 
 USER spark
 ENTRYPOINT ["/usr/bin/entrypoint.sh"]
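The new `COPY --chmod=755` lines fold the old `chmod +x` steps into the copy itself. As a rough illustration of the permission bits involved, the Go sketch below applies an `a+x`-style change to mode 0644 and compares the result with 0755. The starting mode of 0644 is an assumption for illustration only; the diff does not state what mode the copied files had before the old `chmod` ran.

```go
package main

import (
	"fmt"
	"io/fs"
)

// addExecAll mimics `chmod a+x`: it sets the execute bit for owner, group and other.
func addExecAll(m fs.FileMode) fs.FileMode {
	return m | 0o111
}

func main() {
	// Assumption: a copied file would otherwise land with mode 0644.
	before := fs.FileMode(0o644)
	after := addExecAll(before)

	fmt.Println(after)                       // -rwxr-xr-x
	fmt.Println(after == fs.FileMode(0o755)) // true: same bits that --chmod=755 sets at copy time
}
```

Setting the mode at copy time also avoids an extra `RUN` layer that exists only to flip permission bits.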
@@ -10,7 +10,7 @@ sources:
   - https://github.com/boozallen/aissemble
 dependencies:
   - name: spark-operator
-    version: 1.1.27
+    version: 1.4.6
     repository: https://kubeflow.github.io/spark-operator/
     import-values:
       - child: batchScheduler
@@ -45,25 +45,8 @@ aissemble-spark-operator-chart:
 | volumes | Volumes for the pod | No | `spark-logging=/tmp/spark-logging` |
 | volumeMounts | Volume Mounts for the pod | No | `spark-logging=/tmp/spark-logging` |
 | fullnameOverride | String to override release name | No | spark-operator |
-| rbac.createClusterRole | See `Migrated Properties` | No | false |
-| serviceAccounts.spark.name | Name for the spark service account | No | spark |
-
-
-## Migrated Properties
-The following properties have been migrated from the `spark-operator` subchart to the `aissemble-spark-operator-chart` chart.
-Any required overrides should be cognisant of the alternate path. For example:
-
-```yaml
-aissemble-spark-operator-chart:
-  rbac:
-    createClusterRole: false
-```
-
-| Property | Description | Default |
-|------------------------|-------------------------------------------------------------------------------|---------|
-| rbac.createClusterRole | Create and use RBAC `ClusterRole` resources. Migrated to use modified rules. | true |
 
 
 # Shared Ivy Cache
 
 Spark uses [Ivy](https://ant.apache.org/ivy/) to resolve and download dependencies for Spark applications. By default,
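The removed "Migrated Properties" section relied on Helm's value scoping: a parent chart passes a dependency only the values nested under that dependency's name (or alias), plus globals. The Go sketch below models that with two hypothetical helpers, `lookup` and `subchartValues`, over made-up values; it is a simplification that ignores globals and aliases, not Helm's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup walks a nested values map along a dotted path such as
// "rbac.createClusterRole" and reports whether the path was found.
func lookup(values map[string]any, path string) (any, bool) {
	var cur any = values
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// subchartValues returns the block of parent values handed to a dependency
// with the given name (hypothetical helper; globals and aliases are ignored).
func subchartValues(parent map[string]any, name string) map[string]any {
	if m, ok := parent[name].(map[string]any); ok {
		return m
	}
	return map[string]any{}
}

func main() {
	// Made-up parent values: one rbac block for the parent chart itself,
	// another nested under the spark-operator dependency.
	parent := map[string]any{
		"rbac": map[string]any{"createClusterRole": false},
		"spark-operator": map[string]any{
			"rbac": map[string]any{"createClusterRole": true},
		},
	}

	child := subchartValues(parent, "spark-operator")
	v, _ := lookup(child, "rbac.createClusterRole")
	fmt.Println(v) // true: the parent's top-level rbac block is not visible to the subchart
}
```

This is why an override placed at the parent-chart path and one placed under the `spark-operator` key are not interchangeable.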

This file was deleted.

@@ -138,4 +138,4 @@ tests:
 # path: spec.volumeMounts
 # content:
 # name: ivy-cache
-# mountPath: /opt/spark/.ivy2
+# mountPath: /home/spark/.ivy2
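The test change moves the expected `ivy-cache` mount from `/opt/spark/.ivy2` to `/home/spark/.ivy2`, reading the first `mountPath` line as the removed one. Spark's Ivy user directory defaults to `~/.ivy2` unless `spark.jars.ivy` is set, so the mount has to follow the running user's home directory. The Go sketch below models that resolution rule with a hypothetical `ivyUserDir` helper; the two home directories passed in are assumptions about the old and stock images, not values stated in the diff.

```go
package main

import (
	"fmt"
	"path"
)

// ivyUserDir returns the Ivy user directory Spark would use: the value of
// spark.jars.ivy when it is set, otherwise <home>/.ivy2.
func ivyUserDir(home string, conf map[string]string) string {
	if dir, ok := conf["spark.jars.ivy"]; ok && dir != "" {
		return dir
	}
	return path.Join(home, ".ivy2")
}

func main() {
	// Assumed home directories for the spark user in each image.
	fmt.Println(ivyUserDir("/opt/spark", nil))  // /opt/spark/.ivy2
	fmt.Println(ivyUserDir("/home/spark", nil)) // /home/spark/.ivy2

	// An explicit spark.jars.ivy setting overrides the home-relative default.
	fmt.Println(ivyUserDir("/home/spark", map[string]string{"spark.jars.ivy": "/tmp/ivy"})) // /tmp/ivy
}
```

Pinning `spark.jars.ivy` explicitly would be an alternative to moving the mount, at the cost of one more setting to keep in sync with the chart.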