
Failed to submit spark training job as driver service account is not configurable #1111

Closed
ChenYi015 opened this issue Jul 30, 2024 · 1 comment · Fixed by #1112

@ChenYi015 (Collaborator)

I tried to submit a Spark training job, referring to Submit a distributed spark job - Arena Documentation.

arena submit sparkjob \
   --name=sparktest \
   --image=registry.aliyuncs.com/acs/spark-pi:ack-2.4.5-latest \
   --main-class=org.apache.spark.examples.SparkPi \
   --jar=local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar

And it fails with an error message related to the ServiceAccount:

Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods \"sparktest-driver\" is forbidden: error looking up service account default/spark: serviceaccount \"spark\" not found.
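
The driver pod is rejected because its spec references a service account named spark in the default namespace, and that service account does not exist. A quick way to confirm which service account the driver ends up with (a sketch, assuming the job name sparktest, the default namespace, and that arena submits the job as a spark-operator SparkApplication):

# Service account referenced by the driver spec of the SparkApplication
kubectl get sparkapplication sparktest -n default -o jsonpath='{.spec.driver.serviceAccount}'

# Service account actually attached to the driver pod
kubectl get pod sparktest-driver -n default -o jsonpath='{.spec.serviceAccountName}'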

And arena submit spark has no flag for configuring the ServiceAccount:

$ bin/arena submit spark --help
Submit a common spark application job.

Usage:
  arena submit sparkjob [flags]

Aliases:
  sparkjob, spark

Flags:
  -a, --annotation stringArray           the annotations, usage: "--annotation=key=value" or "--annotation key=value"
      --driver-cpu-request int           cpu request for driver pod (default 1)
      --driver-memory-request string     memory request for driver pod (min is 500m) (default "500m")
      --executor-cpu-request int         cpu request for executor pod (default 1)
      --executor-memory-request string   memory request for executor pod (min is 500m) (default "500m")
  -h, --help                             help for sparkjob
      --image string                     the docker image name of training job (default "registry.aliyuncs.com/acs/spark:v2.4.0")
      --jar string                       jar path in image (default "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar")
  -l, --label stringArray                specify the label
      --main-class string                main class of your jar (default "org.apache.spark.examples.SparkPi")
      --name string                      override name
      --replicas int                     the executor's number to run the distributed training. (default 1)
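
Until a flag is available, a workaround (a sketch, assuming the default namespace; the rolebinding name spark-rb is arbitrary and the namespaced edit ClusterRole is one possible choice of permissions) is to create the service account the driver expects and grant it the rights to manage executor pods:

# Create the service account the driver pod refers to
kubectl create serviceaccount spark -n default

# Allow it to create and manage executor pods in the namespace
kubectl create rolebinding spark-rb --clusterrole=edit --serviceaccount=default:spark -n default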
@Syulin7 (Collaborator) commented Jul 30, 2024

@ChenYi015 Thanks for your great contribution!
