Commit

Merge branch 'add_example' into 'master'
[Issue volcano-sh#49] Add job examples for openmpi and tensorflow


Issues info:
Issue ID: 49
Title: Add job examples for openmpi and tensorflow
Issue url: CBU-PaaS/Community/volcano/volcano#49


See merge request CBU-PaaS/Community/volcano/volcano!81
mada 00483107 committed Apr 1, 2019
2 parents 549d7e7 + 8a413fa commit 30dcab8
Showing 2 changed files with 117 additions and 0 deletions.
55 changes: 55 additions & 0 deletions example/openmpi-hello.yaml
@@ -0,0 +1,55 @@
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: openmpi-hello
spec:
  minAvailable: 3
  schedulerName: kube-batch
  plugins:
    ssh: []
    env: []
  tasks:
    - replicas: 1
      name: mpimaster
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          imagePullSecrets:
            - name: default-secret
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  MPI_HOST=`cat /etc/volcano/mpiworker.host | tr "\n" ","`;
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world > /home/re
              image: 100.125.5.235:20202/l00427178/openmpi-hello:3.28
              name: mpimaster
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          imagePullSecrets:
            - name: default-secret
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: 100.125.5.235:20202/l00427178/openmpi-hello:3.28
              name: mpiworker
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
          restartPolicy: OnFailure
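
The mpimaster command turns the one-host-per-line file /etc/volcano/mpiworker.host into the comma-separated list passed to `mpiexec --host`. The sketch below reproduces that transformation on a temporary file so it can be tried outside a cluster. The hostnames and the temporary file are assumptions made for illustration, and the trimming step is an optional variant that is not part of the committed manifest.

```shell
# Hypothetical stand-in for /etc/volcano/mpiworker.host; the hostnames are made up.
hostfile=$(mktemp)
printf 'openmpi-hello-mpiworker-0\nopenmpi-hello-mpiworker-1\n' > "$hostfile"

# Same transformation as the mpimaster command: every newline becomes a comma.
MPI_HOST=$(tr "\n" "," < "$hostfile")
echo "$MPI_HOST"

# Optional variant (not in the manifest): drop one trailing comma if present.
MPI_HOST_TRIMMED=${MPI_HOST%,}
echo "$MPI_HOST_TRIMMED"

rm -f "$hostfile"
```

If the host file ends with a newline, as it does in this sketch, the untrimmed list ends with a comma. Whether `mpiexec` accepts that is not stated in the commit, so the `${MPI_HOST%,}` form may be the safer one to pass.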

62 changes: 62 additions & 0 deletions example/tensorflow-benchmark.yaml
@@ -0,0 +1,62 @@
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: tensorflow-benchmark
spec:
  minAvailable: 5
  schedulerName: kube-batch
  plugins:
    env: []
  policies:
    - event: PodEvicted
      action: RestartJob
    - event: PodFailed
      action: RestartTask
  tasks:
    - replicas: 2
      name: ps
      template:
        spec:
          imagePullSecrets:
            - name: default-secret
          containers:
            - command:
                - sh
                - -c
                - |
                  PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | tr "\n" ","`;
                  WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | tr "\n" ","`;
                  python tf_cnn_benchmarks.py --batch_size=32 --model=resnet50 --variable_update=parameter_server --flush_stdout=true --num_gpus=1 --local_parameter_device=cpu --device=cpu --data_format=NHWC --job_name=ps --task_index=${VK_TASK_INDEX} --ps_hosts=${PS_HOST} --worker_hosts=${WORKER_HOST}
              image: 100.125.5.235:20202/l00427178/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
              name: tensorflow
              ports:
                - containerPort: 2222
                  name: tfjob-port
              resources: {}
              workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
          restartPolicy: OnFailure
    - replicas: 3
      name: worker
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          imagePullSecrets:
            - name: default-secret
          containers:
            - command:
                - sh
                - -c
                - |
                  PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | tr "\n" ","`;
                  WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | tr "\n" ","`;
                  python tf_cnn_benchmarks.py --batch_size=32 --model=resnet50 --variable_update=parameter_server --flush_stdout=true --num_gpus=1 --local_parameter_device=cpu --device=cpu --data_format=NHWC --job_name=worker --task_index=${VK_TASK_INDEX} --ps_hosts=${PS_HOST} --worker_hosts=${WORKER_HOST}
              image: 100.125.5.235:20202/l00427178/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
              name: tensorflow
              ports:
                - containerPort: 2222
                  name: tfjob-port
              resources: {}
              workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
          restartPolicy: OnFailure
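
Both the ps and worker commands build their `--ps_hosts` and `--worker_hosts` values the same way: `sed` appends the port `:2222` to every line of the host file, then `tr` joins the lines with commas. The sketch below runs that pipeline on a temporary file. The hostnames and the temporary file are assumptions made for illustration; only the `sed` and `tr` expressions are taken from the manifest.

```shell
# Hypothetical stand-in for /etc/volcano/ps.host; the hostnames are made up.
hostfile=$(mktemp)
printf 'tensorflow-benchmark-ps-0\ntensorflow-benchmark-ps-1\n' > "$hostfile"

# Same pipeline as the manifest: append ":2222" at each end of line,
# then turn every newline into a comma.
PS_HOST=$(sed 's/$/&:2222/g' "$hostfile" | tr "\n" ",")
echo "$PS_HOST"

# Optional variant (not in the manifest): drop one trailing comma if present.
PS_HOST_TRIMMED=${PS_HOST%,}
echo "$PS_HOST_TRIMMED"

rm -f "$hostfile"
```

In the `sed` expression, `$` matches the empty string at the end of each line, so `&` expands to nothing and the replacement is effectively just `:2222`. As with the MPI example, a host file that ends with a newline yields a list with a trailing comma.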
