in ps and worker 0:
job name = ps
task index = 0
Traceback (most recent call last):
  File "/var/tf_dist_mnist/dist_mnist.py", line 303, in <module>
    tf.app.run()
  File "/opt/conda/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/var/tf_dist_mnist/dist_mnist.py", line 144, in main
    cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index)
  File "/opt/conda/lib/python3.5/site-packages/tensorflow/python/training/server_lib.py", line 147, in __init__
    self._server_def.SerializeToString(), status)
  File "/opt/conda/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Could not parse port for local server from ""
in worker 1:
job name = ps
task index = 1
Traceback (most recent call last):
  File "/var/tf_dist_mnist/dist_mnist.py", line 303, in <module>
    tf.app.run()
  File "/opt/conda/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/var/tf_dist_mnist/dist_mnist.py", line 144, in main
    cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index)
  File "/opt/conda/lib/python3.5/site-packages/tensorflow/python/training/server_lib.py", line 147, in __init__
    self._server_def.SerializeToString(), status)
  File "/opt/conda/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Task 1 was not defined in job "ps"
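One possible reading of the two messages, offered as a sketch rather than a diagnosis: both would appear if the host list for job "ps" reached the script as an empty string, since splitting "" on commas yields a single empty address. The helper `check_task` below is hypothetical and only imitates the two checks named in the error text; it is not TensorFlow code, and the assumption that dist_mnist.py builds its cluster from comma-separated host strings is not confirmed by the logs above.

```python
def check_task(cluster, job_name, task_index):
    """Return an error string shaped like the ones in the logs, or None.

    Hypothetical helper: it imitates the two failure conditions, nothing more.
    """
    tasks = cluster.get(job_name)
    if tasks is None:
        return 'Job "%s" was not defined in cluster' % job_name
    if not 0 <= task_index < len(tasks):
        return 'Task %d was not defined in job "%s"' % (task_index, job_name)
    address = tasks[task_index]
    # Split "host:port"; an empty address has no separator and no port.
    _host, sep, port = address.rpartition(":")
    if not sep or not port.isdigit():
        return 'Could not parse port for local server from "%s"' % address
    return None


# Assumption: the ps host list arrived empty, so split(",") gives [""].
empty_cluster = {"ps": "".split(",")}
error_task0 = check_task(empty_cluster, "ps", 0)
error_task1 = check_task(empty_cluster, "ps", 1)

# A populated list (address made up for illustration) passes both checks.
ok = check_task({"ps": ["ps-0:2222"]}, "ps", 0)

print(error_task0)
print(error_task1)
print(ok)
```

If this reading holds, the thing to inspect would be how the host lists and the job name are passed into each pod, since both logs report `job name = ps`.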
What you expected to happen:
The demo in the docs should run successfully.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Volcano Version: installed from the master branch, images with the latest tag
Kubernetes version (use kubectl version): 1.18.8
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release): CentOS-7
Kernel (e.g. uname -a): Linux emr-header-1.cluster-337861 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Sep 7 14:49:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Install tools: kubectl apply -f
Others:
What happened:
Ran the demo at https://volcano.sh/en/docs/tf_on_volcano/ and got the errors shown above: the first traceback in ps and worker 0, the second in worker 1.