New option for the specification of hostfile for sos run and sos execute #1279
Comments
We could reuse
For the last usage, we do not have to say it is a file, rather the use of
This interface reads intuitively. Not sure if I understand 2: when running remote tasks,
This interface mimics the execution model of SCOOP, namely the non-cluster multi-node execution of workflows. It is added because PBS systems generate node files (albeit in different formats) to specify the nodes for the execution of things on a cluster, and That means there is no need to differentiate between cluster and non-cluster multi-node execution, and we can say
In all cases, the first node should be the "master" where the master process will be executed. Now, this option will be used by both
to execute the entire workflow on multiple nodes. We can also put this in a PBS system, then the syntax would mostly look like
when The
to execute a single multi-node task, or a single multi-task master task.
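
The node-file handling discussed in this thread (node files in differing formats, with the first node acting as the "master") could be sketched roughly as follows. This is only an illustration, not SoS code: the function names `parse_hostfile` and `split_master` are hypothetical, and the accepted format is an assumption (one host per line, optionally followed by a worker count as in SCOOP hostfiles, with repeated host names added up as in a PBS node file). Treating a missing count as one worker is also an assumption of this sketch.

```python
def parse_hostfile(text):
    """Parse hostfile text into an ordered list of (host, workers) pairs.

    Assumed format (hypothetical, for illustration only):
      - one host per line, optionally followed by a worker count
      - a host without a count contributes one worker
      - a host listed several times accumulates its counts
    """
    counts = {}  # dicts keep insertion order, so the first host stays first
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0]
        workers = int(fields[1]) if len(fields) > 1 else 1
        counts[host] = counts.get(host, 0) + workers
    return list(counts.items())


def split_master(hosts):
    """Return (master_host, hosts), taking the first listed host as master."""
    if not hosts:
        raise ValueError("hostfile lists no hosts")
    return hosts[0][0], hosts


if __name__ == "__main__":
    example = "node1 4\nnode2\nnode2\n\nnode3 2\n"
    master, hosts = split_master(parse_hostfile(example))
    print(master, hosts)
```

Accepting both styles in one parser reflects the point above that cluster and non-cluster multi-node execution need not be differentiated once the node list is known.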
#1278
It appears clear that a hostfile is needed for multi-node execution. Although a hostfile can be generated automatically by PBS systems and picked up automatically by commands such as `sos execute` and `sos run`, users should also be able to specify it manually to allow multi-node execution of workflows and tasks.

This option should work like the `--hostfile` option of SCOOP, with a similar or identical format. The workers will be created on these hosts.

The problem is that `sos run` does not support `--` options, so we will have to reuse an existing option or find another one.

Once this option is specified, users can use
to run the workflow on multiple hosts.
Use
to run the entire workflow on a cluster system.
The same mechanism will be used for the execution of tasks, something like