Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

command-line interface redesign #291

Merged
merged 7 commits into from
Aug 24, 2017

Conversation

Yancey1989
Copy link
Collaborator

Fixed #285

`pfs [command] [path1|path2..]`

- `ls`: `pfs ls <path>` will list all folders and files under the path.
- `put`: `pfs put <src> <desc>` will upload the local folder or file to the cloud path.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不能用put,还是得用copysrc dest 将来可能都是云端路径。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉putget会更方便理解一点。cp可以只用做云端路径之间的拷贝。和ftp的命令类似。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同感,感觉后续可以同时支持cpput, 用于云端拷贝和上传文件。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

`pfs [command] [path1|path2..]`

- `ls`: `pfs ls <path>` will list all folders and files under the path.
- `put`: `pfs put <src> <desc>` will upload the local folder or file to the cloud path.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉putget会更方便理解一点。cp可以只用做云端路径之间的拷贝。和ftp的命令类似。

- `paddle submit job -name <job-name> -cpu 1 ...`
- `paddle get [job|worker|quota]`
- `paddle describe job <job name>`, `paddle describe worker <worker name>`
- `paddle delete job <job name>`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be paddle delete <job name>

Why use delete instead of kill? It seems there are no other things to delete?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

只是觉得Kill一个 Resource,貌似不是很容易被理解,不过确实PaddleCloud上没有其他Resource是需要被kill的,先改回kill了。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- `name`: You can specify a name for the given resource
- `flags`: You can specify any flags override default values for `command` or `resource`

- Command
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command => Sub-Commond

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

@wanghaoshuang wanghaoshuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

来自忠实用户的建议。

- `submit`: submit the specify resource, we only support submit a PaddleJob for now
- `get`: get all instances for the specified resource
- `describe`: output the description for the specify resource
- `delete`: delete the specific resource, we only support delete a PaddleJob for now

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

原来的kill是不是更简短好用?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. delete => kill

- Command
- `submit`: submit the specify resource, we only support submit a PaddleJob for now
- `get`: get all instances for the specified resource
- `describe`: output the description for the specify resource

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉describe 有点长...用info合适么?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可是info是一个名词的缩写,貌似不太符合我们设想的语法规范。


- Command
- `submit`: submit the specify resource, we only support submit a PaddleJob for now
- `get`: get all instances for the specified resource

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get可不可以有默认选项,比如paddlecloud get == paddlecloud get jobs ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可是怎么确定哪个Resource是默认选项呢?worker还是job

## Reference
## PaddleCloud command-line design
You can use the following syntax to run `paddlecloud` commands from your terminal window:
`paddlecloud [command] [resource] [name] [flags]`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我一般会把paddlecloud alias 成pcloud...

Copy link
Collaborator Author

@Yancey1989 Yancey1989 Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paddlecloud 确实比较长,我同意改成pcloud. +1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@helinwang helinwang Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yancey1989 是不是需要改一下paddlecloud代码目录和改成pcloud,以及确认下发布的binary以后都是pcloud了。
Edit:哦,这个是design doc,可以等merge了之后再改。

- `quota`, The quota for the current user

- Example
- `paddle submit job -name <job-name> -cpu 1 ...`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paddle->paddlecloud?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- `ls`: will list all folders and files under the path.
- `put`: will upload the local folder or file to the cloud path.
- `get`: will download the foler or file to the local path.
- `rm`: will remove the specify folder or file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need cp command

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


- Example
- `pcloud submit job -name <job-name> -cpu 1 ...`
- `pcloud get [job|worker|quota]`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember as discussed yesterday, pcloud list is more simple than pcloud get jobs, and pcloud trainers will get pods running trainers, pcloud pservers will get pserver pods.

Just to discuss, which will be better for users?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And more, can we change the name "workers" to "trainers", and add "pservers" commnads?

Copy link
Collaborator Author

@Yancey1989 Yancey1989 Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to separate worker into trainer and pserver, we also need master.But i think using workers to express pserver, trainer and master is good for simplified users' operation, users can use a single command to view all the workers state:

pcloud workers ...

rather than three commands:

pcloud trainers ...
pcloud servers ...
pcloud master ...

- `pcloud get [job|worker|quota]`
- `pcloud describe job <job name>`, `paddle describe worker <worker name>`
- `pcloud kill <job name>`
- `pcloud logs job <job name>`, `paddle logs worker <worker name>`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, pcloud logs <trainer or pserver name>, logs command will only used for getting logs for workers since logs are stored for each "pod".

If we want to aggregate logs, add a new command like pcloud logaggr <job name>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @typhoonzero 's comment!

- `ls`: will list all folders and files under the path.
- `put`: will upload the local folder or file to the cloud path.
- `get`: will download the foler or file to the local path.
- `rm`: will remove the specify folder or file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need mv command.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- `quota`, The quota for the current user

- Example
- `pcloud submit job -name <job-name> -cpu 1 ...`
Copy link
Collaborator

@helinwang helinwang Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我觉得pcloud最核心的是两件事:
job和pfs。
pfs我觉得挺好的(这些命令经常敲,写成pcloud pfs xx可能太长):

pcloud ls
pcloud put
pcloud get
pcloud rm
pcloud cp
pcloud mv

job相关的是否分得太开了:

pcloud get job
pcloud describe job
pcloud submit job
pcloud kill
pcloud logs

能不能统一到一个subcommand里面:

pcloud job help
pcloud job list
pcloud job submit
pcloud job kill
pcloud job log
pcloud job info

剩下的还有:

pcloud resource worker # 这个需要吗,用户会不会在意每个worker的属性(比如IP是多少)
pcloud resource quota # 另一种方式是pcloud quota
pcloud describe worker # 这个是否跟pcloud resource worker重复
pcloud describe quota # 是否跟pcloud resource quota重复
pcloud get quota # 是否跟pcloud resource quota重复
pcloud get worker # 用户是否在意?
pcloud logs worker # 用户是否在意?跟job log有重复的地方

Copy link
Collaborator Author

@Yancey1989 Yancey1989 Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pcloud get worker # 用户是否在意?

用户可以用这个命令看到trainer/pserer/master进程的状态,应该也是需要的。

pcloud logs worker # 用户是否在意?跟job log有重复的地方

现在pcloud logs job确实有些不合适的地方:

  1. worker很多的时候日志聚合性能很差(需要在Server端对每个Pod调用一次API)
  2. 多个worker日志聚合一起看体验也不是很有好, 可以通过pcloud worker list得到worker的列表,根据需要查看具体worker的日志pcloud logs <worker-name>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

赞思考!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

job相关的是否分得太开了:
...
能不能统一到一个subcommand里面:
pcloud job help
pcloud job list
pcloud job submit
pcloud job kill
pcloud job log
pcloud job info

刚和 @typhoonzero 也线下讨论了一波,感觉是不是这样也会太长了,毕竟resource比较少,有些像kill/submit等也只会针对job,是不是可以改成:

  • Job Operation
pcloud jobs # list job
pcloud kill <job-name> # kill job 
pcloud jobs -v <job-name> # job info
  • Pod Operation (获取直接用Kubernetes中的Pod来表示正在运行实例会比较方便,不需要引入Paddle的Worker概念了,用Pod的名字来区分是trainer/pserver/master进程)
pcloud pods # list all pod, for the current user
pcloud logs <pod-name># logs for pod
pcloud pods -v <pod-name>  # info for pod

PFS的部分的确简化一些会比较方便,我也赞同简短一些,多谢 @helinwang .

Copy link
Collaborator

@helinwang helinwang Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yancey1989 赞!LGTM!能否更新下这个PR,然大家都reivew一下?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

当然,我马上更新下,多谢 @helinwang

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- `rm <dest>`: remove the specify folder or file on cloud.
- `mv <src> <dest>`: move a folder or a file from `<src>`
to `<dest>`, `<src>` and `<dest>` only support cloud path.
- `cp <src> <dest`: copy a folder or a file from `<src>` to `<dest>`, `<src>`
Copy link
Collaborator

@gongweibao gongweibao Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cp这个命令有点复杂,对于比较大的文件copy的过程估计会很长,超时和用户的干预(如ctrl + c)是一个问题。
我的基本的想法是cp变成一个job提交的到k8s,然后提示用户这个cp的任务的名字,让用户当做job来管理。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

另一个方法是不支持cp,只支持mv?从ceph mv到同一个ceph文件系统应该很快?

Copy link
Collaborator

@gongweibao gongweibao Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mv是元数据的处理应该很快,需要测试一下。
前两天星海还给我们提需求,支持server端的copy。没有这个可能会让用户感到非常的意外。
我觉得还是需要支持的。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

也有同感可能cp没有必要:如果是需要对这个数据预处理,生成一份新的数据,则需要数据预处理的job管理的功能;如果是移动文件位置的确mv足够了。

cp一份数据,这样存在多份相同的数据的需求的原始原因是什么呢?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我的基本的想法是cp变成一个job提交的到k8s,然后提示用户这个cp的任务的名字,让用户当做job来管理。

启动一个Job来copy文件会不会过于复杂了,可不可以这样:

  1. Server端收到copy的请求后创建一个GoRoutine来copy文件,同时在目录下touch一个文件并记录copy的状态(RUNNING,DONE,FAILED)
  2. 客户端调用API轮训的去check文件的状态

Copy link
Collaborator

@gongweibao gongweibao Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

也有同感可能cp没有必要:如果是需要对这个数据预处理,生成一份新的数据,则需要数据预处理的job管理的功能;如果是移动文件位置的确mv足够了。
cp一份数据,这样存在多份相同的数据的需求的原始原因是什么呢?

机房之间的数据拷贝。我们以前使用的是HDFS有一个RemoteCopy的命令,把机房1的数据拷贝到机房2上。同机房的用的很少。
这个的实现看上去不紧急。

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@typhoonzero 多个人同时使用一份数据做实验,如果不做拷贝的话,是件挺危险的事情,其中一个人对数据做了更改,将无意中影响其他人,虽说场景有限,但并非无用;如果权限管理完善,限制只读操作,倒是一定程度上避免这种问题

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pkuyym 明白了。我理解是否有一个chmod命令可以更改成只读权限能否满足呢?

Copy link
Collaborator

@helinwang helinwang Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢@pkuyym提供的使用场景。
我们现在是有一个Home folder,一个公共folder,看来是会需要一个team folder。能否只有上传人有写的权限,team其他人只有读的权限。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我先移除了CP命令,我们需要增加一个team folder, 相关的讨论见: #325 , #328

`pcloud pfs [ls|put|get|rm...] [path1|path2...]`

- `ls <dest>`: list all folders and files under the destination path.
- `put <src> <dest>`: upload the local folder or file to the cloud path.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have multiple <src>, for put get mv

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE.

@helinwang
Copy link
Collaborator

Looking good!

Copy link
Collaborator

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Yancey1989 Yancey1989 merged commit 66d64b0 into PaddlePaddle:develop Aug 24, 2017
@Yancey1989 Yancey1989 deleted the commandline_design_doc branch August 24, 2017 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants