From cb851a5ec6825843afec365c5c31d5432c70c565 Mon Sep 17 00:00:00 2001 From: Yifan Xiong Date: Fri, 20 Mar 2020 17:05:42 +0800 Subject: [PATCH] Migrate protocol spec (#4307) * Migrate protocol spec Migrate pai protocol spec to https://github.com/microsoft/openpai-protocol. * Fix broken links Fix broken links. --- .../documentation/edit_yaml_job_config.md | 18 +- .../pai_vscode/documentation/submit_job.md | 20 +- contrib/python-sdk/README.md | 2 +- contrib/python-sdk/README_zh_CN.md | 2 +- .../docs/scenarios-and-user-stories.md | 2 +- .../docs/scenarios-and-user-stories_zh_CN.md | 2 +- contrib/python-sdk/openpaisdk/job.py | 4 +- contrib/submit-job-v2/README.md | 2 +- docs/pai-job-protocol.yaml | 247 ------------------ docs/system_architecture.md | 2 +- docs/user/job_submission.md | 4 +- docs/zh_CN/rest-server/API.md | 42 +-- docs/zh_CN/user/job_submission.md | 4 +- .../components/submission-section.jsx | 2 +- 14 files changed, 53 insertions(+), 300 deletions(-) delete mode 100644 docs/pai-job-protocol.yaml diff --git a/contrib/pai_vscode/documentation/edit_yaml_job_config.md b/contrib/pai_vscode/documentation/edit_yaml_job_config.md index 876ec43788..455bb8b0dd 100644 --- a/contrib/pai_vscode/documentation/edit_yaml_job_config.md +++ b/contrib/pai_vscode/documentation/edit_yaml_job_config.md @@ -1,8 +1,8 @@ # OpenPAI job config file edit features -In OpenPAI, all jobs are represented by YAML, a markup language. -Base on VSCode editor [IntelliSense](https://code.visualstudio.com/docs/editor/intellisense) and [YAML extension](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml), OpenPAI VS Code Client support some features to improve user experience for editing job config file. -For more details about the protocol of OpenPAI job, please refer to [PAI Job Protocol](https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml). +In OpenPAI, all jobs are represented by YAML, a markup language. 
+Base on VSCode editor [IntelliSense](https://code.visualstudio.com/docs/editor/intellisense) and [YAML extension](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml), OpenPAI VS Code Client support some features to improve user experience for editing job config file. +For more details about the protocol of OpenPAI job, please refer to [PAI Job Protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml). - [OpenPAI job config file edit features](#openpai-job-config-file-edit-features) - [Create job config file](#create-job-config-file) @@ -45,7 +45,7 @@ User can create a simple job config YAML file by below ways: ## YAML validation -Use [PAI Job Protocol](https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml) to do the validation for job config file. +Use [PAI Job Protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml) to do the validation for job config file. ### Whitch YAML file is a PAI job config @@ -74,7 +74,7 @@ Base on VSCode editor [IntelliSense](https://code.visualstudio.com/docs/editor/i ### 2. Code snippets -We provide several code snippets for VSCode YAML editor, user can use it to form their job config easily. +We provide several code snippets for VSCode YAML editor, user can use it to form their job config easily. It could be trigger by typing or right click in the editor and select `OpenPAI: Insert job config` #### Trigger by typing @@ -87,7 +87,7 @@ It could be trigger by typing or right click in the editor and select `OpenPAI: ## Insert OpenPAI Runtime Plugin -OpenPAI support some runtime plugins, such as SSH plugin, Storage plugin and Tensorboard plugin, user can config it in their job and setup the service. +OpenPAI support some runtime plugins, such as SSH plugin, Storage plugin and Tensorboard plugin, user can config it in their job and setup the service. 
Here is an example of runtime plugin config: ```yaml @@ -111,7 +111,7 @@ We provide several ways to help user insert the plugin config in YAML file. ### Insert by code snippet -The snippet `OpenPAI Runtime Plugin` will include `"com.microsoft.pai.runtimeplugin:"` line, and will ask user to select the first plugin type and generate it. +The snippet `OpenPAI Runtime Plugin` will include `"com.microsoft.pai.runtimeplugin:"` line, and will ask user to select the first plugin type and generate it. Typing `"- plugin:"` will trigger `OpenPAI: Insert a runtime plugin config`, it will help user to add other plugins. ![Insert by code snippet](../assets/auto_completion_runtime_plugin_snippet.gif) @@ -124,6 +124,6 @@ Right click the editor and select `OpenPAI: Insert job config`, and select `Open ## Reference -[PAI Job Protocol](https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml) -[Submit Jobs on OpenPAI](https://github.com/microsoft/pai/blob/master/docs/user/job_submission.md#job-workflow) +[PAI Job Protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml) +[Submit Jobs on OpenPAI](https://github.com/microsoft/pai/blob/master/docs/user/job_submission.md#job-workflow) [Troubleshoot jobs](https://github.com/microsoft/pai/blob/master/docs/user/troubleshooting_job.md) diff --git a/contrib/pai_vscode/documentation/submit_job.md b/contrib/pai_vscode/documentation/submit_job.md index 386bb2ede2..0ab430a9ae 100644 --- a/contrib/pai_vscode/documentation/submit_job.md +++ b/contrib/pai_vscode/documentation/submit_job.md @@ -1,6 +1,6 @@ # Submit job to OpenPAI by VSCode Extension -This document is a tutorial for OpenPAI job submission on VSCode Extension. +This document is a tutorial for OpenPAI job submission on VSCode Extension. Before learning this document, make sure you have an OpenPAI cluster already, and already install VSCode. 
- [Submit job to OpenPAI by VSCode Extension](#submit-job-to-openpai-by-vscode-extension) @@ -33,7 +33,7 @@ In VSCode Extension Marketplace search [OpenPAI VS Code Client](https://marketpl ## Submit a Hello World Job -The job of OpenPAI defines how to execute code(s) and command(s) in specified environment(s). A job can be run on single node or distributedly. +The job of OpenPAI defines how to execute code(s) and command(s) in specified environment(s). A job can be run on single node or distributedly. The following process submits a model training job implemented by TensorFlow on CIFAR-10 dataset. It downloads data and code from internet and helps getting started with OpenPAI. ### Create a job config file @@ -77,12 +77,12 @@ taskRoles: - python train_image_classifier.py --dataset_name=cifar10 --dataset_dir=/tmp/data --max_number_of_steps=1000 ``` -The `OpenPAI VS Code Client` support some features to improve user experience for editing job config file, please refer to [OpenPAI job config file edit features](edit_yaml_job_config.md). +The `OpenPAI VS Code Client` support some features to improve user experience for editing job config file, please refer to [OpenPAI job config file edit features](edit_yaml_job_config.md). To learn more about this job, please refer to [Learn the Hello World Job](https://github.com/microsoft/pai/blob/master/docs/user/job_submission.md#Learn-the-Hello-World-Job) ### Submit it -Finish editing the config file, save it and right click on the editor and select `Submit Job to PAI Cluster`. +Finish editing the config file, save it and right click on the editor and select `Submit Job to PAI Cluster`. After the information `Successfully submitted job.` pop up, you can click the `Open job page` button at right bottom corner, and view you job on website. 
![Submit](../assets/submit.gif) @@ -93,26 +93,26 @@ Most model training and other kinds of jobs need to transfer files between runni ### Teamwise Storage -OpenPAI admin can define Team-wise storage through [Storage Plugin](https://github.com/microsoft/pai/tree/master/contrib/storage_plugin). +OpenPAI admin can define Team-wise storage through [Storage Plugin](https://github.com/microsoft/pai/tree/master/contrib/storage_plugin). User's job container can mount to the storage if user add it in job config file, for how to insert storage plugin into job config, please refer to [Insert OpenPAI Runtime Plugin](edit_yaml_job_config.md#Insert-OpenPAI-Runtime-Plugin). ### Storage explorer -To manage user's data in Team-wise storage, `OpenPAI VS Code Client` support a `STORAGE EXPLORER` in vscode, User can manage data in the explorer. -We also support an `Auto Upload` feature in VSCode, the client will auto upload user's project file to the storage before submit job. +To manage user's data in Team-wise storage, `OpenPAI VS Code Client` support a `STORAGE EXPLORER` in vscode, User can manage data in the explorer. +We also support an `Auto Upload` feature in VSCode, the client will auto upload user's project file to the storage before submit job. For more detail, refer to [Storage Explorer and Auto Upload](storage_explorer_and_auto_upload.md) ![Storage Explorer](../assets/storage.gif) ### Source code auto upload -VSCode is a very powerful editor, user can use it to edit their source code, the auto upload feature make the source code to PAI job easily. +VSCode is a very powerful editor, user can use it to edit their source code, the auto upload feature make the source code to PAI job easily. 
For more detail, refer to [Storage Explorer and Auto Upload](storage_explorer_and_auto_upload.md#Auto-Upload) ![Source code auto upload](../assets/source_code_auto_upload.gif) ## Reference -[PAI Job Protocol](https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml) -[Submit Jobs on OpenPAI](https://github.com/microsoft/pai/blob/master/docs/user/job_submission.md#job-workflow) +[PAI Job Protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml) +[Submit Jobs on OpenPAI](https://github.com/microsoft/pai/blob/master/docs/user/job_submission.md#job-workflow) [Troubleshoot jobs](https://github.com/microsoft/pai/blob/master/docs/user/troubleshooting_job.md) diff --git a/contrib/python-sdk/README.md b/contrib/python-sdk/README.md index 19a2c5e69c..f377a7773a 100644 --- a/contrib/python-sdk/README.md +++ b/contrib/python-sdk/README.md @@ -228,7 +228,7 @@ opai job list -a [] ### How to submit a job from existing job config file -If you already has a job config file, you could submit a job based on it directly. The job config file could be in the format of `json` or `yaml`, and it must be compatible with [job configuration specification v1](https://github.com/microsoft/pai/blob/master/docs/job_tutorial.md) or [pai-job-protocol v2](https://github.com/Microsoft/pai/blob/master/docs/pai-job-protocol.yaml). +If you already has a job config file, you could submit a job based on it directly. The job config file could be in the format of `json` or `yaml`, and it must be compatible with [job configuration specification v1](https://github.com/microsoft/pai/blob/master/docs/job_tutorial.md) or [pai-job-protocol v2](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml). 
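Before handing a config to the CLI, it can help to sanity-check the fields the v2 protocol marks as required (`protocolVersion`, `name`, `type`, and per-task-role `dockerImage`, `resourcePerInstance`, `commands`). The sketch below is illustrative only and is not part of the SDK or CLI; it checks a JSON config (for YAML, `yaml.safe_load` from PyYAML would replace `json.loads`):

```python
# Illustrative only -- this helper is NOT part of the OpenPAI SDK or CLI.
# It sketches a pre-submission sanity check against the required top-level
# fields of the v2 job protocol (protocolVersion, name, type, taskRoles).
import json

CONFIG = """
{
  "protocolVersion": 2,
  "name": "minimal_example",
  "type": "job",
  "taskRoles": {
    "worker": {
      "dockerImage": "tf_example",
      "resourcePerInstance": {"cpu": 2, "memoryMB": 8192, "gpu": 0},
      "commands": ["echo hello"]
    }
  }
}
"""

def check_job_config(text: str) -> dict:
    """Parse a JSON job config and verify protocol-v2 required fields."""
    cfg = json.loads(text)
    for field in ("protocolVersion", "name", "type", "taskRoles"):
        if field not in cfg:
            raise ValueError(f"missing required field: {field}")
    if cfg["type"] != "job":
        raise ValueError("top-level type must be 'job'")
    for role, spec in cfg["taskRoles"].items():
        for field in ("dockerImage", "resourcePerInstance", "commands"):
            if field not in spec:
                raise ValueError(f"taskRole '{role}' is missing '{field}'")
    return cfg

print(check_job_config(CONFIG)["name"])  # minimal_example
```

Once the config parses cleanly, the `opai job submit` command that follows performs the actual submission.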
```bash opai job submit -a diff --git a/contrib/python-sdk/README_zh_CN.md b/contrib/python-sdk/README_zh_CN.md index f631b20c03..39cd5af54d 100644 --- a/contrib/python-sdk/README_zh_CN.md +++ b/contrib/python-sdk/README_zh_CN.md @@ -207,7 +207,7 @@ opai job list -a [] ### How to submit a job from existing job config file -If you already has a job config file, you could submit a job based on it directly. The job config file could be in the format of `json` or `yaml`, and it must be compatible with [job configuration specification v1](https://github.com/microsoft/pai/blob/master/docs/job_tutorial.md) or [pai-job-protocol v2](https://github.com/Microsoft/pai/blob/master/docs/pai-job-protocol.yaml). +If you already has a job config file, you could submit a job based on it directly. The job config file could be in the format of `json` or `yaml`, and it must be compatible with [job configuration specification v1](https://github.com/microsoft/pai/blob/master/docs/job_tutorial.md) or [pai-job-protocol v2](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml). ```bash opai job submit -a diff --git a/contrib/python-sdk/docs/scenarios-and-user-stories.md b/contrib/python-sdk/docs/scenarios-and-user-stories.md index 71dcb66b5b..09d39c3458 100644 --- a/contrib/python-sdk/docs/scenarios-and-user-stories.md +++ b/contrib/python-sdk/docs/scenarios-and-user-stories.md @@ -4,7 +4,7 @@ - **User can easily access `OpenPAI` resources in scripts (`Python` or `Shell`) and `Jupyter` notebooks** -The SDK provides classes to describe the clusters (`openpaisdk.core.Cluster`) and jobs (`openpaisdk.job.Job`). The Cluster class wraps necessary REST apis for convenient operations. The Job class is an implementation of the [protocol](https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml), with which user can easily organize (add or edit) the content of job `yaml` and `json` configuration. 
+The SDK provides classes to describe the clusters (`openpaisdk.core.Cluster`) and jobs (`openpaisdk.job.Job`). The Cluster class wraps necessary REST apis for convenient operations. The Job class is an implementation of the [protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml), with which user can easily organize (add or edit) the content of job `yaml` and `json` configuration. Besides the wrapping of APIs, the SDK also provides functions to facilitate user to utilize `OpenPAI`. Such functions includes *cluster management*, *storage accessing*, *execution environment detection (local or in a job container)*. diff --git a/contrib/python-sdk/docs/scenarios-and-user-stories_zh_CN.md b/contrib/python-sdk/docs/scenarios-and-user-stories_zh_CN.md index 00b7187639..ff935b51d5 100644 --- a/contrib/python-sdk/docs/scenarios-and-user-stories_zh_CN.md +++ b/contrib/python-sdk/docs/scenarios-and-user-stories_zh_CN.md @@ -4,7 +4,7 @@ - **User can easily access `OpenPAI` resources in scripts (`Python` or `Shell`) and `Jupyter` notebooks** -The SDK provides classes to describe the clusters (`openpaisdk.core.Cluster`) and jobs (`openpaisdk.job.Job`). The Cluster class wraps necessary REST apis for convenient operations. The Job class is an implementation of the [protocol](https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml), with which user can easily organize (add or edit) the content of job `yaml` and `json` configuration. +The SDK provides classes to describe the clusters (`openpaisdk.core.Cluster`) and jobs (`openpaisdk.job.Job`). The Cluster class wraps necessary REST apis for convenient operations. The Job class is an implementation of the [protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml), with which user can easily organize (add or edit) the content of job `yaml` and `json` configuration. 
Besides the wrapping of APIs, the SDK also provides functions to facilitate user to utilize `OpenPAI`. Such functions includes *cluster management*, *storage accessing*, *execution environment detection (local or in a job container)*. diff --git a/contrib/python-sdk/openpaisdk/job.py b/contrib/python-sdk/openpaisdk/job.py index e98f38916c..faf0849c58 100644 --- a/contrib/python-sdk/openpaisdk/job.py +++ b/contrib/python-sdk/openpaisdk/job.py @@ -114,7 +114,7 @@ def parse_list(lst: List[str]): class Job: """ - the data structure and methods to describe a job compatible with https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml + the data structure and methods to describe a job compatible with https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml external methods: - I/O - save(...) / load(...): store and restore to the disk @@ -131,7 +131,7 @@ class Job: """ def __init__(self, name: str=None, **kwargs): - self.protocol = dict() # follow the schema of https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml + self.protocol = dict() # follow the schema of https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml self._client = None # cluster client self.new(name, **kwargs) diff --git a/contrib/submit-job-v2/README.md b/contrib/submit-job-v2/README.md index f03fb2bd37..b8d42d8d9b 100644 --- a/contrib/submit-job-v2/README.md +++ b/contrib/submit-job-v2/README.md @@ -8,7 +8,7 @@ This plugin is used to submit PAI job v2 on web portal. User can upload a job v2 yaml file from disk, choose a job v2 from marketplace, or use the submission form to fill in a job v2 config. -Please refer to [PAI protocol spec](../../docs/pai-job-protocol.yaml) for more details. +Please refer to [PAI protocol spec](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml) for more details. 
## Build diff --git a/docs/pai-job-protocol.yaml b/docs/pai-job-protocol.yaml deleted file mode 100644 index afe862805b..0000000000 --- a/docs/pai-job-protocol.yaml +++ /dev/null @@ -1,247 +0,0 @@ -%YAML 1.2 ---- -# OpenPAI Job Protocol YAML - -protocolVersion: String, required # Protocol version, current version is 2. -name: String, required # String in ^[a-zA-Z0-9_-]+$ format. -type: String, required # Component type, should be "job" here. -version: String, optional # Component version, default is latest. -contributor: String, optional -description: String, optional - -prerequisites: # Optional - # Each item is the protocol for data, script, dockerimage, or output type. - - protocolVersion: String, optional # If omitted, follow the protocolVersion in root. - name: String, required - type: String, required # Component type. Must be one of the following: data, script, dockerimage, or output. Prerequisites.type cannot be "job". - version: String, optional # Component version, Default is latest. - contributor: String, optional - description: String, optional - auth: Object, optional # Only available when the type is dockerimage. - username: String, optional - password: String, optional # If a password is needed, it should be referenced as a secret - registryuri: String, optional - uri: String or list, required # Only when the type is data can the uri be a list. - -# If specified, the whole parameters object can be referenced as `$parameters`. -# Scope of reference `$parameters`: the reference is shared among all task roles. -parameters: # Optional, can be omitted. - : value1 # Specify name and value of all the referencable parameters that will be used in the whole job template. - : value2 # Can be referenced by `<% $parameters.param1 %>`, `<% $parameters.param2 %>`. - -# If sensitive information including password or API key is needed in the protocol, -# it should be specified here in secrets section and referenced as `$secrets`. 
-# Scope of reference `$secrets`: the reference is shared among all task roles and docker image's `auth` field. -# A system that supports PAI protocol should keep the secret information away from -# unauthorized users (how to define unauthorized user is out of the scope of this protocol). -# For example, the yaml file used for job cloning, the stdout/stderr should protect all information marked as secrets. -secrets: # Optional, can be omitted. - : password # Specify name and value of all secrets that will be used in the whole job template. - : key # Can be referenced by `<% $secrets.secret1 %>`, `<% $secrets.secret2 %>`. - -jobRetryCount: Integer, optional # Default is 0. -# Task roles are different types of task in the protocol. -# One job may have one or more task roles, each task role has one or more instances, and each instance runs inside one container. -taskRoles: - : String, required # Name of the taskRole, string in ^[a-zA-Z_][a-zA-Z0-9_]*$ format (valid C variable name). - instances: Integer, optional # Default is 1, instances of a taskRole, no less than 1. - completion: # Completion poclicy for the job, https://github.com/Microsoft/pai/blob/master/subprojects/frameworklauncher/yarn/doc/USERMANUAL.md#ApplicationCompletionPolicy. - # Number of failed tasks to fail the entire job, -1 or no less than 1, if set to -1 means the job will always succeed regardless any task failure. - minFailedInstances: Integer, optional # Default is 1. - # Number of succeeded tasks to succeed the entire job, -1 or no less than 1, if set to -1 means the job will only succeed until all tasks are completed and minFailedInstances is not triggered. - minSucceededInstances: Integer, optional # Default is task instances. - taskRetryCount: Integer, optional # Default is 0. - dockerImage: String, required # Should reference to a dockerimage defined in prerequisites. - # Scope of the reference `$data`, '$output', `$script`: the reference is only valid inside this task role. 
- # User cannot reference them from another task role. Reference for `$parameters` is global and shared among task roles. - data: String, optional # Select data defined in prerequisites, target can be referenced as `$data` in this task role. - output: String, optional # Select output defined in prerequisites, target can be referenced as `$output` in this task role. - script: String, optional # Select script defined in prerequisites, target can be referenced as `$script` in this task role. - extraContainerOptions: - shmMB: Integer, optional # Config the /dev/shm in a docker container, https://docs.docker.com/compose/compose-file/#shm_size. - infiniband: Boolean, optional # Use InfiniBand devices or not in a docker container. - resourcePerInstance: - cpu: Integer, required # CPU number, unit is CPU vcore - memoryMB: Integer, required # Memory number, unit is MB - gpu: Integer, required # GPU number, unit is GPU card - ports: # Optional, only for host network, port label is string in ^[a-zA-Z_][a-zA-Z0-9_]*$ format (valid C variable name). - : Integer, required, minimum number is 1 # Port number for the port label. - commands: - - String, required - -# To handle that a component may interact with different component differently, user is encouraged to place the codes handling such difference in the "deployments" field, -# e.g., a job may get input data through wget, hdfs -dfs cp, copy, or just directly read from remote storage. This logic can be placed here. -# In summary, the deployments field is responsible to make sure the job to run properly in a deployment specific runtime environment. -# One could have many deployments, but only one deployment can be activated at runtime by specifying in "defaults". User can choose the deployment and specify in "defaults" at submission time. 
-deployments: - - name: String, required - taskRoles: - : String, required # Should be in taskRoles - preCommands: - - String, required # Execute before the taskRole's command - postCommands: - - String, required # Execute after the taskRole's command - - -defaults: # Optional, default cluster specific settings - virtualCluster: String, optional - deployment: String, optional # Should reference to deployment defined in deployments - -extras: # Optional, extra field, object, save any information that plugin may use - submitFrom: String, optional - hivedscheduler: # Optional - jobPriorityClass: String, required - taskRoles: - : String, required - gpuType/reservationId: String, required # Only one allowed - affinityGroupName: String, optional - - ---- -# OpenPAI Job Protocol YAML Example for a Distributed TensorFlow Job - -protocolVersion: 2 -name: tensorflow_cifar10 -type: job -version: 1.0 -contributor: Alice -description: image classification, cifar10 dataset, tensorflow, distributed training - -prerequisites: - - protocolVersion: 2 - name: tf_example - type: dockerimage - version: latest - contributor: Alice - description: python3.5, tensorflow - auth: - username: user - password: <% $secrets.docker_password %> - registryuri: openpai.azurecr.io - uri: openpai/pai.example.tensorflow - - protocolVersion: 2 - name: tensorflow_cifar10_model - type: output - version: latest - contributor: Alice - description: cifar10 data output - uri: hdfs://10.151.40.179:9000/core/cifar10_model - - protocolVersion: 2 - name: tensorflow_cnnbenchmarks - type: script - version: 84820935288cab696c9c2ac409cbd46a1f24723d - contributor: MaggieQi - description: tensorflow benchmarks - uri: github.com/MaggieQi/benchmarks - - protocolVersion: 2 - name: cifar10 - type: data - version: latest - contributor: Alice - description: cifar10 dataset, image classification - uri: - - https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz - -parameters: - model: resnet20 - batchsize: 32 - -secrets: - 
docker_password: password - github_token: cGFzc3dvcmQ= - -jobRetryCount: 1 -taskRoles: - worker: - instances: 1 - completion: - minFailedInstances: 1 - minSucceededInstances: 1 - taskRetryCount: 0 - dockerImage: tf_example - data: cifar10 - output: tensorflow_cifar10_model - script: tensorflow_cnnbenchmarks - extraContainerOptions: - shmMB: 64 - resourcePerInstance: - cpu: 2 - memoryMB: 16384 - gpu: 4 - ports: - ssh: 1 - http: 1 - commands: - - cd script_<% $script.name %>/scripts/tf_cnn_benchmarks - - > - python tf_cnn_benchmarks.py --job_name=worker - --local_parameter_device=gpu - --variable_update=parameter_server - --ps_hosts=$PAI_TASK_ROLE_ps_server_HOST_LIST - --worker_hosts=$PAI_TASK_ROLE_worker_HOST_LIST - --task_index=$PAI_CURRENT_TASK_ROLE_CURRENT_TASK_INDEX - --data_name=<% $data.name %> - --data_dir=$PAI_WORK_DIR/data_<% $data.name %> - --train_dir=$PAI_WORK_DIR/output_<% $output.name %> - --model=<% $parameters.model %> - --batch_size=<% $parameters.batchsize %> - ps_server: - instances: 1 - completion: - minFailedInstances: 1 - minSucceededInstances: -1 - taskRetryCount: 0 - dockerImage: tf_example - data: cifar10 - output: tensorflow_cifar10_model - script: tensorflow_cnnbenchmarks - extraContainerOptions: - shmMB: 64 - resourcePerInstance: - cpu: 2 - memoryMB: 8192 - gpu: 0 - ports: - ssh: 1 - http: 1 - commands: - - cd script_<% $script.name %>/scripts/tf_cnn_benchmarks - - > - python tf_cnn_benchmarks.py --job_name=ps - --local_parameter_device=gpu - --variable_update=parameter_server - --ps_hosts=$PAI_TASK_ROLE_ps_server_HOST_LIST - --worker_hosts=$PAI_TASK_ROLE_worker_HOST_LIST - --task_index=$PAI_CURRENT_TASK_ROLE_CURRENT_TASK_INDEX - --data_dir=$PAI_WORK_DIR/data_<% $data.name %> - --data_name=<% $data.name %> - --train_dir=$PAI_WORK_DIR/output_<% $output.name %> - --model=<% $parameters.model %> - --batch_size=<% $parameters.batchsize %> - -deployments: - - name: prod # This implementation will download the data to local disk, and the 
computed model will be output to local disk first and then being copied to hdfs. - version: 1.0 - taskRoles: - worker: - preCommands: - - wget <% $data.uri[0] %> -P data_<% $data.name %> # If local data cache deployed, one can copy data from local cache, only wget in case of cache miss. - - > - git clone https://<% $script.contributor %>:<% $secrets.github_token %>@<% $script.uri %> script_<% $script.name %> && - cd script_<% $script.name %> && git checkout <% $script.version %> && cd .. - # Then the system will go ahead to execute worker's command. - ps_server: - preCommands: - - wget <% $data.uri[0] %> -P data_<% $data.name %> - - > - git clone https://<% $script.contributor %>:<% $secrets.github_token %>@<% $script.uri %> script_<% $script.name %> && - cd script_<% $script.name %> && git checkout <% $script.version %> && cd .. - # Then the system will go ahead to execute ps_server's command. - postCommands: - # After the execution of ps_server's command, the system goes here. - - hdfs dfs -cp output_<% $output.name %> <% $output.uri %> - # Assume the model is output locally, and this command copies the local output to hdfs. One can output to hdfs directly. - # In this case, you will have to change "--train_dir=$PAI_WORK_DIR/output_<% $output.name %>". - -defaults: - deployment: prod # Use prod deployment in job submission. diff --git a/docs/system_architecture.md b/docs/system_architecture.md index f084eb015c..ef1b0f5d11 100644 --- a/docs/system_architecture.md +++ b/docs/system_architecture.md @@ -11,7 +11,7 @@ OpenPAI provides [paictl](./paictl/paictl-manual.md), a tool to help user deploy One key design goal of OpenPAI is to facilitate the sharing and reproducing of AI innovations. To this end, OpenPAI introduces [marketplace](../contrib/marketplace/README.md), where people can share their workloads and data within a private group or publically. 
-The workloads and data in the marketplace are described by [OpenPAI protocol](./pai-job-protocol.yaml), a specification that describes the hardware and software requirement of a workload or dataset.
+The workloads and data in the marketplace are described by [OpenPAI protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml), a specification that describes the hardware and software requirement of a workload or dataset.
 The hardware and software requirements include GPU/CPU/Memory resource requirement, docker images, data/code location, the training method (gang scheduling or elastic), job completion policy, etc.
 OpenPAI protocol facilitates platform interoperability and job portability, a job described by the protocol can run on different clusters managed by OpenPAI, as long as the clusters can meet the specification.
 The OpenPAI protocol also enables great flexibility, any AI workload, being it Tensorflow, PyTorch, or your proprietary deep learning workload, can be described by the protocol.
diff --git a/docs/user/job_submission.md b/docs/user/job_submission.md
index 37febc0a2b..47cee173ba 100644
--- a/docs/user/job_submission.md
+++ b/docs/user/job_submission.md
@@ -34,7 +34,7 @@
 This document is a tutorial for job submission on OpenPAI (If you are using OpenPAI <= 0.13.0, please refer to [this document](./training.md)). Before learning this document, make sure you have an OpenPAI cluster already. If there isn't yet, refer to [here](../../README.md#deploy-openpai) to deploy one.
 
-There are several ways of submitting pai job, including webportal, [OpenPAI VS Code Client](https://github.com/microsoft/pai/tree/master/contrib/pai_vscode), and [python sdk](https://github.com/microsoft/pai/tree/master/contrib/python-sdk). And all the job configs follow [Pai Job Protocol](https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml). Here we use webportal to submit a hello world job.
+There are several ways of submitting pai job, including webportal, [OpenPAI VS Code Client](https://github.com/microsoft/pai/tree/master/contrib/pai_vscode), and [python sdk](https://github.com/microsoft/pai/tree/master/contrib/python-sdk). And all the job configs follow [OpenPAI Job Protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml). Here we use webportal to submit a hello world job.
 
 ## Submit a Hello World Job
@@ -130,7 +130,7 @@ In some cases, it is desired to define some secret messages such as password, to
 ## Advanced Mode
 
-You can set more detailed configs by enabling advanced mode. In the advanced mode, you could define ```retry time```, ```ports```, ```completion policy``` before submitting job. For more details about the fields, please refer to [Pai Job Protocol](../pai-job-protocol.yaml).
+You can set more detailed configs by enabling advanced mode. In the advanced mode, you could define ```retry time```, ```ports```, ```completion policy``` before submitting job. For more details about the fields, please refer to [OpenPAI Job Protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml).
 
 ## PAI Environment Variables
diff --git a/docs/zh_CN/rest-server/API.md b/docs/zh_CN/rest-server/API.md
index 3d29ee9701..c7b1b8d871 100644
--- a/docs/zh_CN/rest-server/API.md
+++ b/docs/zh_CN/rest-server/API.md
@@ -45,7 +45,7 @@ HTTP://restserver/api/v1/authn/oidc/return
 HTTP GET the redirect URL of Azure AD to sign out the authentication:
 
 ```url
-http://restserver/api/v1/authn/oidc/login 
+http://restserver/api/v1/authn/oidc/login
 ```
 
 ## 3. Submit a job
@@ -70,27 +70,27 @@ curl -H "Content-Type: application/json" \
 Check the list of jobs at:
 
 http://restserver/api/v1/jobs
-
+
 or
 
 http://restserver/api/v1/user/:username/jobs
-
+
 Check your exampleJob status at:
 
 http://restserver/api/v1/user/:username/jobs/exampleJob
-
+
 Get the job config JSON content:
 
 http://restserver/api/v1/user/:username/jobs/exampleJob/config
-
+
 Get the job's SSH info:
 
 http://restserver/api/v1/user/:username/jobs/exampleJob/ssh
-
+
 # RestAPI
@@ -196,7 +196,7 @@ Admin can create a user in system.
 POST /api/v2/user
 Authorization: Bearer
-
+
 *Parameters*
@@ -207,7 +207,7 @@ Admin can create a user in system.
   "admin": true | false,
   "email": "email address or empty string",
   "virtualCluster": ["vcname1 in [A-Za-z0-9_]+ format", "vcname2 in [A-Za-z0-9_]+ format"],
-  "extension": {
+  "extension": {
     "extension-key1": "extension-value1"
   }
 }
@@ -874,7 +874,7 @@ Admin can create a group in system.
 POST /api/v2/group
 Authorization: Bearer
-
+
 *Parameters*
@@ -883,7 +883,7 @@ Admin can create a group in system.
   "groupname": "username in [A-Za-z0-9_]++ format",
   "description": "description for the group",
   "externalName": "the external group name binding with the group in OpenPAI",
-  "extension": {
+  "extension": {
     "extension-key1": "extension-value1"
   }
 }
@@ -940,13 +940,13 @@ Admin can change a group's extension.
 PUT /api/v2/group/:groupname/extension
 Authorization: Bearer
-
+
 *Parameters*
 
 ```json
 {
-  "extension": {
+  "extension": {
     "key-create-or-update-1": "extension-value1",
     "key-create-or-update-2": [ ... ],
     "key-create-or-update-3": { ... }
@@ -1005,7 +1005,7 @@ Admin can change a specific attribute in a nested group extension. Admin could c
 PUT /api/v2/group/:groupname/extension/path/to/attr
 Authorization: Bearer
-
+
 *Body*
@@ -1021,12 +1021,12 @@ Admin can change a specific attribute in a nested group extension. Admin could c
 PUT /api/v2/group/:groupname/extension/acls/virtualClusters
 Authorization: Bearer
 Body {"data": ["vc1", "vc2"]}
-
+
 Update group admin privilege
 PUT /api/v2/group/:groupname/extension/acls/admin
 Authorization: Bearer
 Body {"data": true/false}
-
+
 *Response if succeeded*
@@ -1079,7 +1079,7 @@ Admin can change a group's description.
 PUT /api/v2/group/:groupname/description
 Authorization: Bearer
-
+
 *Parameters*
@@ -1140,7 +1140,7 @@ Admin can change a group's externalname, and bind it with another external group
 PUT /api/v2/group/:groupname/externalname
 Authorization: Bearer
-
+
 *Parameters*
@@ -1201,7 +1201,7 @@ Admin can delete a group from system.
 DELETE /api/v2/group/:groupname
 Authorization: Bearer
-
+
 *Parameters*
@@ -2020,7 +2020,7 @@ Authorization: Bearer
 *Parameters*
 
-[job protocol yaml](../pai-job-protocol.yaml)
+[job protocol yaml](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml)
 
 *Response if succeeded*
@@ -2046,12 +2046,12 @@ Status: 400
 *Response if user has no permission*
 
 Status: 403
-
+
 {
   "code": "ForbiddenUserError",
   "message": "User $username is not allowed to add job to $vcname
 }
-
+
 *Response if there is a duplicated job submission*
diff --git a/docs/zh_CN/user/job_submission.md b/docs/zh_CN/user/job_submission.md
index 0cfe2f264b..89435a46a7 100644
--- a/docs/zh_CN/user/job_submission.md
+++ b/docs/zh_CN/user/job_submission.md
@@ -34,7 +34,7 @@
 This document is a tutorial for job submission on OpenPAI (If you are using OpenPAI <= 0.13.0, please refer to [this document](./training.md)). Before learning this document, make sure you have an OpenPAI cluster already. 如果还没安装 OpenPAI 集群，参考[这里](../../../README_zh_CN.md#部署)进行部署。
 
-There are several ways of submitting pai job, including webportal, [OpenPAI VS Code Client](https://github.com/microsoft/pai/tree/master/contrib/pai_vscode), and [python sdk](https://github.com/microsoft/pai/tree/master/contrib/python-sdk). And all the job configs follow [Pai Job Protocol](https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml). Here we use webportal to submit a hello world job.
+There are several ways of submitting pai job, including webportal, [OpenPAI VS Code Client](https://github.com/microsoft/pai/tree/master/contrib/pai_vscode), and [python sdk](https://github.com/microsoft/pai/tree/master/contrib/python-sdk). And all the job configs follow [OpenPAI Job Protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml). Here we use webportal to submit a hello world job.
 
 ## 提交 Hello World Job
@@ -132,7 +132,7 @@ In some cases, it is desired to define some secret messages such as password, to
 ## Advanced Mode
 
-You can set more detailed configs by enabling advanced mode. In the advanced mode, you could define ```retry time```, ```ports```, ```completion policy``` before submitting job. For more details about the fields, please refer to [Pai Job Protocol](../pai-job-protocol.yaml).
+You can set more detailed configs by enabling advanced mode. In the advanced mode, you could define ```retry time```, ```ports```, ```completion policy``` before submitting job. For more details about the fields, please refer to [OpenPAI Job Protocol](https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml).
 
 ## PAI Environment Variables
diff --git a/src/webportal/src/app/job-submission/components/submission-section.jsx b/src/webportal/src/app/job-submission/components/submission-section.jsx
index 8bd4a8e01f..e9af629a82 100644
--- a/src/webportal/src/app/job-submission/components/submission-section.jsx
+++ b/src/webportal/src/app/job-submission/components/submission-section.jsx
@@ -51,7 +51,7 @@ import Context from './context';
 import { FormShortSection } from './form-page';
 
 const JOB_PROTOCOL_SCHEMA_URL =
-  'https://github.com/microsoft/pai/blob/master/docs/pai-job-protocol.yaml';
+  'https://github.com/microsoft/openpai-protocol/blob/master/schemas/v2/schema.yaml';
 
 const user = cookies.get('user');
 const { palette } = getTheme();