This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Update document v0.4 (#437)
move nnictl folder
delete kubernetsServer in nnictl
refactor aks document
add warning information to expand relative path
update experiment status when the experiment crashed.
SparkSnail authored Dec 5, 2018
1 parent 55fa695 commit 5c580f2
Showing 5 changed files with 44 additions and 35 deletions.
32 changes: 14 additions & 18 deletions docs/ExperimentConfig.md
@@ -150,7 +150,7 @@ machineList:

* __pai__ submits trial jobs to Microsoft's [OpenPai](https://github.com/Microsoft/pai). For more details on pai configuration, please refer to [PAIModeDoc](./PAIMode.md)

* __kubeflow__ submit trial jobs to [kubeflow](https://www.kubeflow.org/docs/about/kubeflow/), nni support kubeflow based on normal kubernets and [azure kubernets](https://azure.microsoft.com/en-us/services/kubernetes-service/).
* __kubeflow__ submit trial jobs to [kubeflow](https://www.kubeflow.org/docs/about/kubeflow/), nni support kubeflow based on normal kubernetes and [azure kubernetes](https://azure.microsoft.com/en-us/services/kubernetes-service/).

* __searchSpacePath__
* Description
@@ -376,14 +376,10 @@ machineList:
__server__ is the host of nfs server

__path__ is the mounted path of nfs

* __kubernetsServer__

__kubernetsServer__ set the host of kubernets service.

* __keyVault__

If users want to use azure kubernets service, they should set keyVault to storage the private key of your azure storage account. Refer: https://docs.microsoft.com/en-us/azure/key-vault/key-vault-manage-with-cli2
If users want to use azure kubernetes service, they should set keyVault to storage the private key of your azure storage account. Refer: https://docs.microsoft.com/en-us/azure/key-vault/key-vault-manage-with-cli2

* __vaultName__

@@ -393,6 +389,18 @@ machineList:

__name__ is the value of ```--name``` used in az command.

* __azureStorage__

If users use azure kubernetes service, they should set azure storage account to store code files.

* __accountName__

__accountName__ is the name of azure storage account.

* __azureShare__

__azureShare__ is the share of the azure file storage.

* __paiConfig__

* __userName__
@@ -406,18 +414,6 @@ machineList:
* __host__

__host__ is the host of pai.

* __azureStorage__

If users use azure kubernets service, they should set azure storage account to store code files.

* __accountName__

__accountName__ is the name of azure storage account.

* __azureShare__

__azureShare__ is the share of the azure file storage.



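The storage fields documented in this file (`nfs` with `server` and `path`, `keyVault` with `vaultName` and `name`, `azureStorage` with `accountName` and `azureShare`) can be checked before an experiment is launched. The following is a minimal sketch in plain Python, not NNI's actual schema validation. The function names `check_kubeflow_storage` and `_missing` are hypothetical, and the rule that `keyVault` and `azureStorage` must appear together is an assumption drawn from the descriptions above.

```python
def _missing(section, section_name, keys):
    """List the required keys that are absent or empty in one config section."""
    return ['%s.%s is missing' % (section_name, k) for k in keys if not section.get(k)]


def check_kubeflow_storage(kubeflow_config):
    """Return a list of problems found in a kubeflowConfig-style dict.

    Assumption (not confirmed by the source): an on-premises cluster uses
    `nfs`, while Azure Kubernetes Service needs both `keyVault` and
    `azureStorage`.
    """
    problems = []
    has_nfs = 'nfs' in kubeflow_config
    has_azure = 'keyVault' in kubeflow_config or 'azureStorage' in kubeflow_config
    if has_nfs:
        problems += _missing(kubeflow_config['nfs'], 'nfs', ('server', 'path'))
    if has_azure:
        problems += _missing(kubeflow_config.get('keyVault', {}),
                             'keyVault', ('vaultName', 'name'))
        problems += _missing(kubeflow_config.get('azureStorage', {}),
                             'azureStorage', ('accountName', 'azureShare'))
    if not has_nfs and not has_azure:
        problems.append('no nfs or keyVault/azureStorage section found')
    return problems
```

An empty list means the dict passed this sketch's checks; it says nothing about whether the named vault, secret, or share actually exists.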
17 changes: 8 additions & 9 deletions docs/KubeflowMode.md
@@ -2,7 +2,7 @@
===
Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/kubeflow), called kubeflow mode. Before starting to use NNI kubeflow mode, you should have a Kubernetes cluster, either on-premises or [Azure Kubernetes Service(AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/), and an Ubuntu machine on which [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) is installed and configured to connect to your Kubernetes cluster. If you are not familiar with Kubernetes, [here](https://kubernetes.io/docs/tutorials/kubernetes-basics/) is a good start. In kubeflow mode, your trial program runs as a Kubeflow job in the Kubernetes cluster.

## Prerequisite
## Prerequisite for on-premises Kubernetes Service
1. A **Kubernetes** cluster using Kubernetes 1.8 or later. Follow this [guideline](https://kubernetes.io/docs/setup/) to set up Kubernetes
2. Download, set up, and deploy **Kubeflow** to your Kubernetes cluster. Follow this [guideline](https://www.kubeflow.org/docs/started/getting-started/) to set up Kubeflow
3. Install **kubectl**, and configure to connect to your Kubernetes API server.
@@ -15,13 +15,12 @@ Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/ku
7. Install **NNI**, follow the install guide [here](GetStarted.md).
## Prerequisite for Azure Kubernets Service
1. NNI support kubeflow based on Azure Kubernets Service, follow the [guideline](https://azure.microsoft.com/en-us/services/kubernetes-service/) to set up Azure Kubernets Service.
2. Deploy kubeflow on Azure Kubernets Service.
3. Install __kubectl__ and [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest). Connect kubectl client to Azure K8S, and use `az login` to set azure account.
4. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create azure file storage account. If you use Azure Kubernets Service, nni need Azure Storage Service to store code files and the output files.
5. Set up Azure Key Vault Service, add a secret to Key Vault
to store the private key of Azure account.
## Prerequisite for Azure Kubernetes Service
1. NNI supports Kubeflow on Azure Kubernetes Service (AKS). Follow the [guideline](https://azure.microsoft.com/en-us/services/kubernetes-service/) to set up Azure Kubernetes Service.
2. Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and __kubectl__. Use `az login` to sign in to your Azure account, then connect the kubectl client to AKS; [refer](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster).
3. Deploy Kubeflow on Azure Kubernetes Service by following the [guideline](https://www.kubeflow.org/docs/started/getting-started/).
4. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create an Azure file storage account. When running on Azure Kubernetes Service, NNI needs Azure Storage Service to store code files and output files.
5. To access Azure Storage Service, NNI needs the access key of the storage account, and it uses [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) Service to protect that key. Set up Azure Key Vault Service and add a secret to Key Vault that stores the access key of the Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.
## Design
TODO
@@ -68,7 +67,7 @@ kubeflowConfig:
server: {your_nfs_server}
path: {your_nfs_server_exported_path}
```
If you use Azure Kubernets Service, you should set `kubeflowConfig` in your config yaml file as follows:
If you use Azure Kubernetes Service, you should set `kubeflowConfig` in your config yaml file as follows:
```
kubeflowConfig:
operator: tf-operator
2 changes: 1 addition & 1 deletion tools/nni_cmd/constants.py
@@ -20,7 +20,7 @@

import os

NNICTL_HOME_DIR = os.path.join(os.environ['HOME'], '.local', 'nni', 'nnictl')
NNICTL_HOME_DIR = os.path.join(os.environ['HOME'], '.local', 'nnictl')

ERROR_INFO = 'ERROR: %s'

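The change above moves the nnictl home directory from `~/.local/nni/nnictl` to `~/.local/nnictl`. A minimal sketch of both locations follows. The helper names `nnictl_home_dir` and `legacy_nnictl_home_dir` are hypothetical and not part of NNI. The sketch also swaps `os.environ['HOME']`, which raises `KeyError` when `HOME` is unset, for `os.path.expanduser('~')`; that substitution is an assumption for illustration, not what the commit does.

```python
import os


def nnictl_home_dir(home=None):
    """Directory used after this commit: <home>/.local/nnictl."""
    if home is None:
        # Swapped-in technique: expanduser instead of os.environ['HOME'].
        home = os.path.expanduser('~')
    return os.path.join(home, '.local', 'nnictl')


def legacy_nnictl_home_dir(home=None):
    """Directory used before this commit: <home>/.local/nni/nnictl."""
    if home is None:
        home = os.path.expanduser('~')
    return os.path.join(home, '.local', 'nni', 'nnictl')
```

The diff shown here does not migrate files from the old directory, so experiment records written under the legacy path may need to be moved or recreated by hand.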
7 changes: 5 additions & 2 deletions tools/nni_cmd/launcher_utils.py
@@ -21,7 +21,7 @@
import os
import json
from .config_schema import LOCAL_CONFIG_SCHEMA, REMOTE_CONFIG_SCHEMA, PAI_CONFIG_SCHEMA, KUBEFLOW_CONFIG_SCHEMA
from .common_utils import get_json_content, print_error
from .common_utils import get_json_content, print_error, print_warning

def expand_path(experiment_config, key):
    '''Change '~' to user home directory'''
@@ -31,7 +31,10 @@ def expand_path(experiment_config, key):
def parse_relative_path(root_path, experiment_config, key):
    '''Change relative path to absolute path'''
    if experiment_config.get(key) and not os.path.isabs(experiment_config.get(key)):
        experiment_config[key] = os.path.join(root_path, experiment_config.get(key))
        absolute_path = os.path.join(root_path, experiment_config.get(key))
        print_warning('expand %s: %s to %s ' % (key, experiment_config[key], absolute_path))
        experiment_config[key] = absolute_path


def parse_time(experiment_config):
'''Parse time format'''
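The new behaviour of `parse_relative_path` (warn, then replace a relative path with an absolute one) can be tried in isolation. This is a minimal standalone sketch: the function name `to_absolute` and the injectable `warn` callback are hypothetical stand-ins for the NNI function and its `print_warning` helper.

```python
import os


def to_absolute(root_path, config, key, warn=print):
    """Join a relative config value onto root_path and report the change.

    `warn` stands in for NNI's print_warning (an assumption for this sketch).
    """
    value = config.get(key)
    if value and not os.path.isabs(value):
        absolute_path = os.path.join(root_path, value)
        warn('expand %s: %s to %s' % (key, value, absolute_path))
        config[key] = absolute_path
    return config


if __name__ == '__main__':
    to_absolute(os.getcwd(), {'searchSpacePath': 'search_space.json'}, 'searchSpacePath')
```

Values that are already absolute, or keys that are missing, are left untouched and produce no warning.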
21 changes: 16 additions & 5 deletions tools/nni_cmd/nnictl_utils.py
@@ -31,9 +31,24 @@
import time
from .common_utils import print_normal, print_error, print_warning, detect_process

def update_experiment_status():
    '''Update the experiment status in config file'''
    experiment_config = Experiments()
    experiment_dict = experiment_config.get_all_experiments()
    if not experiment_dict:
        return None
    for key in experiment_dict.keys():
        if isinstance(experiment_dict[key], dict):
            if experiment_dict[key].get('status') == 'running':
                nni_config = Config(experiment_dict[key]['fileName'])
                rest_pid = nni_config.get_config('restServerPid')
                if not detect_process(rest_pid):
                    experiment_config.update_experiment(key, 'status', 'stopped')

def check_experiment_id(args):
    '''check if the id is valid
    '''
    update_experiment_status()
    experiment_config = Experiments()
    experiment_dict = experiment_config.get_all_experiments()
    if not experiment_dict:
@@ -76,6 +91,7 @@ def parse_ids(args):
    5.If the id does not exist but match the prefix of an experiment id, nnictl will return the matched id
    6.If the id does not exist but match multiple prefix of the experiment ids, nnictl will give id information
    '''
    update_experiment_status()
    experiment_config = Experiments()
    experiment_dict = experiment_config.get_all_experiments()
    if not experiment_dict:
@@ -175,11 +191,6 @@ def stop_experiment(args):
nni_config = Config(experiment_dict[experiment_id]['fileName'])
rest_port = nni_config.get_config('restServerPort')
rest_pid = nni_config.get_config('restServerPid')
if not detect_process(rest_pid):
print_normal('Experiment is not running...')
experiment_config.update_experiment(experiment_id, 'status', 'stopped')
return
rest_pid = nni_config.get_config('restServerPid')
if rest_pid:
stop_rest_cmds = ['kill', str(rest_pid)]
call(stop_rest_cmds)
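The idea behind `update_experiment_status` (any experiment recorded as `running` whose REST server process is gone gets marked `stopped`) can be sketched without NNI's `Experiments` and `Config` classes. Everything below is an illustrative assumption: the record layout with a `pid` field, the names `mark_stopped_experiments` and `pid_is_alive`, and the POSIX-only liveness probe via `os.kill(pid, 0)`, which is not necessarily how NNI's `detect_process` works.

```python
import os


def pid_is_alive(pid):
    """POSIX-only probe: signal 0 checks for existence without killing."""
    if not pid:
        return False
    try:
        os.kill(int(pid), 0)
    except ProcessLookupError:
        return False   # no such process
    except PermissionError:
        return True    # process exists but belongs to another user
    return True


def mark_stopped_experiments(experiments, is_alive=pid_is_alive):
    """Flip 'running' records to 'stopped' when their process is gone.

    `experiments` maps an experiment id to a dict with 'status' and 'pid'
    (a hypothetical layout for this sketch). Returns the ids that changed.
    """
    updated = []
    for exp_id, record in experiments.items():
        if not isinstance(record, dict):
            continue
        if record.get('status') == 'running' and not is_alive(record.get('pid')):
            record['status'] = 'stopped'
            updated.append(exp_id)
    return updated
```

Running this reconciliation at the start of every command that reads experiment ids, as the commit does in `check_experiment_id` and `parse_ids`, means a crashed experiment is reported as stopped without each command repeating the process check.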
