This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Update document v0.4 #437

Merged
merged 12 commits into from
Dec 5, 2018
Changes from 11 commits
28 changes: 12 additions & 16 deletions docs/ExperimentConfig.md
@@ -376,10 +376,6 @@ machineList:
__server__ is the host of nfs server

__path__ is the mounted path of nfs

* __kubernetsServer__

__kubernetsServer__ set the host of kubernets service.

* __keyVault__

@@ -393,6 +389,18 @@ machineList:

__name__ is the value of ```--name``` used in az command.

* __azureStorage__

If users use azure kubernets service, they should set azure storage account to store code files.

* __accountName__

__accountName__ is the name of azure storage account.

* __azureShare__

__azureShare__ is the share of the azure file storage.

* __paiConfig__

* __userName__
@@ -406,18 +414,6 @@ machineList:
* __host__

__host__ is the host of pai.

* __azureStorage__

If users use azure kubernets service, they should set azure storage account to store code files.

* __accountName__

__accountName__ is the name of azure storage account.

* __azureShare__

__azureShare__ is the share of the azure file storage.
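
For illustration, the `keyVault` and `azureStorage` settings described in this file can be mirrored as a Python dict and checked for completeness. This is a minimal sketch, not NNI's own validation: `check_aks_config` and `REQUIRED_FIELDS` are hypothetical names, the `vaultName` key is an assumption (only `name`, `accountName` and `azureShare` appear in the text above), and every value is a placeholder.

```python
# Hypothetical helper: reports which keyVault / azureStorage fields are missing
# from an AKS-style kubeflowConfig. Not part of NNI.
REQUIRED_FIELDS = {
    'keyVault': ('vaultName', 'name'),  # 'vaultName' is an assumed key name
    'azureStorage': ('accountName', 'azureShare'),
}

def check_aks_config(kubeflow_config):
    '''Return a sorted list of missing "section.field" entries (empty if complete).'''
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        values = kubeflow_config.get(section) or {}
        for field in fields:
            if not values.get(field):
                missing.append('%s.%s' % (section, field))
    return sorted(missing)

# Placeholder values only; substitute your own vault, secret, account and share.
example_config = {
    'operator': 'tf-operator',
    'keyVault': {'vaultName': 'my-vault', 'name': 'my-secret'},
    'azureStorage': {'accountName': 'mystorageaccount', 'azureShare': 'myshare'},
}
print(check_aks_config(example_config))
```

A check of this kind would only report absent fields; it says nothing about whether the storage account or the Key Vault secret actually exists.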



10 changes: 4 additions & 6 deletions docs/KubeflowMode.md
@@ -2,7 +2,7 @@
===
Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/kubeflow), called kubeflow mode. Before starting to use NNI kubeflow mode, you should have a kubernetes cluster, either on-prem or [Azure Kubernetes Service(AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/), a Ubuntu machine on which [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) is installed and configured to connect to your kubernetes cluster. If you are not familiar with kubernetes, [here](https://kubernetes.io/docs/tutorials/kubernetes-basics/) is a good start. In kubeflow mode, your trial program will run as kubeflow job in kubernetes cluster.

## Prerequisite
## Prerequisite for on-premises Kubernetes Service
1. A **Kubernetes** cluster using Kubernetes 1.8 or later. Follow this [guideline](https://kubernetes.io/docs/setup/) to set up Kubernetes
2. Download, set up, and deploy **Kubeflow** to your Kubernetes cluster. Follow this [guideline](https://www.kubeflow.org/docs/started/getting-started/) to set up Kubeflow
3. Install **kubectl**, and configure to connect to your Kubernetes API server.
@@ -12,11 +12,10 @@ Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/ku

## Prerequisite for Azure Kubernets Service
1. NNI support kubeflow based on Azure Kubernets Service, follow the [guideline](https://azure.microsoft.com/en-us/services/kubernetes-service/) to set up Azure Kubernets Service.
2. Deploy kubeflow on Azure Kubernets Service.
3. Install __kubectl__ and [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest). Connect kubectl client to Azure K8S, and use `az login` to set azure account.
2. Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and __kubectl__. Use `az login` to set azure account, and connect kubectl client to AKS, [refer](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster).
3. Deploy kubeflow on Azure Kubernets Service, follow the [guideline](https://www.kubeflow.org/docs/started/getting-started/).
4. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create azure file storage account. If you use Azure Kubernets Service, nni need Azure Storage Service to store code files and the output files.
5. Set up Azure Key Vault Service, add a secret to Key Vault
to store the private key of Azure account.
5. To access Azure storage service, nni need the access key of the storage account, and nni use [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) Service to protect your private key. Set up Azure Key Vault Service, add a secret to Key Vault to store the access key of Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.

## Design
TODO
@@ -62,7 +61,6 @@ kubeflowConfig:
  nfs:
    server: {your_nfs_server}
    path: {your_nfs_server_exported_path}
  kubernetesServer: {your_kubernetes_api_server_ip}
```
If you use Azure Kubernets Service, you should set `kubeflowConfig` in your config yaml file as follows:
```
3 changes: 1 addition & 2 deletions tools/nni_cmd/config_schema.py
@@ -139,8 +139,7 @@
'nfs': {
'server': str,
'path': str
},
Optional('kubernetesServer'): str
}
},{
'operator': Or('tf-operator', 'mxnet-operator', 'pytorch-operator'),
'keyVault': {
2 changes: 1 addition & 1 deletion tools/nni_cmd/constants.py
@@ -20,7 +20,7 @@

import os

NNICTL_HOME_DIR = os.path.join(os.environ['HOME'], '.local', 'nni', 'nnictl')
NNICTL_HOME_DIR = os.path.join(os.environ['HOME'], '.local', 'nnictl')

ERROR_INFO = 'ERROR: %s'
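
The effect of this change on the resulting directory can be sketched as follows. `nnictl_home_dir` is a hypothetical helper written only for this illustration; it takes the home directory as a parameter so the two layouts can be compared side by side, whereas the constant in the diff reads `os.environ['HOME']` directly.

```python
import os

def nnictl_home_dir(home, legacy=False):
    '''Build the nnictl home directory under `home` (hypothetical helper).'''
    if legacy:
        # Layout before this change: <home>/.local/nni/nnictl
        return os.path.join(home, '.local', 'nni', 'nnictl')
    # Layout after this change: <home>/.local/nnictl
    return os.path.join(home, '.local', 'nnictl')

# Fall back to expanduser if HOME is unset (an assumption of this sketch,
# not something the original constant does).
home = os.environ.get('HOME', os.path.expanduser('~'))
print(nnictl_home_dir(home, legacy=True))
print(nnictl_home_dir(home))
```

Files written under the old directory would presumably not be found at the new location unless they are moved, although the diff itself does not say how existing data is handled.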

7 changes: 5 additions & 2 deletions tools/nni_cmd/launcher_utils.py
@@ -21,7 +21,7 @@
import os
import json
from .config_schema import LOCAL_CONFIG_SCHEMA, REMOTE_CONFIG_SCHEMA, PAI_CONFIG_SCHEMA, KUBEFLOW_CONFIG_SCHEMA
from .common_utils import get_json_content, print_error
from .common_utils import get_json_content, print_error, print_warning

def expand_path(experiment_config, key):
    '''Change '~' to user home directory'''
@@ -31,7 +31,10 @@ def expand_path(experiment_config, key):
def parse_relative_path(root_path, experiment_config, key):
    '''Change relative path to absolute path'''
    if experiment_config.get(key) and not os.path.isabs(experiment_config.get(key)):
        experiment_config[key] = os.path.join(root_path, experiment_config.get(key))
        absolute_path = os.path.join(root_path, experiment_config.get(key))
        print_warning('expand %s: %s to %s ' % (key, experiment_config[key], absolute_path))
        experiment_config[key] = absolute_path


def parse_time(experiment_config):
    '''Parse time format'''
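
The new behaviour of `parse_relative_path` can be tried in isolation with the sketch below. It assumes a stand-in `print_warning` (the real one lives in `common_utils` and its output format may differ), and the config keys and paths are illustrative only.

```python
import os

def print_warning(content):
    '''Stand-in for nnictl's print_warning; the real output format may differ.'''
    print('WARNING: %s' % content)

def parse_relative_path(root_path, experiment_config, key):
    '''Change relative path to absolute path, warning about the rewrite.'''
    value = experiment_config.get(key)
    if value and not os.path.isabs(value):
        absolute_path = os.path.join(root_path, value)
        print_warning('expand %s: %s to %s' % (key, value, absolute_path))
        experiment_config[key] = absolute_path

# Illustrative values: one relative entry and one that is already absolute.
root = os.path.abspath('experiment_root')
config = {'codeDir': 'trial_code', 'logDir': os.path.abspath('logs')}
parse_relative_path(root, config, 'codeDir')  # rewritten, with a warning
parse_relative_path(root, config, 'logDir')   # already absolute, left untouched
print(config)
```

Only the relative entry triggers a warning, so users see exactly which configured paths were rewritten and what they resolved to.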
21 changes: 16 additions & 5 deletions tools/nni_cmd/nnictl_utils.py
@@ -31,9 +31,24 @@
import time
from .common_utils import print_normal, print_error, print_warning, detect_process

def update_experiment_status():
    '''Update the experiment status in config file'''
    experiment_config = Experiments()
    experiment_dict = experiment_config.get_all_experiments()
    if not experiment_dict:
        return None
    for key in experiment_dict.keys():
        if isinstance(experiment_dict[key], dict):
            if experiment_dict[key].get('status') == 'running':
                nni_config = Config(experiment_dict[key]['fileName'])
                rest_pid = nni_config.get_config('restServerPid')
                if not detect_process(rest_pid):
                    experiment_config.update_experiment(key, 'status', 'stopped')

def check_experiment_id(args):
    '''check if the id is valid
    '''
    update_experiment_status()
    experiment_config = Experiments()
    experiment_dict = experiment_config.get_all_experiments()
    if not experiment_dict:
@@ -76,6 +91,7 @@ def parse_ids(args):
    5.If the id does not exist but match the prefix of an experiment id, nnictl will return the matched id
    6.If the id does not exist but match multiple prefix of the experiment ids, nnictl will give id information
    '''
    update_experiment_status()
    experiment_config = Experiments()
    experiment_dict = experiment_config.get_all_experiments()
    if not experiment_dict:
@@ -175,11 +191,6 @@ def stop_experiment(args):
    nni_config = Config(experiment_dict[experiment_id]['fileName'])
    rest_port = nni_config.get_config('restServerPort')
    rest_pid = nni_config.get_config('restServerPid')
    if not detect_process(rest_pid):
        print_normal('Experiment is not running...')
        experiment_config.update_experiment(experiment_id, 'status', 'stopped')
        return
    rest_pid = nni_config.get_config('restServerPid')
    if rest_pid:
        stop_rest_cmds = ['kill', str(rest_pid)]
        call(stop_rest_cmds)
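
The status refresh that this file now runs before resolving experiment ids can be sketched without NNI's `Experiments` and `Config` classes. In this simplified version the experiment store is a plain dict, the REST server pid is kept inline rather than read from a per-experiment config file, and `is_alive` is a caller-supplied stand-in for `detect_process`; all of these are assumptions made for the example.

```python
def update_experiment_status(experiments, is_alive):
    '''Mark 'running' experiments as 'stopped' when their REST server is gone.

    experiments: dict mapping experiment id -> metadata dict (in-memory
                 stand-in for nnictl's experiment store).
    is_alive:    callable taking a pid and returning True if that process
                 exists (stand-in for detect_process).
    Returns the ids whose status was changed.
    '''
    stopped = []
    for exp_id, info in experiments.items():
        if not isinstance(info, dict):
            continue  # skip malformed entries, as the original does
        if info.get('status') == 'running' and not is_alive(info.get('restServerPid')):
            info['status'] = 'stopped'
            stopped.append(exp_id)
    return stopped

# Made-up experiment ids and pids for the demonstration.
experiments = {
    'abc123': {'status': 'running', 'restServerPid': 111},
    'def456': {'status': 'running', 'restServerPid': 222},
    'ghi789': {'status': 'stopped', 'restServerPid': 333},
}
live_pids = {111}  # pretend only pid 111 is still running
print(update_experiment_status(experiments, lambda pid: pid in live_pids))
```

Running the refresh once up front means later lookups see a consistent status, which appears to be why the per-command liveness check in `stop_experiment` could be dropped.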