Skip to content

Latest commit

 

History

History
166 lines (129 loc) · 6.89 KB

how-to-generate-cluster-config.md

File metadata and controls

166 lines (129 loc) · 6.89 KB

Generate configuration from template

Index

Step 1. Write Quick start

There is a example file in the link .

An example yaml file is shown below. Note that you should change the IP address of the machine and ssh information accordingly.

# quick-start.yaml

# (Required) Please fill in the IP address of the server you would like to deploy OpenPAI
machines:
  - 192.168.1.11
  - 192.168.1.12
  - 192.168.1.13

# (Required) Log-in info of all machines. System administrator should guarantee
# that the username/password pair or username/key-filename is valid and has sudo privilege.
ssh-username: pai
ssh-password: pai-password

# (Optional, default=None) the key file that ssh client uses, that has higher priority then password.
#ssh-keyfile-path: <keyfile-path>

# (Optional, default=22) Port number of ssh service on each machine.
#ssh-port: 22

# (Optional, default=DNS of the first machine) Cluster DNS.
#dns: <ip-of-dns>

# (Optional, default=10.254.0.0/16) IP range used by Kubernetes. Note that
# this IP range should NOT conflict with the current network.
#service-cluster-ip-range: <ip-range-for-k8s>

Step 2. Generate OpenPAI configuration files

(1) generate configuration files
cd /pai

# cmd should be executed under pai directory in the dev-box.

python paictl.py config generate -i /pai/deployment/quick-start/quick-start.yaml -o ~/pai-config -f
(2) update docker tag to release version
vi ~/pai-config/services-configuration.yaml

For example: v0.x.y branch, user should change docker-registry.tag to v0.x.y.

docker:
  ...
  tag: v0.x.y
(3) changing gpu count and type

Quick start will generate node with 1 gpu with type generic, this may not suit your situation, for example, if you have two types of machines, and one type has 4 Tesla K80 gpu cards, and another has 2 Tesla P100 cards, you should modify your ~/pai-config/layout.yaml as following:

machine-sku:
  k80-node:
    mem: 40G
    gpu:
      type: Tesla K80
      count: 4
    cpu:
      vcore: 24
    os: ubuntu16.04
  p100-node:
    mem: 20G
    gpu:
      type: Tesla P100
      count: 2
    cpu:
      vcore: 24
    os: ubuntu16.04

machine-list:
  - hostname: xxx
    hostip: yyy
    machine-type: k80-node
  - hostname: xxx
    hostip: yyy
    machine-type: p100-node
(4) The default value in the generated configuration

The paictl tool sets the following default values in the 4 configuration files:

Configuration Property Default value
master node The first machine in the machine list will be configured as the master node.
SSH port If not explicitly specified, the SSH port is set to 22.
cluster DNS If not explicitly specified, the cluster DNS is set to the value of the nameserver field in /etc/resolv.conf file of the master node.
IP range used by Kubernetes If not explicitly specified, the IP range used by Kubernetes is set to 10.254.0.0/16.
docker registry The docker registry is set to docker.io, and the docker namespace is set to openpai. In another word, all PAI service images will be pulled from docker.io/openpai (see this link on DockerHub for the details of all images).
Cluster id Cluster id is set to pai-example
REST server's admin user REST server's admin user is set to admin, and its password is set to admin-password
VC There is only one VC in the system, default, which has 100% of the resource capacity.

Optional Step 3. Customize configure OpenPAI

This method is for advanced users.

The description of each field in these configuration files can be found in A Guide For Cluster Configuration.

If user want to customize configuration, please see the table below