From 364b6c3eb46e2a72a1fc3be489e1b964ca9aa915 Mon Sep 17 00:00:00 2001
From: Yuan Tang
Date: Sat, 4 Apr 2020 20:52:37 -0400
Subject: [PATCH] Edits on tutorial for XGBoost job on Kubernetes

---
 doc/tutorials/kubernetes.rst | 40 +++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/doc/tutorials/kubernetes.rst b/doc/tutorials/kubernetes.rst
index 37673abb25e6..d606858007fd 100644
--- a/doc/tutorials/kubernetes.rst
+++ b/doc/tutorials/kubernetes.rst
@@ -1,36 +1,34 @@
 ###################################
-Distributed XGBoost with Kubernetes
+Distributed XGBoost on Kubernetes
 ###################################
 
-Kubeflow community provides `XGBoost Operator `_ to support distributed XGBoost training and batch prediction in a Kubernetes cluster. It provides an easy and efficient XGBoost model training and batch prediction in distributed fashion.
+Distributed XGBoost training and batch prediction on `Kubernetes `_ are supported via `Kubeflow XGBoost Operator `_.
 
-**********
-How to use
-**********
-In order to run a XGBoost job in a Kubernetes cluster, carry out the following steps:
+************
+Instructions
+************
+In order to run a XGBoost job in a Kubernetes cluster, perform the following steps:
 
-1. Install XGBoost Operator in Kubernetes.
+1. Install XGBoost Operator on the Kubernetes cluster.
 
-   a. XGBoost Operator is designed to manage XGBoost jobs, including job scheduling, monitoring, pods and services recovery etc. Follow the `installation guide `_ to install XGBoost Operator.
+   a. XGBoost Operator is designed to manage the scheduling and monitoring of XGBoost jobs. Follow `this installation guide `_ to install XGBoost Operator.
 
-2. Write application code to interface with the XGBoost operator.
+2. Write application code that will be executed by the XGBoost Operator.
 
-   a. You'll need to furnish a few scripts to inteface with the XGBoost operator. Refer to the `Iris classification example `_.
-   b. Data reader/writer: you need to have your data source reader and writer based on the requirement. For example, if your data is stored in a Hive Table, you have to write your own code to read/write Hive table based on the ID of worker.
-   c. Model persistence: in this example, model is stored in the OSS storage. If you want to store your model into Amazon S3, Google NFS or other storage, you'll need to specify the model reader and writer based on the requirement of storage system.
+   a. To use XGBoost Operator, you'll have to write a couple of Python scripts that implement the distributed training logic for XGBoost. Please refer to the `Iris classification example `_.
+   b. Data reader/writer: you need to implement the data reader and writer based on the specific requirements of your chosen data source. For example, if your dataset is stored in a Hive table, you have to write the code to read from or write to the Hive table based on the index of the worker.
+   c. Model persistence: in the `Iris classification example `_, the model is stored in `Alibaba OSS `_. If you want to store your model in other storages such as Amazon S3 or Google NFS, you'll need to implement the model persistence logic based on the requirements of the chosen storage system.
 
 3. Configure the XGBoost job using a YAML file.
 
-   a. YAML file is used to configure the computation resource and environment for your XGBoost job to run, e.g. the number of workers and masters. The template `YAML template `_ is provided for reference.
+   a. YAML file is used to configure the computational resources and environment for your XGBoost job to run, e.g. the number of workers/masters and the number of CPU/GPUs. Please refer to this `YAML template `_ for an example.
 
-4. Submit XGBoost job to Kubernetes cluster.
+4. Submit XGBoost job to a Kubernetes cluster.
 
-   a. `Kubectl command `_ is used to submit a XGBoost job, and then you can monitor the job status.
+   a. Use `kubectl `_ to submit a distributed XGBoost job as illustrated `here `_.
 
-****************
-Work in progress
-****************
+*******
+Support
+*******
 
-- XGBoost Model serving
-- Distributed data reader/writer from/to HDFS, HBase, Hive etc.
-- Model persistence on Amazon S3, Google NFS etc.
+Please submit an issue on `XGBoost Operator repo `_ for any feature requests or problems.
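
Note on step 2b of the patched tutorial: it asks each worker to read data "based on the index of the worker" but does not say how the split might look. The following is a minimal sketch of one possible scheme, round-robin sharding. It is not taken from the Iris classification example; the function name ``shard_rows`` and the parameters ``worker_index`` and ``num_workers`` are hypothetical, and the caller is assumed to already know both values.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_rows(rows: Sequence[T], worker_index: int, num_workers: int) -> List[T]:
    """Return the rows assigned to one worker under round-robin sharding.

    Hypothetical helper for illustration only; a real reader for Hive or
    another store would push this selection into the query itself.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    # Worker i takes rows i, i + num_workers, i + 2 * num_workers, ...
    return list(rows[worker_index::num_workers])


if __name__ == "__main__":
    dataset = list(range(10))
    for index in range(3):
        print(index, shard_rows(dataset, index, 3))
```

With this scheme every row is read by exactly one worker, and shard sizes differ by at most one row. A contiguous-block split would work equally well if the data source favors range scans.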