Batch AI provides managed infrastructure to help data scientists with cluster management, scheduling, scaling, and monitoring of AI jobs. Batch AI is built on top of virtual machine scale sets and Docker, and can run training jobs either in Docker containers or directly on the compute nodes.
- Cluster - the set of compute nodes (VMs) that jobs run on
- Jobs - training jobs submitted to run on a cluster
- Azure File Share - stdout, stderr, may contain Python scripts
- Azure Blob Storage - Python scripts, data
You Only Look Once (YOLO) is a real-time object detection system. We will be running YOLOv3 on a single image with Batch AI. If you would like to run YOLO without a cluster, you can follow the steps on the YOLO site:
git clone https://github.com/pjreddie/darknet
cd darknet
make
wget https://pjreddie.com/media/files/yolov3.weights
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
YOLOv3 should output something like:
...
104 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
105 conv 255 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 255 0.353 BFLOPs
106 detection
Loading weights from yolov3.weights...Done!
data/dog.jpg: Predicted in 24.016015 seconds.
dog: 99%
truck: 92%
bicycle: 99%
- Python train and test scripts define the parallel strategy used, not Batch AI. For example, CNTK uses an asynchronous data parallel training strategy and TensorFlow uses an asynchronous model parallel training strategy.
- Make sure .sh scripts have LF line endings - use dos2unix to fix them
- To enable faster communication between the nodes it's necessary to use Intel MPI and have InfiniBand on the VM
- NC24r (works with Intel MPI and InfiniBand) quota is 1 core by default in any subscription, so make quota increase requests early
- There's no reset ssh-key for nodes
- Do not put CMD in the dockerfile used by Batch AI. Since the container runs in detached mode, it will exit on CMD (see the Dockerfile sketch after this list)
- Error messages within the container are not very descriptive
- Clusters take a long time to provision and deallocate
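To make the CMD gotcha concrete, here is a minimal sketch of building an image for Batch AI, with the Dockerfile written as a shell heredoc. The base image and packages are illustrative assumptions; the point is that the image only installs dependencies and defines no CMD, since the job's command line is supplied at submission time.
# Hypothetical Dockerfile for a Batch AI job.
cat > Dockerfile <<'EOF'
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 install numpy
# Deliberately no CMD: the container runs detached and would exit once CMD returned.
EOF
docker build -t <registry>/<image name> .
docker push <registry>/<image name>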
- Install Azure CLI 2.0 for WSL
- Batch AI Recipes
- Azure CLI Docs
- Swagger Docs for Batch AI
- Batch AI Environment Variables
- Setting up KeyVault
az account list -o table
az account set -s <subscription id>
az group create -n <rg name> -l eastus
az storage account create \
-n <storage account name> \
--sku Standard_LRS \
-l eastus \
-g <rg name>
az storage account keys list \
-n <storage account name> \
-g <rg name> \
--query "[0].value"
az storage share create \
-n <share name> \
--account-name <storage account name> \
--account-key <storage account key>
az storage directory create \
-s <share name> \
-n yolo \
--account-name <storage account name> \
--account-key <storage account key>
az storage file upload \
-s <share name> \
--source <python script> \
-p yolo \
--account-name <storage account name> \
--account-key <storage account key>
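Depending on how your <python script> loads the model, you may also need to upload the YOLO weights and cfg next to it. This step is an assumption based on the darknet walkthrough above; skip it if your script downloads them itself.
# Upload the weights file into the same yolo directory on the share.
az storage file upload \
-s <share name> \
--source yolov3.weights \
-p yolo \
--account-name <storage account name> \
--account-key <storage account key>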
Config parameters are defined by ClusterCreateParameters in the Batch AI swagger docs.
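Before running the command below, it may help to see a sketch of what cluster.json could contain. The field names come from ClusterCreateParameters; the VM size, node count, and credentials are placeholder assumptions.
# Hypothetical cluster.json - a manually scaled two-node NC6 cluster.
cat > cluster.json <<'EOF'
{
  "properties": {
    "vmSize": "STANDARD_NC6",
    "scaleSettings": {
      "manual": {
        "targetNodeCount": 2
      }
    },
    "userAccountSettings": {
      "adminUserName": "<user name>",
      "adminUserPassword": "<password>"
    }
  }
}
EOF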
az batchai cluster create \
-n <cluster name> \
-l eastus \
-g <rg name> \
-c cluster.json
Alternatively, the same cluster can be created entirely with command-line flags instead of a config file:
az batchai cluster create \
-n <cluster name> \
-g <rg name> \
-l eastus \
--storage-account-name <storage account name> \
--storage-account-key <storage account key> \
-i UbuntuDSVM \
-s Standard_NC6 \
--min 2 \
--max 2 \
--afs-name <share name> \
--afs-mount-path external \
-u $USER \
-k ~/.ssh/id_rsa.pub \
-p <password>
az batchai cluster show \
-n <cluster name> \
-g <rg name> \
-o table
- View JobBaseProperties in the Batch AI swagger docs for the possible parameters to use in job.json.
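For example, a job.json for the YOLO run might look roughly like the sketch below. The field names come from JobBaseProperties; the node count, script path, and Docker image are placeholder assumptions.
# Hypothetical job.json. Quoting 'EOF' keeps $AZ_BATCHAI_MOUNT_ROOT literal -
# Batch AI expands it on the node, not the local shell.
cat > job.json <<'EOF'
{
  "properties": {
    "nodeCount": 1,
    "stdOutErrPathPrefix": "$AZ_BATCHAI_MOUNT_ROOT/external",
    "customToolkitSettings": {
      "commandLine": "python $AZ_BATCHAI_MOUNT_ROOT/external/yolo/<python script>"
    },
    "containerSettings": {
      "imageSourceRegistry": {
        "image": "<docker image>"
      }
    }
  }
}
EOF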
az batchai job create \
-g <rg name> \
-l eastus \
-n <job name> \
-r <cluster name> \
-c job.json
az batchai job show \
-n <job name> \
-g <rg name> \
-o table
az batchai job stream-file \
-j <job name> \
-n stdout.txt \
-d stdouterr \
-g <rg name>
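Standard error can be streamed the same way, assuming the default stderr.txt file name alongside the stdout.txt used above:
az batchai job stream-file \
-j <job name> \
-n stderr.txt \
-d stdouterr \
-g <rg name>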
az batchai cluster list-nodes \
-n <cluster name> \
-g <rg name>
ssh <ip> -p <port>
$AZ_BATCHAI_MOUNT_ROOT is an environment variable set by Batch AI for each job; its value depends on the image used for node creation. For example, on Ubuntu based images it's equal to /mnt/batch/tasks/shared/LS_root/mounts. You can cd to this directory and view the Python scripts and logs.
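For instance, after sshing into a node you could inspect the mounted share like this; the relative path external matches the --afs-mount-path used when the cluster was created above.
# On a cluster node, after ssh:
cd $AZ_BATCHAI_MOUNT_ROOT/external
ls yolo    # the Python script uploaded earlier; job output appears here if stdOutErrPathPrefix points at the mount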