This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Commit

Update feature demo examples (#30)
* Update feature demo examples.
* Add defaulting for `ignoreK8sSuggestedNodes`.
* Fix sort in `getUsablePhysicalCells`.
abuccts authored Aug 20, 2020
1 parent a8b0aa7 commit df185ee
Showing 41 changed files with 70 additions and 136 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -40,7 +40,6 @@ HiveD supports multiple job **priorities**. Higher-priority jobs can **[preempt]
5. [Priorities](example/feature/README.md#Guaranteed-Job), [Overuse with Low Priority](example/feature/README.md#Opportunistic-Job), and [Inter-](example/feature/README.md#Inter-VC-Preemption)/[Intra-VC Preemption](example/feature/README.md#Intra-VC-Preemption)
6. [Job (Full/Partial) Gang Scheduling/Preemption](example/feature/README.md#Gang-Scheduling)
7. Fault-Tolerance, [Bad Hardware Awareness](example/feature/README.md#Bad-Hardware-Awareness), [Work-Preserving Reconfiguration](example/feature/README.md#Work-Preserving-Reconfiguration)
-8. [Leverage K8S Default Scheduler](example/feature/README.md#Leverage-K8S-Default-Scheduler)

## Prerequisite
1. A Kubernetes cluster, v1.14.2 or above, on-cloud or on-premise.
43 changes: 15 additions & 28 deletions example/feature/README.md
@@ -11,7 +11,7 @@ HiveD guarantees **quota safety for all VCs**, in the sense that the requests to

A VC's cells can be described by Hardware Quantity, [Topology](#VC-Safety), [Type](#SKU-Type), [Pinned Cells](#Pinned-Cells), etc. To guarantee safety, HiveD never allows a VC to "invade" other VCs' cells. For example, to guarantee all VCs' topology, one VC's [guaranteed jobs](#Guaranteed-Job) should never cause fragmentation inside other VCs:

-Two DGX-2s, two VCs each owns one DGX-2 node. For a traditional scheduler, this will translate into two VCs each owning 16 GPUs. When a user submits 16 1-GPU jobs to VC1, the user in VC2 might not be able to run a 16-GPU job, due to possible fragmentation issue caused by VC1. While HiveD can guarantee each VC always has one entire node available for its dedicated use.
+Two DGX-2s, two VCs, each owning one DGX-2 node. For a traditional scheduler, this translates into two VCs each owning 16 GPUs. When a user submits 16 1-GPU jobs to vc1, the user in vc2 might not be able to run a 16-GPU job, due to a possible fragmentation issue caused by vc1, while HiveD can guarantee each VC always has one entire node available for its dedicated use.
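As a concrete illustration (an editor's sketch, not part of this commit's diff), a `virtualClusters` layout in the style of [hived-config-1](file/hived-config-1.yaml) could give each VC one whole node-level cell, so neither VC can fragment the other's node; the DGX-2 cell type names below are hypothetical:

```yaml
# Hypothetical sketch only: cell type names are illustrative, not taken from this repo.
virtualClusters:
  vc1:
    virtualCells:
    - cellType: 2-DGX2-NODE.DGX2-NODE   # one full DGX-2 node reserved for vc1
      cellNumber: 1
  vc2:
    virtualCells:
    - cellType: 2-DGX2-NODE.DGX2-NODE   # one full DGX-2 node reserved for vc2
      cellNumber: 1
```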

### Reproduce Steps
1. Use [hived-config-1](file/hived-config-1.yaml).
@@ -27,7 +27,7 @@ This is similar to [K8S Taints and Tolerations](https://kubernetes.io/docs/conce

### Reproduce Steps
1. Use [hived-config-8](file/hived-config-8.yaml).
-2. Submit job [itc-pin](file/itc-pin.yaml) to VC1, all tasks in task role vc1pinned will be on node 10.151.41.25 (which is pinned), all tasks in task role vc1nopinned will NOT be on node 10.151.41.25.
+2. Submit job [itc-pin](file/itc-pin.yaml) to vc1: all tasks in task role vc1pinned will be on node 10.151.41.25 (which is pinned), and all tasks in task role vc1nopinned will NOT be on node 10.151.41.25.
<img src="file/itc-pin.png" width="900"/>

## SKU Type
@@ -68,8 +68,8 @@ This is useful for jobs that cannot perform any useful work, such as making prog
<img src="file/itc-gang4.png" width="900"/>

#### TensorFlow Distributed Training
-1. Use [hived-config-1](file/hived-config-1.yaml).
-2. Submit job [itc-dtf](file/itc-dtf.yaml) to VC2, it will success.
+1. Use [hived-config-2](file/hived-config-2.yaml).
+2. Submit job [itc-dtf](file/itc-dtf.yaml) to the default VC; it will succeed.
<img src="file/itc-dtf.png" width="900"/>

## Incremental Scheduling
@@ -110,27 +110,28 @@ Within one VC, a high-priority job can preempt low-priority jobs.
### Reproduce Steps
#### Immediate Preemption
1. Use [hived-config-3](file/hived-config-3.yaml).
-2. Submit [itc-intra-imd-preempt-test](file/itc-intra-imd-preempt-test.yaml), which requests for 4 M60 GPUs for VC1 with test (0) priority.
-3. Submit [itc-intra-imd-preempt-prod](file/itc-intra-imd-preempt-prod.yaml), which also requests for 4 M60 GPUs for VC1 with prod (100) priority. The job will preempt the test job immediately, so the test job is retried and waiting for resource.
+2. Submit [itc-intra-imd-preempt-test](file/itc-intra-imd-preempt-test.yaml), which requests 4 M60 GPUs in vc1 with test (0) priority.
+3. Submit [itc-intra-imd-preempt-prod](file/itc-intra-imd-preempt-prod.yaml), which also requests 4 M60 GPUs in vc1, but with prod (100) priority. The job preempts the test job immediately, so the test job is retried and left waiting for resources.
<img src="file/itc-intra-imd-preempt-test.png" width="900"/>
<img src="file/itc-intra-imd-preempt-prod.png" width="900"/>

#### Lazy Preemption
1. Use [hived-config-3](file/hived-config-3.yaml).
-2. Submit [itc-intra-lazy-preempt-test](file/itc-intra-lazy-preempt-test.yaml), which requests for 4 K80 GPUs for VC1 with test (0) priority.
-3. Submit [itc-intra-lazy-preempt-prod](file/itc-intra-lazy-preempt-prod.yaml), which also requests for 4 K80 GPUs for VC1 with prod (100) priority. The job will just downgrade the test job to be [Opportunistic Job](#Opportunistic-Job), instead of preempting it immediately, because all jobs can still fit into the whole physical cluster.
+2. Submit [itc-intra-lazy-preempt-test](file/itc-intra-lazy-preempt-test.yaml), which requests 4 K80 GPUs in vc1 with test (0) priority.
+3. Submit [itc-intra-lazy-preempt-prod](file/itc-intra-lazy-preempt-prod.yaml), which also requests 4 K80 GPUs in vc1, but with prod (100) priority. Instead of preempting the test job immediately, it just downgrades the test job to an [Opportunistic Job](#Opportunistic-Job), because all jobs can still fit into the whole physical cluster.
4. Submit [itc-intra-lazy-preempt-prod2](file/itc-intra-lazy-preempt-prod2.yaml), which requests 3 * 4 K80 GPUs in the default VC with prod (100) priority. This job preempts the test job immediately, because all jobs can no longer fit into the whole physical cluster.
<img src="file/itc-intra-lazy-preempt-test.png" width="900"/>
<img src="file/itc-intra-lazy-preempt-prod.png" width="900"/>
<img src="file/itc-intra-lazy-preempt-prod2.png" width="900"/>
+> NOTE: The `lazyPreemptionEnable` option is disabled by default, because an earlier job may be downgraded to a low-priority job and then get preempted by later jobs, which may be confusing.
## Inter-VC Preemption
### Description
One VC's [Guaranteed Job](#Guaranteed-Job) can preempt other VCs' [Opportunistic Jobs](#Opportunistic-Job).

### Reproduce Steps
-1. Use [hived-config-2](file/hived-config-2.yaml).
-2. Submit [itc-inter-preempt-oppo](file/itc-inter-preempt-oppo.yaml), which requests for 2 * 4 K80 GPUs for VC1 with oppo (-1) priority.
+1. Use [hived-config-3](file/hived-config-3.yaml).
+2. Submit [itc-inter-preempt-oppo](file/itc-inter-preempt-oppo.yaml), which requests 2 * 4 K80 GPUs in vc1 with oppo (-1) priority.
3. Submit [itc-inter-preempt-prod](file/itc-inter-preempt-prod.yaml), which requests 3 * 4 K80 GPUs in the default VC with prod (100) priority. The job will preempt the oppo job immediately.
<img src="file/itc-inter-preempt-oppo.png" width="900"/>
<img src="file/itc-inter-preempt-prod.png" width="900"/>
@@ -190,20 +191,20 @@ HiveD can be reconfigured without unnecessary user impacts, such as add/update/d
#### VirtualCluster Reconfig - Delete VirtualCluster
1. Use [hived-config-2](file/hived-config-2.yaml).
2. Submit job [itc-reconfig-3](file/itc-reconfig-3.yaml) to default VC. Wait until it is running.
-3. Delete the default VC and move its quota to VC1, then becomes [hived-config-5](file/hived-config-5.yaml).
+3. Delete the default VC and move its quota to vc1; the configuration then becomes [hived-config-5](file/hived-config-5.yaml).
4. Use [hived-config-5](file/hived-config-5.yaml), and restart HiveD.
5. The job will still run without any interruption, but it is [lazy preempted](#Lazy-Preemption) by HiveD.
<img src="file/itc-reconfig-3.png" width="900"/>
-6. To confirm it is [lazy preempted](#Lazy-Preemption), submit job [itc-reconfig-4](file/itc-reconfig-4.yaml) to VC1 which requests all K80 nodes. The job will immediately preempt [itc-reconfig-3](file/itc-reconfig-3.yaml).
+6. To confirm it is [lazy preempted](#Lazy-Preemption), submit job [itc-reconfig-4](file/itc-reconfig-4.yaml) to vc1, which requests all K80 nodes. It will immediately preempt [itc-reconfig-3](file/itc-reconfig-3.yaml).
<img src="file/itc-reconfig-4.png" width="900"/>

#### VirtualCluster Reconfig - Update VirtualCluster
1. Use [hived-config-2](file/hived-config-2.yaml).
2. Submit job [itc-reconfig-3](file/itc-reconfig-3.yaml) to default VC. Wait until it is running.
-3. Move one K80-NODE cell from default VC to VC1, then becomes [hived-config-6](file/hived-config-6.yaml).
+3. Move one K80-NODE cell from the default VC to vc1; the configuration then becomes [hived-config-6](file/hived-config-6.yaml).
4. Use [hived-config-6](file/hived-config-6.yaml), and restart HiveD.
5. The job will still run without any interruption, but it is [lazy preempted](#Lazy-Preemption) by HiveD.
-6. To confirm it is [lazy preempted](#Lazy-Preemption), submit job [itc-reconfig-5](file/itc-reconfig-5.yaml) to VC1 which requests all K80 nodes. The job will immediately preempt [itc-reconfig-3](file/itc-reconfig-3.yaml).
+6. To confirm it is [lazy preempted](#Lazy-Preemption), submit job [itc-reconfig-5](file/itc-reconfig-5.yaml) to vc1, which requests all K80 nodes. It will immediately preempt [itc-reconfig-3](file/itc-reconfig-3.yaml).
<img src="file/itc-reconfig-5.png" width="900"/>

## Bad Hardware Awareness
@@ -219,17 +220,3 @@ Avoid scheduling pods to bad hardware.
4. Bring back 10.151.41.26 by `sudo systemctl start kubelet`. Wait until this is detected by K8S.
5. The waiting job will start running, without any retries.
<img src="file/itc-badnode50-3.png" width="900"/>

-## Leverage K8S Default Scheduler
-### Description
-You can still leverage almost all scheduling features provided by your existing [K8S Default Scheduler](https://kubernetes.io/docs/concepts/scheduling/kube-scheduler) with HiveD, such as these [Filtering Policies](https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/#filtering).
-
-### Reproduce Steps
-#### Leverage [Labels and Selectors](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels)
-1. Use [hived-config-2](file/hived-config-2.yaml).
-2. Remove PAI worker label for 10.151.41.26 (the only M60 node).
-3. Submit job [itc-no-worker-label](file/itc-no-worker-label.yaml), which requests M60 node, it will be waiting without IP associated.
-<img src="file/itc-no-worker-label-1.png" width="900"/>
-4. Add back PAI worker label for 10.151.41.26.
-5. The waiting job will start running, without any retries.
-<img src="file/itc-no-worker-label-2.png" width="900"/>
4 changes: 2 additions & 2 deletions example/feature/file/hived-config-1.yaml
@@ -34,11 +34,11 @@ physicalCluster:
- cellAddress: 10.151.41.24

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: 3-K80-NODE.K80-NODE
cellNumber: 1
-VC2:
+vc2:
virtualCells:
- cellType: 3-K80-NODE.K80-NODE
cellNumber: 1
2 changes: 1 addition & 1 deletion example/feature/file/hived-config-2.yaml
@@ -45,7 +45,7 @@ physicalCluster:
- cellAddress: 10.151.41.26

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: K80-NODE-POOL.K80-NODE
cellNumber: 1
2 changes: 1 addition & 1 deletion example/feature/file/hived-config-3.yaml
@@ -45,7 +45,7 @@ physicalCluster:
- cellAddress: 10.151.41.26

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: K80-NODE-POOL.K80-NODE
cellNumber: 1
2 changes: 1 addition & 1 deletion example/feature/file/hived-config-33.yaml
@@ -42,7 +42,7 @@ physicalCluster:
# - cellAddress: 10.151.41.25

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: K80-NODE-POOL.K80-NODE
cellNumber: 1
2 changes: 1 addition & 1 deletion example/feature/file/hived-config-4.yaml
@@ -45,7 +45,7 @@ physicalCluster:
- cellAddress: 10.151.41.25

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: K80-NODE-POOL.K80-NODE
cellNumber: 1
2 changes: 1 addition & 1 deletion example/feature/file/hived-config-5.yaml
@@ -45,7 +45,7 @@ physicalCluster:
- cellAddress: 10.151.41.26

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: K80-NODE-POOL.K80-NODE
cellNumber: 4
2 changes: 1 addition & 1 deletion example/feature/file/hived-config-6.yaml
@@ -45,7 +45,7 @@ physicalCluster:
- cellAddress: 10.151.41.26

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: K80-NODE-POOL.K80-NODE
cellNumber: 3
2 changes: 1 addition & 1 deletion example/feature/file/hived-config-7.yaml
@@ -44,7 +44,7 @@ physicalCluster:
- cellAddress: 10.151.41.26

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: K80-NODE-POOL.K80-NODE
cellNumber: 1
4 changes: 2 additions & 2 deletions example/feature/file/hived-config-8.yaml
@@ -34,13 +34,13 @@ physicalCluster:
- cellAddress: 10.151.41.24

virtualClusters:
-VC1:
+vc1:
virtualCells:
- cellType: 3-K80-NODE.K80-NODE
cellNumber: 1
pinnedCells:
- pinnedCellId: VC1-K80
-VC2:
+vc2:
virtualCells:
- cellType: 3-K80-NODE.K80-NODE
cellNumber: 1
10 changes: 4 additions & 6 deletions example/feature/file/itc-badnode50.yaml
@@ -11,23 +11,21 @@ taskRoles:
instances: 1
completion:
minFailedInstances: 1
-minSucceededInstances: 6
+minSucceededInstances: 1
dockerImage: keras_tensorflow_example
resourcePerInstance:
cpu: 4
memoryMB: 8192
gpu: 1
commands:
-- nvidia-smi -L
-- printenv
-- sleep 10000
+- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
+- python mnist_cnn.py
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
gangAllocation: true
hivedScheduler:
jobPriorityClass: prod
taskRoles:
train:
skuType: M60
-submitFrom: submit-job-v2
5 changes: 2 additions & 3 deletions example/feature/file/itc-buddy.yaml
@@ -20,14 +20,13 @@ taskRoles:
commands:
- nvidia-smi -L
- printenv
-- sleep 10000
+- sleep 10m
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
gangAllocation: true
hivedScheduler:
jobPriorityClass: prod
taskRoles:
train:
skuType: K80
-submitFrom: submit-job-v2
2 changes: 0 additions & 2 deletions example/feature/file/itc-dtf.yaml
@@ -100,5 +100,3 @@ deployments:
- echo "Uploading data ..."
defaults:
deployment: tf_example
-extras:
-submitFrom: submit-job-v2
3 changes: 1 addition & 2 deletions example/feature/file/itc-elastic.yaml
@@ -21,12 +21,11 @@ taskRoles:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
gangAllocation: false
hivedScheduler:
jobPriorityClass: prod
taskRoles:
train:
skuType: K80
-submitFrom: submit-job-v2
3 changes: 1 addition & 2 deletions example/feature/file/itc-gang.yaml
@@ -21,12 +21,11 @@ taskRoles:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
gangAllocation: true
hivedScheduler:
jobPriorityClass: prod
taskRoles:
train:
skuType: K80
-submitFrom: submit-job-v2
5 changes: 2 additions & 3 deletions example/feature/file/itc-gang4.yaml
@@ -11,7 +11,7 @@ taskRoles:
instances: 4
completion:
minFailedInstances: 1
-minSucceededInstances: 6
+minSucceededInstances: 4
dockerImage: keras_tensorflow_example
resourcePerInstance:
cpu: 4
@@ -21,12 +21,11 @@ taskRoles:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
gangAllocation: true
hivedScheduler:
jobPriorityClass: prod
taskRoles:
train:
skuType: K80
-submitFrom: submit-job-v2
5 changes: 2 additions & 3 deletions example/feature/file/itc-inter-preempt-oppo.yaml
@@ -20,13 +20,12 @@ taskRoles:
commands:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
-- sleep 10000
+- sleep 10m
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
hivedScheduler:
jobPriorityClass: oppo
taskRoles:
train:
skuType: K80
-submitFrom: submit-job-v2
3 changes: 1 addition & 2 deletions example/feature/file/itc-inter-preempt-prod.yaml
@@ -20,7 +20,7 @@ taskRoles:
commands:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
-- sleep 10000
+- sleep 10m
defaults:
virtualCluster: default
extras:
@@ -29,4 +29,3 @@ extras:
taskRoles:
train:
skuType: K80
-submitFrom: submit-job-v2
5 changes: 2 additions & 3 deletions example/feature/file/itc-intra-imd-preempt-prod.yaml
@@ -20,13 +20,12 @@ taskRoles:
commands:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
-- sleep 10000
+- sleep 10m
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
hivedScheduler:
jobPriorityClass: prod
taskRoles:
train:
skuType: M60
-submitFrom: submit-job-v2
5 changes: 2 additions & 3 deletions example/feature/file/itc-intra-imd-preempt-test.yaml
@@ -20,13 +20,12 @@ taskRoles:
commands:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
-- sleep 10000
+- sleep 10m
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
hivedScheduler:
jobPriorityClass: test
taskRoles:
train:
skuType: M60
-submitFrom: submit-job-v2
5 changes: 2 additions & 3 deletions example/feature/file/itc-intra-lazy-preempt-prod.yaml
@@ -20,13 +20,12 @@ taskRoles:
commands:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
-- sleep 10000
+- sleep 10m
defaults:
-virtualCluster: VC1
+virtualCluster: vc1
extras:
hivedScheduler:
jobPriorityClass: prod
taskRoles:
train:
skuType: K80
-submitFrom: submit-job-v2
