Metis

Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters with at Scale

This repository contains the tensorflow implementation for reinforcement learning based Long Running Application scheduling with Hierarchical Reinforcement Learning (HRL).

Dependencies

Python 3.5 or above
Tensorflow 1.12.0
scipy
numpy
pandas
matplotlib
sklearn
Matlab 2019b or above

Installation

Install Requirements

git clone https://github.com/Metis-RL-based-container-sche/Metis.git

Clone from Github source:

pip install -r requirement.txt

Content

Our project includes three parts:

Cluster: Implementation of our seven real-world LRAs that exhibit inter-container interferences.
Experiments: Metis scheduling workflow based on our real-world LRA setting.

Experiment workflow

Real-World LRA cluster setup and profiler establishment

Launch a cluster of tens of nodes on Amazon EC2 consisting of at least 1 manager node, several worker nodes, and some client nodes. Record the public DNS address or ip address of each node and make sure the manager and client nodes are accessible through the SSH key ~/.ssh/id_rsa.

export MANAGER=ec2-xxx-xxx-xxx-100.us-west-2.compute.amazonaws.com
export WORKER1=ec2-xxx-xxx-xxx-101.us-west-2.compute.amazonaws.com
# ......
export CLIENT1=ec2-xxx-xxx-xxx-201.us-west-2.compute.amazonaws.com
# ......

$ ssh ubuntu@$MANAGER
(manager)$ git clone https://github.com/Metis-RL-based-container-sche/Metis.git

If Docker has not been installed, install Docker on each manager, worker, and client node:

(manager)$ sudo Cluster/scripts/install_docker.sh
(worker1)$ sudo Cluster/scripts/install_docker.sh
# ......

Coordinating manager and worker nodes through Docker Swarm to form a Swarm cluster.

$ ssh $MANAGER docker swarm init

# Output
To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-1bu27zw7lzh6pnu1l0981bu1m6nqq2pcgk25kovuh565319cah-8480smxu3kp7cj5nfkck2itax 192.168.99.100:2377

Then execute the aforementioned command on each worker.

$ ssh $WORKER1 docker swarm join --token SWMTKN-1-1bu27zw7lzh6pnu1l0981bu1m6nqq2pcgk25kovuh565319cah-8480smxu3kp7cj5nfkck2itax 192.168.99.100:2377

# Output
This node joined a swarm as a worker.

Build docker images of different workloads on the manager node.
```
(manager)$ cd Cluster/workloads
(manager)$ ./build-all.sh
```

Launch certain workloads on certain worker node from the manager node.

(manager)$ cd Cluster/scripts

# Launching 1 container for each workload on WORKER1:
# '0' indicates idle; '1' ~ '7' indicate different workloads.

(manager)$ ./service-launching.sh $WORKER1 0 1 2 3 4 5 6 7

# e.g., launch 3 workload-1, 2 workload-2, and 1 workload-3 container:
# ./service-launching.sh $WORKER2 0 0 1 1 1 2 2 3

Sending requests from client node to the worker node

$ ssh ubuntu@$CLIENT1
(client1)$ git clone https://github.com/Metis-RL-based-container-sche/Metis.git

# export WORKER1=ec2-xxx-xxx-xxx-101.us-west-2.compute.amazonaws.com    
# Send single request for testing
(client1)$ cd Cluster/scripts
(client1)$ curl $WORKER1:8081

# Or pressure the application with locust or other tools, e.g.,
(client1)$ cd Cluster/scripts
(client1)$ python3 parallel_locust.py $WORKER1
# Default log path lies in Cluster/scripts/log

(Optional) Collecting performance benchmark datasets through automatically deploying, profiling, terminating, and re-deploying.

# Here the "0-1-2-3-4-5-6-7" or "0-0-1-1-1-2-2-3" means combination
# of different workloads (as described step 5).
(manager)$ ./profiling-go.sh $WORKER1 "0-1-2-3-4-5-6-7 0-0-1-1-1-2-2-3 0-0-0-0-0-0-0-7 0-0-0-0-1-1-1-7 2-4-4-6-6-6-6-7"

Terminate the swarm cluster on the manager node.
```
(manager)$ docker swarm leave --force
```

Metis: RL model training for the real-world LRA cluster

First check the data collected by Real-World LRA cluster is stored in the folder:
```
$ cd Experiments/
$ ls ./simulator/datasets/
```
```
***_sample_collected.npz 
```
Train sub-schedulers in a 27-node sub-cluster:
```
$ cd Experiments/
$ ./shell/TrainSubScheduler.sh
```
Output: the well-trained sub-scheduler models, as well as corresponding log files will be store in the folder:
```
$ ls ./checkpoint/
```
```
subScheduler_*/  
```
High-level model training based on previously well-trained sub-schedulers.

(0) Check the sub-scheduler models are stored in the folder:
```
$ cd Experiments/
$ ls ./checkpoint/
```
```
subScheduler_*/   
```
Check the container batches data is stored in the folder or create your own batches:
```
$ ls ./data
```
```
batch_set_200.npz batch_set_300.npz batch_set_400.npz
batch_set_1000.npz batch_set_2000.npz batch_set_3000.npz
```
(1) High-level training in a medium-sized cluster of 81 nodes:
```
$ ./shell/RunHighLevelTrainingMedium.sh 200
$ ./shell/RunHighLevelTrainingMedium.sh 300
$ ./shell/RunHighLevelTrainingMedium.sh 400
```
Output: the training log files including the RPS, placement matrix, training time duration .etc will be store in the folder:
```
$ ls ./checkpoint/
```
```
81nodes_*_*/
```
(2) High-level training in a large cluster of 729 nodes:
```
$ ./shell/RunHighLevelTrainingLarge.sh 1000
$ ./shell/RunHighLevelTrainingLarge.sh 2000
$ ./shell/RunHighLevelTrainingLarge.sh 3000
```
Output: the training log files including the RPS, placement matrix, training time duration .etc will be store in the folder:
```
$ ls ./checkpoint/
```
```
729nodes_*_*/
```

Baseline Method: Vanilla RL

Vanilla RL is built directly upon Policy Gradient without our Hierarchical designs.

High-level training in a medium-sized cluster of 81 nodes:
```
$ ./shell/RunVanillaRLMedium.sh 200
```
Output: the training log files including the RPS, placement matrix, training time duration, etc will be store in the folder:
```
$ ls ./checkpoint/
```
```
Vanilla_81_*_*/
```

Baseline Method: Divide-Conquer (DC)

DC Method does not use sub-schedulers. Our code below shows its behaviors in a medium-sized cluster of 81 nodes. Each cluster is hierarchically divided into three subsets.

High-level training in a medium-sized cluster of 81 nodes:
```
$ ./shell/RunDCMedium.sh 200
```
Output: the training log files including the RPS, placement matrix, training time duration, etc will be store in the folder:
```
$ ls ./checkpoint/
```
```
DC_81_*_*/
```

Baseline Method: Medea

Medea is implemented using Matlab, due to its outstanding performance in solving the Integer Linear Programming (ILP) Problem.

Generate the performance-constraints used in Medea:

$ cd Experiments
$ ./shell/GenerateInterference.sh
$ ls

interference_applist.csv interference_rpslist.csv

Run Medea in the folder:
```
$ cd testbed/Medea
$ Matlab Medea.m
```
Output: the scheduling decision log files including the allocation matrix, constraint violations, time duration .etc will be store in the folder.

Baseline Method: Paragon

Paragon is re-implemented in Python. For the sake of fair comparison, we feed it with the full interference matrix information as Medea.

Make sure interference_applist.csv has been generated in former Medea setup:
```
$ ls interference_applist.csv
```
Otherwise, generate the performance-constraints used in Medea:
```
$ cd Experiments
$ ./shell/GenerateInterference.sh
$ ls interference_applist.csv
```
Run Paragon of Medium size or Large size:
```
$ cd Experiments/shell
$ # Medium size
$ ./RunParagonMedium.sh 200
$
$ # Large size
$ ./RunParagonLarge.sh 2000
```
Output: the default output shows the average throughput for each testing group as well as the scheduling latency.

For detailed output including container placement and per-container throughput breakdown for each node, please add -v after each python script:
```
$ cd Experiments
$ python3 ParagonExp.py --batch_set_size 200 --batch_choice 0 --size medium --verbose
```

References

[1] Medea: Panagiotis Garefalakis, Konstantinos Karanasos, Peter Pietzuch, Arun Suresh, and Sriram Rao. 2018. Medea: scheduling of long running applications in shared production clusters. In Proceedings of the Thirteenth EuroSys Conference (EuroSys ’18). Association for Computing Machinery, New York, NY, USA, Article 4, 1–13. DOI:https://doi.org/10.1145/3190508.3190549

[2] Paragon: Christina Delimitrou and Christos Kozyrakis. 2013. Paragon: QoS-aware scheduling for heterogeneous datacenters. In Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (ASPLOS ’13). Association for Computing Machinery, New York, NY, USA, 77–88. DOI:https://doi.org/10.1145/2451116.2451125

[3] Redis: an open source, in-memory data structure store. https://redis.io

[4] Model Server for Apache MXNet. https://github.com/awslabs/mxnet-model-server

[5] Image Super Resolution. https://github.com/idealo/image-super-resolution

[6] Locust: an open source load testing tool. https://locust.io

[7] Yahoo! Cloud Streaming Benchmark: Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM symposium on Cloud computing (SoCC ’10). Association for Computing Machinery, New York, NY, USA, 143–154. DOI:https://doi.org/10.1145/1807128.1807152

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
Cluster		Cluster
Experiments		Experiments
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment_info		environment_info
requirement.txt		requirement.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Metis

Dependencies

Installation

Install Requirements

Clone from Github source:

Content

Experiment workflow

Real-World LRA cluster setup and profiler establishment

Metis: RL model training for the real-world LRA cluster

Baseline Method: Vanilla RL

Baseline Method: Divide-Conquer (DC)

Baseline Method: Medea

Baseline Method: Paragon

References

About

Releases 1

Packages

Contributors 3

Languages

License

lwangbm/Metis

Folders and files

Latest commit

History

Repository files navigation

Metis

Dependencies

Installation

Install Requirements

Clone from Github source:

Content

Experiment workflow

Real-World LRA cluster setup and profiler establishment

Metis: RL model training for the real-world LRA cluster

Baseline Method: Vanilla RL

Baseline Method: Divide-Conquer (DC)

Baseline Method: Medea

Baseline Method: Paragon

References

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 3

Languages

Packages