
Kuigesi/PipelineExecution-Reproducible


Reproducible Experiment

To reproduce the evaluation presented in the CS592 paper:

First, log in to the CUDA server cuda.cs.purdue.edu. Make sure you have access to this server.

ssh username@cuda.cs.purdue.edu

Then, clone the code from GitHub and enter the repository directory.

git clone https://github.com/Kuigesi/PipelineExecution-Reproducible.git
cd PipelineExecution-Reproducible

Before running the evaluation, check the GPU usage: the evaluation requires 4 GPUs and occupies at most 5 GB of memory per GPU.

To check the GPU usage, run

nvidia-smi
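Rather than reading the nvidia-smi table by eye, the free memory on the target GPUs can be checked with a short script. This is a minimal sketch, not part of the repository: the helper name `check_free_memory` is hypothetical, and it assumes the "5 GB" requirement means 5120 MiB. It relies on nvidia-smi's `--query-gpu` CSV output, which reports `memory.free` in MiB when `nounits` is given.

```shell
#!/usr/bin/env bash
# Sketch only: not part of the repository; check_free_memory is a hypothetical helper.

# Minimum free memory per GPU in MiB (assumption: the README's "5GB" = 5120 MiB).
MIN_FREE_MIB=5120

# check_free_memory: reads "index, memory.free" lines (MiB, no units) on stdin,
# prints a warning for each GPU below MIN_FREE_MIB, and returns 1 if any GPU
# is short on memory (0 otherwise).
check_free_memory() {
  local status=0 idx free
  while IFS=', ' read -r idx free; do
    [ -z "$idx" ] && continue   # skip blank lines
    if [ "$free" -lt "$MIN_FREE_MIB" ]; then
      echo "GPU $idx: only ${free} MiB free (need ${MIN_FREE_MIB})"
      status=1
    fi
  done
  return "$status"
}

# Query the GPUs given as arguments (default: 0 1 2 3, the paper's setup).
# Guarded so the file can also be sourced on machines without nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpus=("$@")
  if [ "${#gpus[@]}" -eq 0 ]; then
    gpus=(0 1 2 3)
  fi
  for id in "${gpus[@]}"; do
    nvidia-smi -i "$id" --query-gpu=index,memory.free --format=csv,noheader,nounits
  done | check_free_memory && echo "All requested GPUs have at least ${MIN_FREE_MIB} MiB free."
fi
```

Saved under a name of your choice (for example the hypothetical `check_gpus.sh`), it can be run as `bash check_gpus.sh` for the default GPUs or `bash check_gpus.sh 4 5 2 3` for an alternate set. Querying one ID at a time with `-i` keeps the sketch independent of how nvidia-smi handles ID lists.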

The evaluation in our paper uses the first 4 GPUs (0, 1, 2, 3). To reproduce it, make sure these 4 GPUs are all available and each has at least 5 GB of free memory, then run

bash ./runtest.sh

This will produce the following files:

  • ./benchmark/data/benchmark.csv, which contains the collected running times of the different parallel settings.

  • Two figures, ./benchmark/pictures/pipelineparallelruntime.pdf and ./benchmark/pictures/pipelineparallelspeedup.pdf, which plot the runtime and speedup of the different parallel settings.
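After `bash ./runtest.sh` finishes, a quick check can confirm that all three output files were written. This is a hypothetical helper (`check_outputs` is an invented name, not a script in the repository); it only tests that each listed file exists and is non-empty, and makes no assumptions about the CSV's columns.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical post-run check, not part of the repository.
# Verifies that runtest.sh produced the three files listed in this README.

# check_outputs [DIR]: DIR is the repository root (default: current directory).
# Returns 1 if any expected file is missing or empty.
check_outputs() {
  local root="${1:-.}" missing=0 f
  for f in benchmark/data/benchmark.csv \
           benchmark/pictures/pipelineparallelruntime.pdf \
           benchmark/pictures/pipelineparallelspeedup.pdf; do
    if [ -s "$root/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"   # absent or empty
      missing=1
    fi
  done
  return "$missing"
}
```

After sourcing the file, running `check_outputs && head -n 5 benchmark/data/benchmark.csv` from the repository root prints the check results and, if everything is present, the first lines of the collected timings.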

To view the generated figures, transfer the PDF files to your local computer.
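One way to do the transfer is with scp, run on your local computer. This is a minimal sketch assuming the repository was cloned into your home directory on the server (scp resolves relative remote paths against the home directory); `remote_figures` and `fetch_figures` are hypothetical helper names, and `username` stands for your own account.

```shell
#!/usr/bin/env bash
# Sketch only: run on your LOCAL computer. Helper names are hypothetical.
# Assumes the repo was cloned into the home directory on cuda.cs.purdue.edu.

# remote_figures USER: prints the scp source spec of each generated PDF.
remote_figures() {
  local user="$1" name
  local dir="PipelineExecution-Reproducible/benchmark/pictures"
  for name in pipelineparallelruntime.pdf pipelineparallelspeedup.pdf; do
    echo "${user}@cuda.cs.purdue.edu:${dir}/${name}"
  done
}

# fetch_figures USER [DEST]: copies both PDFs into DEST (default: current dir).
# scp accepts several sources followed by a single destination; the specs
# contain no whitespace, so the unquoted substitution splits them safely.
fetch_figures() {
  scp $(remote_figures "$1") "${2:-.}"
}
```

For example, `fetch_figures username ~/Downloads` copies both figures into `~/Downloads`, prompting for your server password if key-based login is not set up.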

Alternate Experiment

There are 8 GPUs in total on cuda.cs.purdue.edu: GPUs 0, 1, 2, 3 are TITAN Xp, GPUs 4, 5 are GEFORCE GTX TITAN, and GPUs 6, 7 are Tesla K40c. These three groups differ in computational performance.

If the first 4 GPUs (0, 1, 2, 3) are not all available, you can run the experiment on other GPUs instead, but the results will differ substantially from those in the paper, because GPUs (0, 1, 2, 3), GPUs (4, 5), and GPUs (6, 7) have different computational performance.
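Before picking substitute GPUs, it may help to check whether a candidate set mixes models. The sketch below hard-codes the ID-to-model table from this README, so it only applies to cuda.cs.purdue.edu and should be confirmed against `nvidia-smi --query-gpu=index,name --format=csv,noheader`, which prints one `index, name` line per GPU. The helpers `gpu_model` and `same_model` are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of the repository.
# The ID-to-model table is copied from this README and is specific to
# cuda.cs.purdue.edu; confirm it with:
#   nvidia-smi --query-gpu=index,name --format=csv,noheader

# gpu_model ID: prints the model of a GPU, or "unknown" (and fails).
gpu_model() {
  case "$1" in
    0|1|2|3) echo "TITAN Xp" ;;
    4|5)     echo "GEFORCE GTX TITAN" ;;
    6|7)     echo "Tesla K40c" ;;
    *)       echo "unknown"; return 1 ;;
  esac
}

# same_model ID...: succeeds only if every given GPU is the same model;
# otherwise prints the first mismatch and fails.
same_model() {
  local first model id
  first=$(gpu_model "$1") || return 1
  for id in "$@"; do
    model=$(gpu_model "$id") || return 1
    if [ "$model" != "$first" ]; then
      echo "mixed models: GPU $1 is $first, GPU $id is $model"
      return 1
    fi
  done
  return 0
}
```

For instance, `same_model 0 1 2 3` succeeds, while `same_model 4 5 2 3` reports a mix of GEFORCE GTX TITAN and TITAN Xp, which is why that configuration is not expected to match the paper's numbers.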

You can select different GPU devices by passing their device IDs to the script. Data parallelism on 2 GPUs runs on the first 2 given GPUs; all other settings use all 4 given GPUs. For example, run

bash ./runtest.sh 4 5 2 3

to run the experiment on GPU 4, GPU 5, GPU 2, and GPU 3. Because GPU 4 and GPU 5 are significantly slower than GPUs 0, 1, 2, 3, the results will differ from those in the paper.
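This README does not show how runtest.sh consumes its arguments. As a purely hypothetical illustration of the selection rule described above (the first 2 given GPUs for 2-GPU data parallelism, all 4 for every other setting), the IDs could be turned into `CUDA_VISIBLE_DEVICES` values as below. The function name `select_devices` and the setting label `dp2` are invented for this sketch; it is not the script's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: NOT the actual logic of runtest.sh.
# Maps four GPU IDs to a CUDA_VISIBLE_DEVICES string following this README's
# rule: 2-GPU data parallelism uses the first two IDs, every other setting
# uses all four.

# select_devices SETTING ID1 ID2 ID3 ID4
#   SETTING is "dp2" (hypothetical label) for 2-GPU data parallelism;
#   any other value selects all four GPUs.
select_devices() {
  local setting="$1"; shift
  if [ "$#" -ne 4 ]; then
    echo "expected 4 GPU IDs, got $#" >&2
    return 1
  fi
  if [ "$setting" = "dp2" ]; then
    echo "$1,$2"
  else
    echo "$1,$2,$3,$4"
  fi
}

# Example usage (commented out so the file can be sourced safely):
# export CUDA_DEVICE_ORDER=PCI_BUS_ID   # make CUDA device IDs match nvidia-smi IDs
# export CUDA_VISIBLE_DEVICES="$(select_devices dp2 4 5 2 3)"   # -> 4,5
```

Setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` matters on a mixed-model server like this one: CUDA's default ordering is fastest-first, so without it the IDs in `CUDA_VISIBLE_DEVICES` may not correspond to the IDs shown by nvidia-smi.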

About

Reproducible Experiment for CS592 Project
