PLEASE SEE THE demo FOLDER (readMe.md) FOR A SAMPLE DEMO OF THE PROJECT.
Researchers use high-performance computing (HPC) cyberinfrastructures (CI) like the Ohio Supercomputer Center (OSC) or the Texas Advanced Computing Center (TACC) to execute computationally intensive, diverse scientific workflows. Some workflows are I/O-heavy, like genome sequencing (cleaning and assembly), while others, like training DNNs, can be compute- and memory-intensive. Each workflow has unique resource requirements, and it is essential to profile and understand these needs to allocate shared resources for optimal utilization of the cyberinfrastructure. These resources are expensive, and many jobs compete for allocations, sometimes incurring considerable wait times (especially when requesting large resources for long durations). Estimating the resources needed to utilize compute and memory optimally is challenging, particularly because sufficient execution history is needed to tailor predictions to unique workflows and execution environments. We explored and established a framework (shown in Figure 1) that pipelines solutions to these challenges. The framework is configured to generate a history of executions and train suitable regression models to estimate the approximate execution time of a target application.
Figure 1: The proposed framework: training data generation, building regression models, and selecting the best model based on custom criteria.
- Generating and Preparing Training Data: This module automatically and systematically generates comprehensive, diverse "scaled-down (SD)" runs and limited, selective "full-scale (FS)" runs with minimal human intervention. We use Cheetah (https://github.com/CODARcode/cheetah) to execute the target application with the pre-defined data-generation configurations (SD and FS) and produce the history-of-runs training data.
- Building Regression Models: This module standardizes and prepares the data, trains the selected off-the-shelf regression models with appropriate hyper-parameters, and stores them for inference. In this phase, the data generated in the first phase is processed to train the regression models: redundant features are eliminated, outliers are removed, and features are transformed to reduce dimensionality before training.
- Selecting an Appropriate Prediction Model: This module selects the most appropriate regression model from the pool of models trained in phase 2, with respect to a given policy and target application.

Note: The framework is built on TensorFlow.
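Taken together, the three modules form a single pipeline driven by one configuration file. A minimal sketch of the end-to-end flow is shown below; the folder path is a placeholder, and the configuration file name follows the examples later in this README:

```bash
# Illustrative end-to-end flow of the HARP pipeline (see later sections for details).
cd <path_to_application>        # the target application work folder
harp pipeline_config.json       # 1) Cheetah generates SD/FS history runs
                                # 2) regression models are trained on the history
                                # 3) the best model (per policy) produces estimates
cat predictions.json            # results are written to the application folder
```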
- Configure HARP with TAPIS to execute HARP and the applications to profile as container images.
- Install HARP as a loadable module on OSC or a local box.
NOTE: IMPORTANT INFORMATION - PLEASE READ
1. TAPIS executes only containerized applications, so both HARP (the HARP framework) and the application should be containerized.
2. Profiling an application with the HARP framework via TAPIS has been tested on both TACC and OSC systems.
3. The Colab notebooks serve as examples of running HARP profiling for a sample Euler application using TAPIS. Make a COPY of the Colab or download the notebooks to execute them.
4. The HARP framework and example containers can be executed on a local box using Docker or Apptainer without TAPIS integration.
Follow the Colab notebooks to profile the HARP container application using TAPIS:
- For the TACC Stampede2 example, use: https://drive.google.com/file/d/1JyAHUxxZ3pKMXGs28UXMQJn5QZmti1yS/view?usp=sharing
- For the OSC Pitzer example, use: https://drive.google.com/file/d/1w8qCTWiOjvn8CCx6FqvZzG8ZJBRKy4M3/view?usp=sharing

Alternatively, download the example notebooks from the "Notebooks" folder.
Steps:
1. Create a new HARP image, or use the pre-made HARP image.
2. Create an image for the application to be profiled using the HARP framework image.
3. Refer to the section "Using HARP to profile an application and predict the execution time" for steps to execute the container on a local box or a CI (like OSC or TACC nodes), with or without TAPIS.
- Create a HARP image using the Docker environment
a. Use the 'Dockerfile_HARP_local' file to create an image for executing the framework on a local box using 'docker build':

```bash
docker build -f DockerFiles/Dockerfile_HARP_local -t harp-framework-local:2.0.0 .
```

b. Use the 'Dockerfile_HARP_CI' file to create an image for executing the framework on a CI (like TACC or OSC) using 'docker build':

```bash
docker build -f DockerFiles/Dockerfile_HARP_CI -t harp-framework-ci:2.0.0 .
```

c. Push the image to Docker Hub or upload it to any web-accessible location:

```bash
docker push <DockerHub>/harp-framework-[local|ci]:2.0.0
```
If you already have access to the pre-compiled images, use these copies of the HARP image from our repository:
a. Use the harp-framework-local:2.0.0 image for executing the framework on a local box.
b. Use the harp-framework-ci:2.0.0 image for executing the framework on TACC or OSC systems.
Note: Execute the 'docker build' command from the main folder 'harp'.
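For reference, a complete build-tag-push sequence for the CI image might look like the following sketch; the Docker Hub account name 'youruser' is a placeholder:

```bash
# Build the CI image from the repository root (the 'harp' folder).
docker build -f DockerFiles/Dockerfile_HARP_CI -t harp-framework-ci:2.0.0 .

# Tag it under a registry namespace (replace 'youruser' with your Docker Hub account).
docker tag harp-framework-ci:2.0.0 youruser/harp-framework-ci:2.0.0

# Push it to a web-accessible location so CI systems can pull it.
docker push youruser/harp-framework-ci:2.0.0
```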
- Create an image for the application to be profiled using the HARP framework image
Using the image created in "Step 1" or the respective image from the ICICLE repository, create an image for the application to be profiled with the HARP framework. We refer to the application to be profiled as the 'target application'.
Steps for creating a Dockerfile for the target application and building an image using the 'Dockerfile_App_Template' template:
a. Edit the 'ProfileApplication.sh' entry-point file to execute HARP with the pipeline configuration JSON. Replace "pipeline_config.json" with your desired pipeline configuration file in the application work folder:

```bash
harp pipeline_config.json
```
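If your configuration file has a different name, one way to update the entry point in place is shown below. This is a sketch that assumes the literal string "pipeline_config.json" appears in ProfileApplication.sh; the file name 'my_pipeline_config.json' is hypothetical:

```bash
# Point the entry point at your own configuration file
# (assumes the default name appears verbatim in the script).
sed -i 's/pipeline_config.json/my_pipeline_config.json/' DockerFiles/ProfileApplication.sh
```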
b. Create a Dockerfile for the target application from the following template:

```dockerfile
FROM <web-accessible-path-to-image-hub>/harp-framework-[local|ci]:2.0.0

# 1. Add application-required installations.
# (none for the Euler example)

# 2. Target application setup.
# Set APP_PATH to the location of your application work folder (the target
# application to be profiled) and APP_NAME to the work folder name.
ENV APP_PATH="<path-to-target-application>/<target-application-work-folder>"
ENV APP_NAME="<target-application-work-folder>"

# Add the target application work folder to the image.
ADD $APP_PATH /app/$APP_PATH

# 3. Copy the execution endpoint file and set it as the entry point.
COPY DockerFiles/ProfileApplication.sh /app/ProfileApplication.sh
ENTRYPOINT ["sh", "/app/ProfileApplication.sh"]
```
c. Build the application image (these commands show building an image for the example application 'Euler Number'):

i. Use the 'Dockerfile_App_EulerNumber' file to create an image for executing the framework on a local box using 'docker build':

```bash
docker build -f DockerFiles/Dockerfile_App_EulerNumber -t harp-app-eulernumber-[local|ci]:2.0.0 .
```

ii. Push the image to Docker Hub or upload it to any web-accessible location:

```bash
docker push <DockerHub>/harp-app-eulernumber-[local|ci]:2.0.0
```

Alternatively, use the existing copy of the Euler Number application image from our repository.
- Refer to the section "Using HARP to profile an application and predict the execution time" for steps to execute the container on a local box or a CI (like OSC or TACC nodes), with or without TAPIS.
- Dependencies: Linux, Python 3.9+, git, pip, mpich, psutil, jq (command-line JSON parser, https://stedolan.github.io/jq/)
- On supercomputers (OSC), HARP should be installed at a location accessible from the parallel file system:
```bash
git clone https://github.com/ICICLE-ai/harp.git
cd harp
chmod 755 install-osc-harp.sh
./install-osc-harp.sh
```
If the installation fails, delete the 'harp_env' environment, run cleanup.sh in the install directory, and then re-run 'install-osc-harp.sh':

```bash
conda remove --name harp_env --all
./cleanup.sh
```
This setup installs Miniconda, CODAR Cheetah (https://github.com/CODARcode/cheetah), TensorFlow, psutil, pandas, and scikit-learn, and configures the HARP framework. Please follow the installation prompts to complete the setup. The installation takes 30-40 minutes on an Owens login node.
```bash
module use $HOME/osc_apps/lmodfiles
module load harp
export CONDA_HOME=<path_to_miniconda>/miniconda3
source $CONDA_HOME/bin/activate
source activate harp_env
```
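After activation, you can sanity-check that the Python dependencies the installer sets up are importable. This is an optional quick check, not part of the official installation:

```bash
# Confirm the harp_env environment provides the libraries listed above.
python -c "import tensorflow, pandas, sklearn, psutil; print('harp_env dependencies OK')"
```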
NOTE
Things to consider while installing the Framework on OSC
- [OSC Installation] The installer creates a conda environment, "harp_env", on OSC and uses it to execute the framework. The environment name is referenced in a couple of Cheetah configurations, so the same name, "harp_env", must be used when installing the application. Please delete any existing environment with this name before installing the framework.
- Upon successful installation, the install script prints a response like:

```
(OSC Install Script) Generating Module File Step: /users/PAS0536/swathivm/osc_apps/lmodfiles/harp/1.0.lua
(OSC Install Script) Generating Module File Step Finished
Finished at Thu Mar 16 11:44:13 EDT 2023
Execution time: 1965 seconds
```
- Use these commands to install the dependencies using pip:

```bash
pip install psutil
pip install tensorflow
pip install pandas
pip install scikit-learn
```
- Download the source code and set HARP_HOME:

```bash
git clone https://github.com/ICICLE-ai/harp.git
export HARP_HOME=<path-to-download-folder>/harp
```
- Install Cheetah:

```bash
cd $HARP_HOME/cheetah
pip install --editable .
```
- Ensure the scripts have 'execute' privileges:

```bash
cd $HARP_HOME/pipeline/bin/local
chmod 755 harp
cd $HARP_HOME/cheetah/bin
chmod 755 *
```
- Set the HARP pipeline and Cheetah binaries in the PATH:

```bash
export PATH=$HARP_HOME/pipeline/bin/local:$HARP_HOME/cheetah/bin:$PATH
```
The HARP pipeline is ready to be used once HARP_HOME is set and the binaries are on the PATH.
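A quick way to confirm the setup, and optionally persist it across shell sessions (the ~/.bashrc lines are an optional convenience, not part of the documented install):

```bash
# Verify the harp binary resolves from PATH and HARP_HOME is set.
which harp
echo $HARP_HOME

# Optionally persist the settings for future shells.
echo "export HARP_HOME=$HARP_HOME" >> ~/.bashrc
echo 'export PATH=$HARP_HOME/pipeline/bin/local:$HARP_HOME/cheetah/bin:$PATH' >> ~/.bashrc
```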
NOTE: HARP has been tested on Owens and Pitzer (OSC) and on standalone Linux systems.
Things to consider while installing the dependencies on standalone Linux systems:
- If you do not have root or admin privileges, please consult your package manager's documentation on installing mpich and the other dependencies.
Using HARP (version 2.0.0) to profile an application (e.g., Euler Number) on a local box and on CIs (OSC and TACC) using TAPIS
a. To profile the application on a local box, execute the application image built from the HARP parent image using the following commands:

```bash
# Optional: pull the pre-built image first.
docker pull ghcr.io/icicle-ai/harp-app-eulernumber-local:2.0.0
docker run --mount source=HARP_Store,target=/scratch ghcr.io/icicle-ai/harp-app-eulernumber-local:2.0.0
```
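The run above writes into the named Docker volume HARP_Store. One way to locate and browse that output afterwards is shown below; these are standard Docker commands, and the layout of the files inside the volume is not specified here:

```bash
# Find where Docker stores the HARP_Store volume on the host.
docker volume inspect HARP_Store --format '{{ .Mountpoint }}'

# Or browse the volume contents from a throwaway container.
docker run --rm --mount source=HARP_Store,target=/scratch alpine ls -R /scratch
```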
b. To profile the application using HARP on OSC or TACC:

i. Without TAPIS: Log in to a compute node on OSC or TACC and run the following.

On OSC:

```bash
module load singularity
# Optional: pull the image first.
singularity pull docker://ghcr.io/icicle-ai/harp-app-eulernumber-ci:2.0.0
singularity run docker://ghcr.io/icicle-ai/harp-app-eulernumber-ci:2.0.0 osc /fs/scratch/PAS2271/swathivm/
```

On TACC:

```bash
module load tacc-apptainer
# Optional: pull the image first.
singularity pull docker://ghcr.io/icicle-ai/harp-app-eulernumber-ci:2.0.0
singularity run docker://ghcr.io/icicle-ai/harp-app-eulernumber-ci:2.0.0 tacc none
```
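Reading the two examples above, the container appears to take the system name and a scratch path as positional arguments, with 'none' where no path is needed. Generically (placeholders in angle brackets; this pattern is inferred from the examples, not separately documented):

```bash
singularity run docker://ghcr.io/icicle-ai/harp-app-eulernumber-ci:2.0.0 <osc|tacc> <scratch-path|none>
```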
ii. With TAPIS: Follow the instructions in the 'Executing_HARP_using_TAPIS.ipynb' notebook in the 'Notebooks' folder to register the OSC and TACC systems on TAPIS and profile the application 'harp-app-eulernumber-ci:2.0.0' using HARP.
- Navigate to the target application folder and copy all the files from /Post_Execution_Scripts/basic into the current folder.
- Edit the path in post-script.sh to point to the target application directory.
- Execute the framework per the configurations in the file 'train_config.json' as follows:
```bash
cd <path_to_application>
chmod 755 *
harp <pipeline-configuration>.json
```
- The results of the framework are stored in the predictions.json file under the target application folder. A sample application is provided under the example folder; follow its readme file to execute the framework for profiling and estimating resource needs.
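Since jq is already a listed dependency, the results file can be inspected from the command line. The keys inside predictions.json are not documented in this section, so the filter below simply pretty-prints the whole file:

```bash
# Pretty-print the HARP results; refine the jq filter once the keys are known.
jq '.' predictions.json
```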
System Name | Version V1 Support | Version V2 Support | CPU | GPU |
---|---|---|---|---|
Local Linux machines | ✅ | ✅ | ✅ | |
Owens (OSC) | ✅ | ✅ | ✅ | ✅ |
Pitzer (OSC) | ✅ | ✅ | ✅ | ✅ |
Frontera (TACC) | ✅ | ✅ | ✅ | |
Stampede2 (TACC) | ✅ | ✅ | ✅ | |
HARP is licensed under the BSD 3-Clause License (https://opensource.org/licenses/BSD-3-Clause).