This template is the first in a series of templates that will guide you through the process of creating a cookbook and running it on TACC systems. The series ranges from simple templates that run a single command to more complex ones that run a Python script using conda or a Jupyter Notebook.
- A GitHub account
- A TACC account. If you don't have one, you can request one here.
- An allocation, which you need in order to access TACC systems.
This template creates a simple Python script that will be used to demonstrate how to run a cookbook on a TACC cluster and obtain the output using a UI. The cookbook will use a CSV file stored on TACC storage and run a Python script that reads it, calculates the average of the values in the first column, and writes the result to a file.
In this case, the file is small for demonstration purposes. However, you can use the same process to analyze large files.
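The computation described above can be sketched in a few lines of Python. This is a minimal sketch, not the template's actual main.py: it assumes the CSV has a header row and a numeric first column, and it uses the standard library csv module rather than pandas so that the example stands alone. The function names `average_first_column` and `write_average` are hypothetical.

```python
import csv
from pathlib import Path


def average_first_column(csv_path, has_header=True):
    """Return the mean of the numeric values in the first column of csv_path."""
    values = []
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row (assumed to exist)
        for row in reader:
            if not row or not row[0].strip():
                continue  # ignore blank lines and empty cells
            values.append(float(row[0]))
    if not values:
        raise ValueError(f"no numeric values found in {csv_path}")
    return sum(values) / len(values)


def write_average(csv_path, output_path):
    """Compute the average and write it to output_path as a single line."""
    average = average_first_column(csv_path)
    Path(output_path).write_text(f"{average}\n")
    return average
```

Because the file is read row by row, the same approach should also work for files that are too large to load into memory at once.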
- app.json: contains the definition of the Tapis application, including the application's name, description, Docker image, input files, and advanced options.
- Dockerfile: a Docker image is built from the Dockerfile. The Docker image defines the runtime environment for the application and the files that will be used by the application.
- run.sh: contains all the commands that will be executed on the TACC cluster.
One of the goals of the template is to demonstrate how to use the TACC storage system to store the input and output files. So, you should upload the CSV file to the TACC storage system.
- Go to the TACC Portal.
- Click on the "Data Files" tab.
- Click on the "Add +" button.
- Click on the "Upload" button.
- Select the file you want to upload and click "Upload Selected".
The Dockerfile is used to create a Docker image that will be used to run the Python script. In this case, the Docker image is created using the microconda base image, which is a minimal image that contains conda.
For example, the Dockerfile below installs curl using apt-get. This is useful if you need to install packages that are not available in conda.
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
The environment.yaml file is used to define the conda environment that will be used to run the Python script. In this case, the environment.yaml file contains the dependencies needed to run the Python script.
name: base
channels:
- conda-forge
dependencies:
- python=3.9.1
- pandas=1.2.1
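When debugging an image, it can help to confirm that the running environment actually matches the pins in environment.yaml. The sketch below is one possible check, not part of the template: the helper name `check_pins` is hypothetical, and it relies on the standard library `importlib.metadata` module, which only sees packages that expose Python distribution metadata (so the conda `python` entry itself is not checked).

```python
from importlib import metadata


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# Pin copied from the environment.yaml shown above.
PINS = {"pandas": "1.2.1"}

for package, (expected, installed) in check_pins(PINS).items():
    print(f"{package}: expected {expected}, found {installed}")
```

If nothing is printed, every listed package is installed at the pinned version.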
The run.sh file is used to run the Python script. It activates the conda environment and runs the script.
#!/bin/bash
# Print each command (-x) and stop at the first error (-e).
set -xe
cd "${_tapisExecSystemInputDir}"
python /code/main.py billing.csv "${_tapisExecSystemOutputDir}/output.txt"
The run.sh script uses two variables that define the input and output directories: _tapisExecSystemInputDir and _tapisExecSystemOutputDir. Both are set automatically by the Tapis system.
- _tapisExecSystemInputDir: The directory where the input files are staged
- _tapisExecSystemOutputDir: The directory where the application writes the output files
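If a script needs these directories directly instead of receiving paths as arguments from run.sh, it could read them itself. The following is a minimal sketch that assumes both variables are visible to the Python process as environment variables; the helper name `resolve_job_dirs` and the local fallback folders (`.` and `output`) are hypothetical and only there so the script can also be tried outside a Tapis job.

```python
import os
from pathlib import Path


def resolve_job_dirs(env=None):
    """Return (input_dir, output_dir), falling back to local folders outside Tapis."""
    env = os.environ if env is None else env
    # Assumed fallbacks for local testing; Tapis is expected to set both variables.
    input_dir = Path(env.get("_tapisExecSystemInputDir", "."))
    output_dir = Path(env.get("_tapisExecSystemOutputDir", "output"))
    output_dir.mkdir(parents=True, exist_ok=True)  # make sure results can be written
    return input_dir, output_dir
```

Passing a plain dictionary as `env` makes the function easy to exercise in a test without touching the real environment.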
You can use this repository as a template to create your cookbook. Follow the steps below to create your cookbook.
- Click on the "Use this template" button to create a new repository
- Fill in the form with the information for your new repository
- Clone the repository
- Build the Docker image using the command below
docker build -t cookbook-python .
- Push the Docker image to a container registry
docker tag cookbook-python <your-registry>/cookbook-python
docker push <your-registry>/cookbook-python
Each app has a unique id and description. So, you should change these fields to match your app's name and description.
- Download the app.json file
- Change the values of the id and description fields to the name and description you want
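Editing the two fields by hand is straightforward, but the change can also be scripted. The sketch below is one way to do it under the assumption that app.json is a JSON object with top-level id and description keys, as described above; the function name `customize_app` and the example values in the usage note are hypothetical.

```python
import json
from pathlib import Path


def customize_app(path, app_id, description):
    """Set the id and description fields of an app.json file and save it in place."""
    path = Path(path)
    app = json.loads(path.read_text())
    app["id"] = app_id
    app["description"] = description
    # All other fields are written back unchanged.
    path.write_text(json.dumps(app, indent=2) + "\n")
    return app
```

For example, `customize_app("app.json", "my-cookbook-python", "Averages the first column of a CSV file")` would update a local copy of the file before you fill in the form.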
- Go to Cookbook UI
- Click on the "Create Application" button
- Fill in the form with the information from your app.json file
- Click "Create Application"
- A new application will be created, and you will be redirected to the application's page
- Go to the application's page on the Cookbook UI, if you are not already there
- Click on the "Run" button on the right side of the page. This will open the Portal UI
- Click on the "Select" button to choose the input file
- Click "Run"
- After the job finishes, you can check the output by clicking on the "Output location" link on the job's page
- You will be redirected to the output location, where you can see the output files generated by the job
- Click on a file to see its content. In this case, the file is named output.txt
- William Mobley - wmobley@tacc.utexas.edu
- Maximiliano Osorio - maxiosorio@gmail.com