Dedalus is a flexible differential equations solver using spectral methods. It is MPI-parallelized and can therefore make efficient use of high performance computing resources like the NYU Greene cluster. The cluster uses Singularity containers to manage packages and Slurm for job scheduling. It is tricky to construct and use a Singularity container for Dedalus v3 that interacts well with both. Luckily, the NYU HPC staff have figured out most of the details. This note describes how to use the Singularity container on a single node, on multiple nodes, and in JupyterLab. At the end we also briefly comment on how the Singularity is built, so that you can build your own customized version.
Note: if anything in this note does not work, or you run into trouble, please let me know at ryan_sjdu@nyu.edu. I would be happy to help.
- Using Dedalus on a single node
- Using Dedalus on multiple nodes
- Testing performance
- Building the Singularity
- Acknowledgment
We first use Dedalus on the command line. On the command line (as opposed to in JupyterLab), Dedalus can use multiple cores (though still on a single node) to improve speed. Simulations that involve heavy computation should use this method.
Once we are logged into the Greene cluster, cd into your scratch directory and request a compute node so that we can run some test code (do not run CPU-heavy jobs on the login node):
cd $SCRATCH
srun --nodes=1 --tasks-per-node=4 --cpus-per-task=1 --time=2:00:00 --mem=4GB --pty /bin/bash
Once we are on the compute node, paste the following commands to start the pre-built Singularity:
singularity exec \
--overlay /scratch/work/public/singularity/dedalus-3.0.0a0-openmpi-4.1.2-ubuntu-22.04.1.sqf:ro \
/scratch/work/public/singularity/ubuntu-22.04.1.sif /bin/bash
unset -f which
source /ext3/env.sh
export OMP_NUM_THREADS=1; export NUMEXPR_MAX_THREADS=1
The last command essentially turns off shared-memory parallelism. This is recommended for Dedalus's performance since Dedalus does not use hybrid parallelism (see the Dedalus documentation on disabling multithreading). We can check that the settings took effect by running
echo $OMP_NUM_THREADS; echo $NUMEXPR_MAX_THREADS
and we should get 1 1.
We can now run an example script, e.g. the Rayleigh-Benard convection (2D IVP) example. We clone it from the Dedalus GitHub repo; to avoid downloading a lot of files, we use a sparse checkout.
git clone --depth 1 --filter=blob:none --sparse https://github.com/DedalusProject/dedalus.git
cd dedalus/
git sparse-checkout set examples
cd examples/ivp_2d_rayleigh_benard
Now we can run the example. Note that we requested 4 cores and are using 4 MPI processes; these two numbers should match.
mpiexec -n 4 python3 rayleigh_benard.py
mpiexec -n 4 python3 plot_snapshots.py snapshots/*.h5
We now see the script outputting time-stepping information. If we look at the CPU usage on the node using htop -u ${USER}, we should see near-100% usage on 4 cores. Satisfying.
Note that we did not use the Dedalus-provided test, python3 -m dedalus test. This is intentional: the test command does not work consistently with our setup. But we clearly have a working Singularity, given that we can run the Dedalus examples.
All the above commands are wrapped up in a script that we can simply call. We can take a peek at the script via
cat /scratch/work/public/singularity/run-dedalus-3.0.0a0.bash
Now, to run the same Dedalus code, we can just enter this command on the login node:
srun --nodes=1 --tasks-per-node=4 --cpus-per-task=1 --time=2:00:00 --mem=4GB \
/scratch/work/public/singularity/run-dedalus-3.0.0a0.bash python rayleigh_benard.py
To run many heavy simulations, one should queue the jobs on Greene using Slurm scripts. Here is the general tutorial for Slurm on Greene. In this section, we run an example Dedalus Slurm job.
In the repository of this note, there is an example Slurm script that runs the Periodic shear flow (2D IVP) example. To use it, we first clone this repo:
cd $SCRATCH
git clone https://github.com/CAOS-NYU/Dedalusv3_GreeneSingularity.git
cd Dedalusv3_GreeneSingularity
We can take a look at the script
cat slurm_example_singlenode.SBATCH
You need to fill in your NYU ID to use this script. The script contains the same srun commands used in the interactive command-line case, nothing mysterious; its general shape is sketched below.
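For orientation, such a script looks roughly like the following. This is only a sketch, not the exact file in the repo: the job name, mail options, and the shear_flow.py script name are illustrative and may differ from the repository version.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --time=2:00:00
#SBATCH --mem=4GB
#SBATCH --job-name=dedalus_shear_flow
#SBATCH --mail-type=END
#SBATCH --mail-user=<your_nyu_id>@nyu.edu

# run the periodic shear flow example through the public Dedalus wrapper script
cd $SCRATCH/dedalus/examples/ivp_2d_shear_flow
srun /scratch/work/public/singularity/run-dedalus-3.0.0a0.bash python shear_flow.py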
Now we submit the script
sbatch slurm_example_singlenode.SBATCH
and check the queue
squeue -u ${USER}
Now we see it in the queue. We can check the reasons for any wait here. After some patience, once the job has run (you will receive an email when it is done), we can check the output in the Dedalusv3_GreeneSingularity directory. There should be a file named slurm_<yourjobid>.out that contains the terminal output. We can also find the data output in the code folder:
$SCRATCH/dedalus/examples/ivp_2d_shear_flow
ls snapshots
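If you also want plots, the snapshots can be post-processed the same way as in the Rayleigh-Benard example, assuming this example likewise ships a plot_snapshots.py (check the example folder); the time and memory requests below are just rough guesses.
cd $SCRATCH/dedalus/examples/ivp_2d_shear_flow
srun --nodes=1 --tasks-per-node=4 --cpus-per-task=1 --time=0:30:00 --mem=4GB \
/scratch/work/public/singularity/run-dedalus-3.0.0a0.bash python plot_snapshots.py snapshots/*.h5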
Sometimes it is convenient to use JupyterLab for code development. Note that for Dedalus, running in JupyterLab means we can use only one core. This is acceptable if the computation is light. We should request only one core, because more would be wasteful. (Note: you can run mpiexec in Jupyter, but in that case one should just use the command line.)
Instructions for using Open OnDemand (OOD) with Conda/Singularity on Greene are available here. Since we have a pre-built Singularity, we can skip most of the steps.
We create a kernel named dedalus3 by copying my files to your home directory.
mkdir -p ~/.local/share/jupyter/kernels
cd ~/.local/share/jupyter/kernels
cp -R /scratch/work/sd3201/dedalus3/dedalus3 ./dedalus3
cd ./dedalus3
ls
#kernel.json logo-32x32.png logo-64x64.png python
#files in the ~/.local/share/jupyter/kernels/dedalus3 directory
After this, we can enjoy Dedalus in Jupyter on OOD by following this tutorial. Remember to request only one core because we can only use one!
To learn about the details of the files you copied, you could read the python and kernel.json files; a rough sketch of how they fit together is given below. The Singularity used is mine. For instructions on how to make your own, see the section on building your own Singularity.
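In rough terms, kernel.json tells Jupyter to launch the python wrapper script in this directory, and the wrapper starts the IPython kernel inside the Singularity. The following is only an illustrative sketch of such a wrapper; the overlay path and the exact argument handling in the file you copied will differ.
#!/bin/bash
# illustrative kernel wrapper: Jupyter calls it with arguments such as
#   -f /path/to/connection_file.json
# and it starts ipykernel inside the Dedalus Singularity
singularity exec \
--overlay /scratch/work/sd3201/dedalus3/dedalus_ryansingularity.sqf:ro \
/scratch/work/public/singularity/ubuntu-22.04.1.sif \
/bin/bash -c "source /ext3/env.sh; export OMP_NUM_THREADS=1; python -m ipykernel_launcher $*"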
Since Dedalus uses MPI, we can use multiple nodes for our computation. On Greene, requesting multiple nodes and launching Python on each node is managed by Slurm via srun, so we cannot use multiple nodes for interactive jobs. We therefore start on the login node and run
srun --nodes=4 --tasks-per-node=4 --cpus-per-task=1 --time=2:00:00 --mem=4GB \
/scratch/work/public/singularity/run-dedalus-3.0.0a0.bash python rayleigh_benard.py
Because we have to disable multithreading, we should keep --cpus-per-task=1.
After some wait for the job to start, we should see the code running. We can see the four nodes in use via
squeue -u $USER
On each node, four CPU cores are in use, all near 100%. Nice.
It is straightforward to convert the above command into a Slurm script; we provide an example in this repo, similar in spirit to the sketch below.
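A multi-node script is essentially the single-node one with more nodes requested. Again, this is only a sketch mirroring the srun command above, not the exact file in the repo:
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --time=2:00:00
#SBATCH --mem=4GB

# 4 nodes x 4 tasks = 16 MPI processes, one core each (no multithreading)
cd $SCRATCH/dedalus/examples/ivp_2d_rayleigh_benard
srun /scratch/work/public/singularity/run-dedalus-3.0.0a0.bash python rayleigh_benard.py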
Please see the drag_race folder for some performance tests of Dedalus on Greene. The tests show that our setup is working well.
We build the Singularity by first following the standard steps. To install Dedalus, we build it from source. First run
mkdir $SCRATCH/dedalus_sing
cd $SCRATCH/dedalus_sing
cp -rp /scratch/work/public/overlay-fs-ext3/overlay-1GB-400K.ext3.gz .
gunzip overlay-1GB-400K.ext3.gz
Then launch the Singularity
singularity exec \
--overlay overlay-1GB-400K.ext3 \
/scratch/work/public/singularity/ubuntu-22.04.1.sif /bin/bash
Inside the Singularity, install miniconda
bash /share/apps/utils/singularity-conda/setup-conda.bash
source /ext3/env.sh
Then clone the Dedalus source code
cd /ext3
git clone https://github.com/DedalusProject/dedalus.git
cd /ext3/dedalus
and build and install Dedalus
CC=mpicc \
MPI_INCLUDE_PATH=/usr/lib/x86_64-linux-gnu/openmpi/include \
MPI_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/openmpi \
FFTW_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu \
FFTW_INCLUDE_PATH=/usr/include \
python3 -m pip install --no-cache .
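Before adding anything else, it is worth checking that the freshly built Dedalus is importable. This quick sanity check is my own suggestion rather than part of the official build steps:
# move out of the source tree so Python picks up the installed package, then import it
cd /ext3
python3 -c "import dedalus.public as d3; print(d3.__file__)"
#the printed path should live under /ext3 (the Miniconda site-packages), not under ~/.local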
At this stage, we can add more packages to the Singularity. For example, we could add cmocean, a beautiful colormap package.
pip install cmocean
To test that we indeed have the package, run
source /ext3/env.sh
python -c "import cmocean; print(cmocean.__version__); print(cmocean.__file__)"
#v3.0.3
#/ext3/miniconda3/lib/python3.10/site-packages/cmocean/__init__.py
#your package should be here, not .local
Now you have your own Dedalus Singularity that you can edit. You can replace /scratch/work/public/singularity/dedalus-3.0.0a0-openmpi-4.1.2-ubuntu-22.04.1.sqf in this note with $SCRATCH/dedalus_sing/overlay-1GB-400K.ext3 (an example launch command with this swap is shown below). If you want to share your Singularity, run inside the Singularity
mksquashfs /ext3 dedalus_readonly.sqf -keep-as-directory
to make a read-only version. You should not let others read your ext3 file: read access means write access for ext3 files!
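For example, an interactive session with your own overlay swapped in looks like the following (the same commands as in the single-node section, with only the overlay path changed; keep the overlay mounted read-only if several MPI processes will use it at once):
singularity exec \
--overlay $SCRATCH/dedalus_sing/overlay-1GB-400K.ext3:ro \
/scratch/work/public/singularity/ubuntu-22.04.1.sif /bin/bash
unset -f which
source /ext3/env.sh
export OMP_NUM_THREADS=1; export NUMEXPR_MAX_THREADS=1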
If you want to use my version, it is available at /scratch/work/sd3201/dedalus3/dedalus_ryansingularity.sqf. A keen reader might have already realized that the Singularity used for JupyterLab is my version.
The Singularity files in this note were made by Shenglong Wang of the NYU HPC team. We thank the NYU HPC team for their help with training and troubleshooting.