-
Notifications
You must be signed in to change notification settings - Fork 15
Tutorial
This tutorial is provided as jupyter notebook (and mirrored here for easier access and initial instructions).
The tutorial itself is part of the Towards exascale-ready astrophysics workshop https://indico3-jsc.fz-juelich.de/event/169/ taking place 25-27 Sep 2024 online.
The tutorial has been prepared by Philipp Grete pgrete@hs.uni-hamburg.de and is licensed under the BSD 3-Clause License.
#%matplotlib inline
# The PR for the Parthenon frontend in yt is still open, so we need to use a custom version, see https://github.com/parthenon-hpc-lab/athenapk?tab=readme-ov-file#data-analysis
# To accelerate the process, the package is already installed for the training but we need to make it known.
import sys
sys.path.insert(0, "/p/project1/training2437/tera_day1/yt")
import yt
yt.set_log_level(50) # disable logging
import matplotlib.pylab as plt
import numpy as np
import os
user = os.environ['USER']
from matplotlib.animation import FuncAnimation
from IPython.display import Video
Follow instructions on how to get access https://gitlab.jsc.fz-juelich.de/hedgedoc/s/J-O7yP6hD#
Log in to https://jupyter-jsc.fz-juelich.de/
Fire up a lab environment (ideally do this during the coffee break as it might take some time) with
- Lab Config:
- System:
JURECA
- Project:
training2437
- Partition:
dc-gpu
- System:
- Resources (opens once
dc-gpu
is selected)- Nodes:
1
- GPUs:
4
- Runtime:
120
(or more if you want to continue after the offical timeslot
- Nodes:
- Kernels and extensions: keep defaults
Once started open a terminal (which will run directly on the compute node) and execute the following to get the tutorial material
$ source /p/project1/training2437/tera_day1/setup_env.sh
Finally, navigate to the tutorial folder (/p/project1/training2437/$USER/tera_day1/athenapk_tutorial
) on jupyter lab and open the notebook that's in the folder.
Is machine dependent and (especially for device/GPU) might take some time. Binaries to be run as part of the TERA24 tutorial were already prepared as follows.
$ module load Stages/2024 NVHPC OpenMPI CMake HDF5 Python Ninja
$ module load MPI-settings/CUDA
$ git clone https://github.com/parthenon-hpc-lab/athenapk.git
$ cd athenapk
# this will get the Parthenon and Kokkos sources that are build automatically
$ git submodule update --init
$ cmake -Bbuild-cuda-jureca -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON -DKokkos_ARCH_ZEN2=ON -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON -DPARTHENON_ENABLE_PYTHON_MODULE_CHECK=OFF -GNinja
$ cmake --build build-cuda-jureca -j 8
Please get in touch if you run into troubles building on your own machine or cluster. This particularly applies to the latter given that there are always machine specific details (also for running under batch systems like slurm).
Additional information:
- https://github.com/parthenon-hpc-lab/athenapk?tab=readme-ov-file#building-athenapk
- https://parthenon-hpc-lab.github.io/parthenon/develop/src/building.html
If you know your ways around JSC systems, you can, of course, also directly connect to the systems and/or use other visualization tools like ParaView or Visit.
Note that all problem generators and solvers are always included in the binary and can be chosen at runtime through the input file. This might be different from other common codes (including Athena++) where a binary is specifically built for a certain setup.
On the terminal execute the simulation (we'll look at the input file kh.in
and output while the simulation runs in the background).
# Loading modules is only required once
# Skip (already done by the setup_env.sh script above)
# module load Stages/2024 NVHPC OpenMPI CMake HDF5 Python
# module load MPI-settings/CUDA
$ cd /p/project1/training2437/$USER/tera_day1/athenapk_tutorial/kh_2d_uniform
$ srun -n 1 ../bin/athenaPK -i kh.in |& tee ath.out
Given that we run interactively, we pipe the output to ath.out
so that it's available later (for comparison).
Let's take the input file apart.
In general, input files are split into blocks indicated by <BLOCK_NAME>
where
BLOCK_NAME
may also contain/
to logically separate information.
The <job>
block is mandatory for AthenaPK as it defines which problem generator to load.
These are typically connected to their respective source files in athenapk/src/pgen
.
<job>
problem_id = kh
Each problem generator typically comes with additonal parameters.
In AthenaPK it is customary to collect these parameters in the <problem/PROBLEMNAME>
block, like
<problem/kh>
iprob = 5 # AMR test setup from Athena++ method paper
vflow = 1.0 # relative shear velocity of the interfaces
a = 0.01 # thickness of shearing layer
drat = 2 # density contrast/ratio
amp = 0.01 # amplitude of initial vy perturbation
sigma = 0.2 # thickness of perturbed layer
The overall mesh is configured within the <parthenon/mesh>
block.
Note the parthenon
prefix, which typically means that these inputs are controlled
by the Parthenon framework
and additional information might be located in the Parthenon documentation, e.g., https://parthenon-hpc-lab.github.io/parthenon/develop/src/inputs.html
Here, we setup a uniform, static 2D grid with 512x512 cells and periodic boundary conditions.
<parthenon/mesh>
refinement = none
nghost = 2
nx1 = 512
x1min = -0.5
x1max = 0.5
ix1_bc = periodic
ox1_bc = periodic
nx2 = 512
x2min = -0.5
x2max = 0.5
ix2_bc = periodic
ox2_bc = periodic
nx3 = 1
x3min = -0.5
x3max = 0.5
ix3_bc = periodic
ox3_bc = periodic
The mesh itself it logically separate into (identical) meshblocks whose size is configured in the <parthenon/meshblock>
block.
The size is fixed for all meshblocks and cannot be changed once the simulation has started.
Moreover, each active point in the simulation domain is covered by exactly one meshblock (this information is particularly relevant for simulations with mesh refinement).
All meshblocks communicate with each other via ghost/guard/halo cells (whose number is set in the <parthenon/mesh>
block above.
Here, we pick a block size of 128x512, i.e., the mesh is split into 4 blocks total. Picking the right block size is highly problem dependent and also connected to performance (see 101 on performance).
<parthenon/meshblock>
nx1=128
nx2=512
nx3=1
Parthenon also takes care of time integration, which is why the following block is also prefixed with parthenon
.
The following block will use a predictor-corrector Van-Leer type integrator with a CFL of 0.4.
The simulation will run till time 1.2 (in code time units) and not be limited by the number of cylces performed (negative nlim
).
<parthenon/time>
integrator = vl2
cfl = 0.4
tlim = 1.2
nlim = -1
The <hydro>
block is specific to AthenaPK again, see https://github.com/parthenon-hpc-lab/athenapk/blob/main/docs/input.md for additional information.
Here, we choose an adiabatic equation of state with ratio of specific heats of 1.4. Fluxes are calcuated using an HLLC Riemann solver with second-order, piecewise-linear reconstruction (in the primitive variables).
<hydro>
eos = adiabatic
riemann = hllc
reconstruction = plm
gamma = 1.4 # gamma = C_p/C_v
Finally, we define an output block (again controlled by Parthenon, see https://parthenon-hpc-lab.github.io/parthenon/develop/src/outputs.html for additional details.
Here, we pick the rst
type, which translates to restart
and allows to restart a simulation (and can also be read by standard postprocessing tools, see in the following).
Outputs are written
<parthenon/output0>
file_type = rst
dt = 0.1
The simulation should be done by now, so let's have a look at the output on the terminal.
A typical output of an AthenaPK simulation (or more generally Parthenon-based simulation) will start with general information available within the code framework.
Bonus info: Parthenon provides flexible interfaces to create variables that are described through metadata and afterwards handled transparently (e.g., communicated), see https://parthenon-hpc-lab.github.io/parthenon/develop/src/interface/metadata.html In this simulation so-called sparse variables (that are only allocated on a subset of blocks) or "swarms" (i.e., particles based variables/fields) are not used but cell-centered variables (and their fluxes).
Starting up hydro driver
# Variables in use:
# Package: parthenon::resolved_state
# ---------------------------------------------------
# Variables:
# Name Metadata flags
# ---------------------------------------------------
prim Cell,Provides,Real,Derived,Hydro,parthenon::resolved_state
cons Cell,Provides,Real,Independent,FillGhost,WithFluxes,Hydro,parthenon::resolved_state
bnd_flux::cons Face,Provides,Real,Derived,OneCopy,Flux,parthenon::resolved_state
# ---------------------------------------------------
# Sparse Variables:
# Name sparse id Metadata flags
# ---------------------------------------------------
# ---------------------------------------------------
# Swarms:
# Swarm Value metadata
# ---------------------------------------------------
Afterwards, the simulation actually starts.
For each cycle, the cycle number (cycle
), current time (time
), current timestep (dt
), performance in total cell updates per
wallclock second (across all computing resources, zone-cycles/wsec_step
), current simulation runtime in wallclock seconds (wsec_total
),
and wallclock seconds for the current cycle (wsec_step
) are printed.
Having performance numbers per cycles has turned out useful to gauge performance in simulations with varying load (like with adapative mesh refinement)
or to identify machine issues/variability.
Setup complete, executing driver...
cycle=0 time=0.0000000000000000e+00 dt=3.2952612821802183e-04 zone-cycles/wsec_step=0.00e+00 wsec_total=1.44e-01 wsec_step=2.40e+00
cycle=1 time=3.2952612821802183e-04 dt=3.2952539311580179e-04 zone-cycles/wsec_step=5.16e+06 wsec_total=1.94e-01 wsec_step=5.08e-02
cycle=2 time=6.5905152133382367e-04 dt=3.2952465716610244e-04 zone-cycles/wsec_step=2.55e+07 wsec_total=2.05e-01 wsec_step=1.03e-02
Finally, once the simulation is done an overview is printed.
Note that the total performance zone-cycles/wallsecond
might be different from the per-cycle output as it is an aggregate number.
cycle=3865 time=1.1994642549437093e+00 dt=2.7515679713439601e-04 zone-cycles/wsec_step=3.10e+07 wsec_total=3.29e+01 wsec_step=8.46e-03
cycle=3866 time=1.1997394117408438e+00 dt=2.6058825915620254e-04 zone-cycles/wsec_step=3.10e+07 wsec_total=3.29e+01 wsec_step=8.46e-03
cycle=3867 time=1.2000000000000000e+00 dt=2.7515150765570855e-04 zone-cycles/wsec_step=1.29e+07 wsec_total=3.29e+01 wsec_step=2.03e-02
Driver completed.
time=1.20e+00 cycle=3867
tlim=1.20e+00 nlim=-1
walltime used = 3.29e+01
zone-cycles/wallsecond = 3.08e+07
Various options exists to plot AthenaPK output (or more generally, Parthenon-based codes), see https://github.com/parthenon-hpc-lab/athenapk?tab=readme-ov-file#data-analysis
We'll use yt (https://yt-project.org/) with the frontend for Partheon-based codes provided above.
# adapted from https://yt-project.org/doc/cookbook/complex_plots.html#time-series-movie
ts = yt.load(f"/p/project1/training2437/{user}/tera_day1/athenapk_tutorial/kh_2d_uniform/parthenon.out0.*.rhdf")
plot = yt.SlicePlot(ts[0], "z", ("gas", "density"))
plot.set_log(("gas", "density"), False)
plot.set_zlim(("gas", "density"), 0.8, 2.1)
fig = plot.plots[("gas", "density")].figure
# animate must accept an integer frame number. We use the frame number
# to identify which dataset in the time series we want to load
def animate(i):
ds = ts[i]
plot._switch_ds(ds)
animation = FuncAnimation(fig, animate, frames=len(ts))
animation.save("kh_2d_uniform.mp4")
Video("kh_2d_uniform.mp4", embed=True)
NB: yt is a powerful data analysis tool.
Many (derived) fields are readily available and standard plot modifcations are straightforward.
ds = yt.load(f"/p/project1/training2437/{user}/tera_day1/athenapk_tutorial/kh_2d_uniform/parthenon.out0.final.rhdf")
slc = yt.SlicePlot(ds, "z", ("gas", "vorticity_z"))
slc.set_log(("gas", "vorticity_z"), False)
slc.annotate_grids()
slc.annotate_streamlines(("gas", "velocity_x"), ("gas", "velocity_y"))
# see ds.derived_field_list for a full list of derived fields
All input parameters can be modified from the command line and take precedence over the data in the input file.
The variable for a given block in an input file is combined to a single argument via an additional /
, e.g.,
<hydro>
reconstruction = ppm
turns into hydro/reconstruction=ppm
on the command line.
Run in a terminal
$ mkdir /p/project1/training2437/$USER/tera_day1/athenapk_tutorial/kh_2d_uniform_ppm
$ cd /p/project1/training2437/$USER/tera_day1/athenapk_tutorial/kh_2d_uniform_ppm
$ srun -n 1 ../bin/athenaPK -i ../kh_2d_uniform/kh.in hydro/reconstruction=ppm |& tee ath.out
Note that other reconstruction methods (like wenoz
or ppm
) require larger stencils and thus need additionan ghost zones (3 for those two methods).
Can you fix this (on the command line) so that the simulation runs?
Click here for solution
srun -n 1 ../bin/athenaPK -i ../kh_2d_uniform/kh.in hydro/reconstruction=ppm parthenon/mesh/nghost=3 |& tee ath.out# adapted from https://yt-project.org/doc/cookbook/complex_plots.html#time-series-movie
ts = yt.load(f"/p/project1/training2437/{user}/tera_day1/athenapk_tutorial/kh_2d_uniform_ppm/parthenon.out0.*.rhdf")
plot = yt.SlicePlot(ts[0], "z", ("gas", "density"))
plot.set_log(("gas", "density"), False)
plot.set_zlim(("gas", "density"), 0.8, 2.1)
fig = plot.plots[("gas", "density")].figure
# animate must accept an integer frame number. We use the frame number
# to identify which dataset in the time series we want to load
def animate(i):
ds = ts[i]
plot._switch_ds(ds)
animation = FuncAnimation(fig, animate, frames=len(ts))
animation.save("kh_2d_uniform_ppm.mp4")
Video("kh_2d_uniform_ppm.mp4", embed=True)
Again lets first start the simulation and then have a look at the input file while the simulation runs.
$ cd /p/project1/training2437/$USER/tera_day1/athenapk_tutorial/kh_2d_amr
$ srun -n 1 ../bin/athenaPK -i kh.in |& tee ath.out
What has changed compared to the input file for the uniform grid?
!git diff --no-index --word-diff --color-words /p/project1/training2437/$USER/tera_day1/athenapk_tutorial/kh_2d_uniform/kh.in /p/project1/training2437/$USER/tera_day1/athenapk_tutorial/kh_2d_amr/kh.in
[1mdiff --git a/p/project1/training2437/grete1/tera_day1/athenapk_tutorial/kh_2d_uniform/kh.in b/p/project1/training2437/grete1/tera_day1/athenapk_tutorial/kh_2d_amr/kh.in[m
[1mindex bb2ed27..47450dc 100644[m
[1m--- a/p/project1/training2437/grete1/tera_day1/athenapk_tutorial/kh_2d_uniform/kh.in[m
[1m+++ b/p/project1/training2437/grete1/tera_day1/athenapk_tutorial/kh_2d_amr/kh.in[m
[36m@@ -17,17 +17,23 @@[m [mdrat = 2 # density contrast/ratio[m
amp = 0.01 # amplitude of initial vy perturbation[m
sigma = 0.2 # thickness of perturbed layer[m
[32m<refinement>[m
[32mtype = xyvelocity_gradient [m
[32mthreshold_xyvelocity_gradient = 0.01[m
<parthenon/mesh>[m
refinement = [31mnone[m[32madaptive[m
[32mnumlevel = 3 # total number of levels[m
nghost = 2[m
nx1 = [31m512[m[32m128[m
x1min = -0.5[m
x1max = 0.5[m
ix1_bc = periodic[m
ox1_bc = periodic[m
nx2 = [31m512[m[32m128[m
x2min = -0.5[m
x2max = 0.5[m
ix2_bc = periodic[m
[36m@@ -40,8 +46,8 @@[m [mix3_bc = periodic[m
ox3_bc = periodic[m
<parthenon/meshblock>[m
[31mnx1=128[m
[31mnx2=512[m[32mnx1=16[m
[32mnx2=16[m
nx3=1[m
<parthenon/time>[m
[36m@@ -49,6 +55,7 @@[m [mintegrator = vl2[m
cfl = 0.4[m
tlim = 1.2[m
nlim = -1[m
[32mncycle_out_mesh = -1000 # write out mesh structure every 1000 cycles (and on changes)[m
<hydro>[m
eos = adiabatic[m
In other words:
Click
Adaptive mesh refinement was enabled limiting the hierarchy to three levels total.Mesh refinement in Parthenon based codes always happens with refinment factors of two between the levels.
Moreover, the simulation was run at the same effective resolution (the root grid has only 1/4 of the number of cells (512->128) but two additional levels of refinement are allowed.
Adapative mesh refinement is triggered given a threshold in the velocity gradient (this choice here is a little arbitrary and just set to match the setup in the Athena++ code paper).
In addition, the meshblock size was adjusted (reduced) to allow for localized refinement -- otherwise, the entire domain would just refine).
The output differs slightly:
cycle=0 time=0.0000000000000000e+00 dt=3.2952734694010211e-04 zone-cycles/wsec_step=0.00e+00 wsec_total=8.76e-02 wsec_step=2.98e+00 zone-cycles/wsec=0.00e+00 wsec_AMR=0.00e+00
---------------------- Current Mesh structure ----------------------
Number of Trees = 64
Total number of MeshBlocks = 352
Number of physical refinement levels = 2
Number of logical refinement levels = 2
Physical level = 0 (logical level = 0): 32 MeshBlocks, cost = 32
Physical level = 1 (logical level = 1): 64 MeshBlocks, cost = 64
Physical level = 2 (logical level = 2): 256 MeshBlocks, cost = 256
--------------------------------------------------------------------
cycle=1 time=3.2952734694010211e-04 dt=3.2952683932585395e-04 zone-cycles/wsec_step=3.27e+05 wsec_total=3.63e-01 wsec_step=2.75e-01 zone-cycles/wsec=3.27e+05 wsec_AMR=5.67e-06
First, the mesh structure is being shown (also every time it changes) and additional performance counters are given:
-
zone-cycles/wsec
differs fromzone-cycles/wsec_step
by including time for the AMR machinery (like checking for refinement) -
wsec_AMR
for the wallclock time spent in remeshing
# adapted from https://yt-project.org/doc/cookbook/complex_plots.html#time-series-movie
ts = yt.load(f"/p/project1/training2437/{user}/tera_day1/athenapk_tutorial/kh_2d_amr/parthenon.out0.*.rhdf")
plot = yt.SlicePlot(ts[0], "z", ("gas", "density"))
plot.set_log(("gas", "density"), False)
plot.set_zlim(("gas", "density"), 0.8, 2.1)
plot.annotate_grids()
fig = plot.plots[("gas", "density")].figure
# animate must accept an integer frame number. We use the frame number
# to identify which dataset in the time series we want to load
def animate(i):
ds = ts[i]
plot._switch_ds(ds)
animation = FuncAnimation(fig, animate, frames=len(ts))
animation.save("kh_2d_amr.mp4")
Video("kh_2d_amr.mp4", embed=True)
Bonus: Is there anything you noted about this simulation?
Click
Looks practically identical to the uniform grid one.But performance was not worse than unigrid (we'll cover this later).
On the up side: the snapshots/outputs got smaller (for effectively the same result).
The final snapshot only shows two levels of refinement, i.e., the very large blocks are gone meaning that the entire root grid is refinement (and thus the refinement strategy chosen/block size is not ideal).