As an apples-to-apples comparison point for testing how this affects performance, build `Exec/science/wdmerger` with `make USE_CUDA=TRUE USE_MPI=TRUE DIM=3 TINY_PROFILE=TRUE`. This is the test case I ran on Perlmutter across 4 GPUs:
(The interactive job was allocated with `salloc -N 1 --gpus-per-task=1 --gpu-bind=map_gpu:0,1,2,3 --tasks-per-node=4 -t 120 --qos=interactive -A m3018_g -C gpu`.)
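For convenience, the build and run steps above can be collected into one script. The executable and inputs-file names below are illustrative, not verbatim from the run:

```shell
# Build the comparison case (from the Castro source tree):
cd Exec/science/wdmerger
make USE_CUDA=TRUE USE_MPI=TRUE DIM=3 TINY_PROFILE=TRUE -j 8

# Allocate 4 GPUs on a Perlmutter node and run:
salloc -N 1 --gpus-per-task=1 --gpu-bind=map_gpu:0,1,2,3 \
       --tasks-per-node=4 -t 120 --qos=interactive -A m3018_g -C gpu

# Executable name and inputs file are placeholders for this problem setup.
srun -n 4 ./Castro3d.gnu.MPI.CUDA.ex inputs
```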
This is what the profile looks like on the last timestep.
The gravity solve takes 79 ms, of which 24 ms is spent in the multipole BC fill and 55 ms in the Poisson solve.
This is not the only relevant configuration to consider: at larger scale, gravity tends to dominate over hydro and the profile looks somewhat different, so the BC cost matters relatively less. But it's a useful starting point for analysis.
AMReX now has a James BC solver:
AMReX-Codes/amrex#2912
We should add the ability to use this instead of the multipole solver for the BCs.