Chroma ECP
Balint Joo edited this page Oct 24, 2022 · 2 revisions
The Chroma ECP HMC benchmark concerns running a 2+1 flavor Stout-improved clover fermion simulation. All solves are offloaded from Chroma and run in QUDA, with everything else run on GPUs using qdpjit.
- The two-flavor determinant contribution is preconditioned using three levels of Hasenbusch mass preconditioning
- The one-flavor determinant contribution is evaluated using RHMC
- A two-level time integration is used, with a fourth-order force-gradient integrator deployed\*. The pure-gauge contribution and heaviest two-flavor fermionic contributions are on the fine timescale, with all other fermionic contributions on the coarse timescale.
  \* The original Titan-baseline and Summit-baseline results used a minimum-norm Omelyan second-order integrator.
- The two-flavor solves all utilize QUDA's adaptive multigrid algorithm\*\*, where the null-space is computed using the light mass and applied to all heavier solves. The outer solver is single-precision GCR, with double-precision defect correction employed. The multigrid preconditioner is mostly run in half precision, with strategic use of fixed-point int32 precision to ensure determinism. On architectures that support it, tensor-core acceleration is applied in the multigrid setup phase.
  \*\* The original Titan-baseline and Summit-baseline results used an additive Schwarz preconditioner instead of adaptive multigrid.
- The one-flavor solve utilizes a mixed-precision multi-shift CG algorithm, where the multi-shift solver is run in double-single precision, with per-shift refinement applied in double-half precision.
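
To illustrate the difference between the two integrators named above, here is a minimal sketch on a toy one-dimensional harmonic oscillator with action S(q) = q^2/2 and unit mass. It is not Chroma's implementation and involves no gauge fields. All function names are hypothetical, the Omelyan coefficient is the commonly quoted second-order minimum-norm value, and the force-gradient correction is written in the form it takes for this toy problem only.

```python
import math

# Commonly quoted coefficient of the second-order minimum-norm (Omelyan) scheme.
OMELYAN_LAMBDA = 0.1931833275037836


def force(q):
    """Force -dS/dq for the toy action S(q) = q^2 / 2."""
    return -q


def force_gradient_term(q):
    """F(q) * dF/dq for the toy action: (-q) * (-1) = q."""
    return q


def omelyan_step(q, p, h):
    """One second-order minimum-norm step (kick, drift, kick, drift, kick)."""
    lam = OMELYAN_LAMBDA
    p += lam * h * force(q)
    q += 0.5 * h * p
    p += (1.0 - 2.0 * lam) * h * force(q)
    q += 0.5 * h * p
    p += lam * h * force(q)
    return q, p


def force_gradient_step(q, p, h):
    """One fourth-order force-gradient step.

    The middle kick uses the force plus an h^2-suppressed
    force-gradient correction.
    """
    p += (h / 6.0) * force(q)
    q += 0.5 * h * p
    p += (2.0 * h / 3.0) * (force(q) + (h * h / 24.0) * force_gradient_term(q))
    q += 0.5 * h * p
    p += (h / 6.0) * force(q)
    return q, p


def global_error(step, n_steps, t_end=1.0):
    """Distance from the exact solution at t_end, starting from q=1, p=0."""
    h = t_end / n_steps
    q, p = 1.0, 0.0
    for _ in range(n_steps):
        q, p = step(q, p, h)
    # Exact trajectory: q(t) = cos(t), p(t) = -sin(t).
    return math.hypot(q - math.cos(t_end), p + math.sin(t_end))


def observed_order(step, n_steps=10):
    """Estimate the convergence order by halving the step size."""
    return math.log2(global_error(step, n_steps) / global_error(step, 2 * n_steps))


if __name__ == "__main__":
    print("Omelyan order        ~", observed_order(omelyan_step))
    print("Force-gradient order ~", observed_order(force_gradient_step))
```

On this toy problem the observed orders come out close to 2 and 4 respectively, which is why a force-gradient scheme can take larger steps for the same integration error.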
Machine | Algorithm | GPU | # GPUs | Time (s) |
---|---|---|---|---|
Titan | baseline | NVIDIA Tesla K20X | 1024 | 4006 |
Titan | MG + FG | NVIDIA Tesla K20X | 512 | 974 |
Summit | baseline | NVIDIA Tesla V100 | 128 | 1878 |
Summit | MG + FG | NVIDIA Tesla V100 | 128 | 329 |
Juelich booster | MG + FG | NVIDIA A100 SXM | 64 | 285 |
Juelich booster | MG + FG | NVIDIA A100 SXM | 128 | 166 |
Selene | MG + FG | NVIDIA A100 SXM | 64 | 241 |
Selene | MG + FG | NVIDIA A100 SXM | 128 | 150 |
Spock | MG + FG | AMD MI100 | 64 | 973 |
Spock | MG + FG | AMD MI100 | 128 | 640 |
Borg | MG + FG | AMD MI250 | 64 (128x GCD) | 386 |
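
As a rough reading aid for the table, the sketch below computes strong-scaling efficiency for the three machines that have MG + FG timings at both 64 and 128 GPUs. The timings are copied from the table. The dictionary layout and the helper name `strong_scaling_efficiency` are illustrative assumptions, not part of the benchmark.

```python
# MG + FG wall-clock times in seconds, keyed by GPU count, taken from the table.
timings = {
    "Juelich booster": {64: 285, 128: 166},
    "Selene": {64: 241, 128: 150},
    "Spock": {64: 973, 128: 640},
}


def strong_scaling_efficiency(t_small, n_small, t_large, n_large):
    """Ratio of GPU-seconds at the small and large GPU counts.

    A value of 1.0 means perfect scaling: doubling GPUs halves the time.
    """
    return (t_small * n_small) / (t_large * n_large)


efficiency = {
    machine: strong_scaling_efficiency(t[64], 64, t[128], 128)
    for machine, t in timings.items()
}

if __name__ == "__main__":
    for machine, eff in efficiency.items():
        print(f"{machine}: {eff:.3f}")
```

With these numbers the efficiencies going from 64 to 128 GPUs are roughly 0.86 (Juelich booster), 0.80 (Selene) and 0.76 (Spock).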
Credits
- Chroma-QUDA multigrid HMC developed jointly by Kate Clark (NVIDIA) and Bálint Joó (ORNL)
- Titan, Summit, Spock and Borg results: Bálint Joó (ORNL)
- Juelich Booster and Selene results: Mathias Wagner (NVIDIA)
- qdpjit: Frank Winter (JLab)
- Chroma: Robert Edwards (JLab) and Bálint Joó (ORNL)
- Chroma's force-gradient integrator implemented by Boram Yoon (Los Alamos)
Spock and Borg results are computed from speedup numbers here relative to the Titan baseline, accounting for the reduction in the number of GPUs. For example, the Borg number is computed as (4006/166)*(1024/64) = 386 seconds.
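
The conversion in the example can be sketched as a small helper. This assumes that the 166 in the formula is the reported speedup factor relative to the Titan baseline (4006 seconds on 1024 GPUs), which is how the expression reads. The function name and its parameters are hypothetical.

```python
def time_from_speedup(baseline_time_s, baseline_gpus, speedup, gpus):
    """Convert a GPU-count-normalised speedup into a wall-clock time.

    Assumes speedup = (baseline_time * baseline_gpus) / (time * gpus),
    so time = (baseline_time / speedup) * (baseline_gpus / gpus).
    """
    return (baseline_time_s / speedup) * (baseline_gpus / gpus)


# Titan baseline: 4006 s on 1024 GPUs; Borg: assumed speedup 166 on 64 GPUs.
borg_time_s = time_from_speedup(4006, 1024, 166, 64)

if __name__ == "__main__":
    print(f"Borg: {borg_time_s:.1f} s")
```

Rounded to the nearest second this reproduces the 386 seconds listed for Borg in the table.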