
initialize the distributions on the GPU #498

Merged: 10 commits into ECP-WarpX:development on Jan 5, 2024

Conversation

@atmyers (Member) commented Jan 4, 2024

Parallelize particle initialization using ParallelForRNG for all backends.

Follow-up to #73
Close #495

  • ensure GPU results are correct
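
For reference, this is the kind of loop the PR moves to (a minimal sketch, not the PR's actual code; the sample_positions helper and the raw device pointer for one position component are assumptions). amrex::ParallelForRNG hands every iteration its own amrex::RandomEngine, so the same lambda runs on the CUDA, HIP, SYCL, and OpenMP backends without shared RNG state:

    #include <AMReX_Gpu.H>
    #include <AMReX_Random.H>

    // Sketch only: fill np particle positions with Gaussian draws.
    // ParallelForRNG passes a per-thread RandomEngine into the device lambda,
    // so no global RNG state is shared between threads.
    void sample_positions (int np, amrex::Real sigma_x, amrex::Real* x)
    {
        amrex::ParallelForRNG(np,
            [=] AMREX_GPU_DEVICE (int i, amrex::RandomEngine const& engine) noexcept
            {
                x[i] = amrex::RandomNormal(0.0, sigma_x, engine);
            });
    }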

@ax3l added the labels backend: cuda (CUDA execution, GPUs), backend: dpc++ (DPC++/SYCL execution, CPUs/GPUs), backend: hip (ROCm execution, GPUs), backend: openmp (OpenMP execution, CPUs), and Performance (optimization) on Jan 4, 2024
@ax3l self-requested a review on January 4, 2024 19:19
@ax3l self-assigned this on Jan 4, 2024
Review threads on src/initialization/InitDistribution.H (two, outdated, resolved) and src/initialization/InitDistribution.cpp (outdated, resolved)
@ax3l force-pushed the gpu_init_distribution branch 3 times, most recently from 5111901 to d1967fe, on January 4, 2024 19:31
    {
        BL_PROFILE("ImpactX::AddNParticles");

        AMREX_ALWAYS_ASSERT_WITH_MESSAGE(lev == 0, "AddNParticles: only lev=0 is supported yet.");
Review comment (Member) on the hunk above:

ImpactX supports MR (mesh refinement) now. I wonder whether lev != 0 is generally needed, or whether we want to remove the parameter altogether.

I assume that Redistribute automatically moves the particles into the corresponding level at the end of AddNParticles?

Fix the missing resize before redistribute (see the pattern sketch below).
Remove the old, unused, serial initialization logic.
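
A pattern sketch of what the resize fix implies for AddNParticles (illustrative only: MyParticleContainer stands in for the real container type, and the device fill step is elided):

    #include <AMReX_Particles.H>

    // Sketch: grow the particle tile first (the resize that was missing),
    // fill the new slots on the device, then redistribute.
    void add_n_particles (MyParticleContainer& pc, int np_new)
    {
        // lev = 0, grid = 0, tile = 0 -- matching the lev == 0 assertion above
        auto& ptile = pc.DefineAndReturnParticleTile(0, 0, 0);
        auto const old_np = ptile.numParticles();
        ptile.resize(old_np + np_new);  // must happen before filling the new slots

        // ... fill entries [old_np, old_np + np_new) on the device,
        //     e.g. with amrex::ParallelForRNG as sketched earlier ...

        pc.Redistribute();  // move particles to the grids (and levels) that own them
    }

Redistribute() is also what answers the level question above: after the call, each particle lives on the grid (and, with mesh refinement, the level) that contains it.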
Review thread on src/initialization/InitDistribution.H (dismissed)
@ax3l (Member) commented Jan 4, 2024

Ok, this passed the thermal tests on GPU on Perlmutter 🎉

Remove stack memory usage and use GPU memory for pre-computed CDF.
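
In outline, that change looks something like this (a sketch under assumed names such as sample_from_cdf, not the PR's diff): the tabulated, normalized CDF lives in an amrex::Gpu::DeviceVector rather than a per-thread stack array, and each particle inverts it with a binary search:

    #include <AMReX_Gpu.H>
    #include <AMReX_Random.H>

    // Sketch only: inverse-transform sampling from a device-resident CDF table.
    void sample_from_cdf (amrex::Gpu::DeviceVector<amrex::Real> const& d_cdf,
                          int np, amrex::Real* out)
    {
        amrex::Real const* cdf = d_cdf.data();
        int const n = static_cast<int>(d_cdf.size());

        amrex::ParallelForRNG(np,
            [=] AMREX_GPU_DEVICE (int i, amrex::RandomEngine const& engine) noexcept
            {
                amrex::Real const r = amrex::Random(engine);  // uniform draw
                // binary search for the first index k with cdf[k] >= r
                int lo = 0;
                int hi = n - 1;
                while (lo < hi) {
                    int const mid = (lo + hi) / 2;
                    if (cdf[mid] < r) { lo = mid + 1; } else { hi = mid; }
                }
                out[i] = static_cast<amrex::Real>(lo) / (n - 1);  // abscissa in [0, 1]
            });
    }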
@ax3l (Member) commented Jan 4, 2024

@cemitch99, a couple of tests fail with this PR on GPU due to tolerances in the analysis scripts. The main reason is that we changed the way we draw random numbers and now have different seeds in parallel execution.

Unless there is a GPU bug, we might need to relax the tolerances a bit, I think.

@ax3l (Member) commented Jan 5, 2024

The following tests FAILED:
	  2 - FODO.analysis (Failed)
	  5 - FODO.MPI.analysis (Failed)
	  8 - FODO.py.analysis (Failed)
	 11 - FODO.MADX.py.analysis (Failed)
	 14 - FODO.py.MPI.analysis (Failed)
	 17 - chicane.analysis (Failed)
	 20 - chicane.py.analysis (Failed)
	 23 - chicane.MADX.py.analysis (Failed)
	 34 - gaussian.analysis (Failed)
	 38 - FODO_RF.analysis (Failed)
	 40 - FODO_RF.py.analysis (Failed)
	 42 - kurth4d.analysis (Failed)
	 44 - semigaussian.analysis (Failed)
	 46 - multipole.analysis (Failed)
	 48 - expanding_beam.analysis (Failed)
	 50 - expanding_beam.py.analysis (Failed)
	 52 - multipole.py.analysis (Failed)
	 58 - iotalattice.MPI.analysis (Failed)
	 60 - iotalattice.py.MPI.analysis (Failed)
	 62 - kurth_periodic.analysis (Failed)
	 64 - kurth_periodic.py.analysis (Failed)
	 74 - solenoid.analysis (Failed)
	 76 - solenoid.py.analysis (Failed)
	 78 - solenoid.MADX.py.analysis (Failed)
	 88 - quadrupole_softedge.analysis (Failed)
	 90 - quadrupole_softedge.py.analysis (Failed)
	 92 - fodo_chromatic.analysis (Failed)
	 94 - fodo_chromatic.py.analysis (Failed)
	102 - cyclotron.py.analysis (Failed)
	110 - compression.py.analysis (Failed)
	136 - IOTA_lattice.analysis (Failed)
	138 - IOTA_lattice.py.analysis (Failed)
	143 - aperture.run (Failed)
	144 - aperture.analysis (Failed)
	145 - aperture.py.run (Failed)
	146 - aperture.py.analysis (Failed)
	148 - apochromat.analysis (Failed)
	152 - alignment.analysis (Failed)

@ax3l (Member) commented Jan 5, 2024

For the FODO cell example, the initial beam sigma_x,y and emittance_x,y are two orders of magnitude smaller than the reference values.
That makes no real sense to me, since we set finite numbers. Could be a GPU bug.

Update: I misread; this is indeed just sampling noise. Will slightly loosen the tolerances.
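
As a rough yardstick (a standard result for approximately Gaussian samples, not a number from the PR): for N independent draws, the relative statistical error of a sample standard deviation such as sigma_x is about

    Δσ/σ ≈ 1 / sqrt(2(N − 1))

so a bunch of 10,000 particles already fluctuates at the ~0.7% level, and moments measured from the sampled beam can only match the analytic reference up to that scatter. Changing the RNG stream moves results within this band, which is why the analysis tolerances need a little headroom.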

Avoid promoting to double in C functions on older NVCC.
Run on local GPU (RTX A2000).
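
For illustration (a minimal example of the promotion issue, not the PR's diff): float math silently promotes to double when it touches C library functions or double literals, which is costly on GPUs with weak FP64 throughput such as the RTX A2000 mentioned above:

    #include <cmath>    // std::sqrt overload set
    #include <math.h>   // C function ::sqrt(double)

    float norm_promoted (float x, float y)
    {
        // ::sqrt takes double, and the literal 2.0 is double: the whole
        // expression is evaluated in FP64 and only rounded back at the end.
        return ::sqrt(2.0 * x + y);
    }

    float norm_single (float x, float y)
    {
        // float literal plus the std::sqrt(float) overload keep it in FP32.
        return std::sqrt(2.0f * x + y);
    }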
@ax3l enabled auto-merge (squash) on January 5, 2024 04:34
@ax3l mentioned this pull request on Jan 5, 2024
@ax3l merged commit 71df3bd into ECP-WarpX:development on Jan 5, 2024
15 checks passed