
initialize the distributions on the GPU #498

Merged: 10 commits into ECP-WarpX:development on Jan 5, 2024

Conversation

@atmyers (Member) commented Jan 4, 2024

Parallelize particle initialization using ParallelForRNG for all backends.

Follow-up to #73
Close #495

  • ensure GPU results are correct
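
For reference, this is the kind of loop the PR moves to (a minimal sketch, not the PR's actual code; the sample_positions helper and the raw device pointer for one position component are assumptions). amrex::ParallelForRNG hands every iteration its own amrex::RandomEngine, so the same lambda runs on the CUDA, HIP, SYCL, and OpenMP backends without shared RNG state:

    #include <AMReX_Gpu.H>
    #include <AMReX_Random.H>

    // Sketch only: fill np particle positions with Gaussian draws.
    // ParallelForRNG passes a per-thread RandomEngine into the device lambda,
    // so no global RNG state is shared between threads.
    void sample_positions (int np, amrex::Real sigma_x, amrex::Real* x)
    {
        amrex::ParallelForRNG(np,
            [=] AMREX_GPU_DEVICE (int i, amrex::RandomEngine const& engine) noexcept
            {
                x[i] = amrex::RandomNormal(0.0, sigma_x, engine);
            });
    }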

@ax3l added the labels backend: cuda (CUDA execution, GPUs), backend: dpc++ (DPC++/SYCL execution, CPUs/GPUs), backend: hip (ROCm execution, GPUs), backend: openmp (OpenMP execution, CPUs), and Performance (optimization) on Jan 4, 2024
@ax3l self-requested a review on January 4, 2024 19:19
@ax3l self-assigned this on Jan 4, 2024
Review threads on src/initialization/InitDistribution.H (two, outdated, resolved) and src/initialization/InitDistribution.cpp (outdated, resolved)
@ax3l force-pushed the gpu_init_distribution branch 3 times, most recently from 5111901 to d1967fe, on January 4, 2024 19:31
    {
        BL_PROFILE("ImpactX::AddNParticles");

        AMREX_ALWAYS_ASSERT_WITH_MESSAGE(lev == 0, "AddNParticles: only lev=0 is supported yet.");
Review comment (Member) on the hunk above:

ImpactX supports MR (mesh refinement) now. I wonder whether lev != 0 is generally needed, or whether we want to remove the parameter altogether.

I assume that Redistribute automatically moves the particles into the corresponding level at the end of AddNParticles?

Fix the missing resize before redistribute (see the pattern sketch below).
Remove the old, unused, serial initialization logic.
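
A pattern sketch of what the resize fix implies for AddNParticles (illustrative only: MyParticleContainer stands in for the real container type, and the device fill step is elided):

    #include <AMReX_Particles.H>

    // Sketch: grow the particle tile first (the resize that was missing),
    // fill the new slots on the device, then redistribute.
    void add_n_particles (MyParticleContainer& pc, int np_new)
    {
        // lev = 0, grid = 0, tile = 0 -- matching the lev == 0 assertion above
        auto& ptile = pc.DefineAndReturnParticleTile(0, 0, 0);
        auto const old_np = ptile.numParticles();
        ptile.resize(old_np + np_new);  // must happen before filling the new slots

        // ... fill entries [old_np, old_np + np_new) on the device,
        //     e.g. with amrex::ParallelForRNG as sketched earlier ...

        pc.Redistribute();  // move particles to the grids (and levels) that own them
    }

Redistribute() is also what answers the level question above: after the call, each particle lives on the grid (and, with mesh refinement, the level) that contains it.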
Review thread on src/initialization/InitDistribution.H (dismissed)
@ax3l (Member) commented Jan 4, 2024

Ok, this passed the thermal tests on GPU on Perlmutter 🎉

Remove stack memory usage and use GPU memory for pre-computed CDF.
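
In outline, that change looks something like this (a sketch under assumed names such as sample_from_cdf, not the PR's diff): the tabulated, normalized CDF lives in an amrex::Gpu::DeviceVector rather than a per-thread stack array, and each particle inverts it with a binary search:

    #include <AMReX_Gpu.H>
    #include <AMReX_Random.H>

    // Sketch only: inverse-transform sampling from a device-resident CDF table.
    void sample_from_cdf (amrex::Gpu::DeviceVector<amrex::Real> const& d_cdf,
                          int np, amrex::Real* out)
    {
        amrex::Real const* cdf = d_cdf.data();
        int const n = static_cast<int>(d_cdf.size());

        amrex::ParallelForRNG(np,
            [=] AMREX_GPU_DEVICE (int i, amrex::RandomEngine const& engine) noexcept
            {
                amrex::Real const r = amrex::Random(engine);  // uniform draw
                // binary search for the first index k with cdf[k] >= r
                int lo = 0;
                int hi = n - 1;
                while (lo < hi) {
                    int const mid = (lo + hi) / 2;
                    if (cdf[mid] < r) { lo = mid + 1; } else { hi = mid; }
                }
                out[i] = static_cast<amrex::Real>(lo) / (n - 1);  // abscissa in [0, 1]
            });
    }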
@ax3l (Member) commented Jan 4, 2024

@cemitch99, a couple of tests fail with this PR on GPU due to tolerances in the analysis scripts. The main reason is that we changed the way we draw random numbers and now have different seeds in parallel execution.

Unless there is a GPU bug, we might need to relax the tolerances a bit, I think.

@ax3l (Member) commented Jan 5, 2024

The following tests FAILED:
	  2 - FODO.analysis (Failed)
	  5 - FODO.MPI.analysis (Failed)
	  8 - FODO.py.analysis (Failed)
	 11 - FODO.MADX.py.analysis (Failed)
	 14 - FODO.py.MPI.analysis (Failed)
	 17 - chicane.analysis (Failed)
	 20 - chicane.py.analysis (Failed)
	 23 - chicane.MADX.py.analysis (Failed)
	 34 - gaussian.analysis (Failed)
	 38 - FODO_RF.analysis (Failed)
	 40 - FODO_RF.py.analysis (Failed)
	 42 - kurth4d.analysis (Failed)
	 44 - semigaussian.analysis (Failed)
	 46 - multipole.analysis (Failed)
	 48 - expanding_beam.analysis (Failed)
	 50 - expanding_beam.py.analysis (Failed)
	 52 - multipole.py.analysis (Failed)
	 58 - iotalattice.MPI.analysis (Failed)
	 60 - iotalattice.py.MPI.analysis (Failed)
	 62 - kurth_periodic.analysis (Failed)
	 64 - kurth_periodic.py.analysis (Failed)
	 74 - solenoid.analysis (Failed)
	 76 - solenoid.py.analysis (Failed)
	 78 - solenoid.MADX.py.analysis (Failed)
	 88 - quadrupole_softedge.analysis (Failed)
	 90 - quadrupole_softedge.py.analysis (Failed)
	 92 - fodo_chromatic.analysis (Failed)
	 94 - fodo_chromatic.py.analysis (Failed)
	102 - cyclotron.py.analysis (Failed)
	110 - compression.py.analysis (Failed)
	136 - IOTA_lattice.analysis (Failed)
	138 - IOTA_lattice.py.analysis (Failed)
	143 - aperture.run (Failed)
	144 - aperture.analysis (Failed)
	145 - aperture.py.run (Failed)
	146 - aperture.py.analysis (Failed)
	148 - apochromat.analysis (Failed)
	152 - alignment.analysis (Failed)

@ax3l (Member) commented Jan 5, 2024

For the FODO cell example, the initial beam sigma_x,y and emittance_x,y are two orders of magnitude smaller than the reference values.
That makes no real sense to me, since we set finite numbers. Could be a GPU bug.

Update: I misread; this is indeed just sampling noise. Will slightly loosen the tolerances.
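
As a rough yardstick (a standard result for approximately Gaussian samples, not a number from the PR): for N independent draws, the relative statistical error of a sample standard deviation such as sigma_x is about

    Δσ/σ ≈ 1 / sqrt(2(N − 1))

so a bunch of 10,000 particles already fluctuates at the ~0.7% level, and moments measured from the sampled beam can only match the analytic reference up to that scatter. Changing the RNG stream moves results within this band, which is why the analysis tolerances need a little headroom.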

Avoid promoting to double in C functions on older NVCC.
Run on local GPU (RTX A2000).
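
For illustration (a minimal example of the promotion issue, not the PR's diff): float math silently promotes to double when it touches C library functions or double literals, which is costly on GPUs with weak FP64 throughput such as the RTX A2000 mentioned above:

    #include <cmath>    // std::sqrt overload set
    #include <math.h>   // C function ::sqrt(double)

    float norm_promoted (float x, float y)
    {
        // ::sqrt takes double, and the literal 2.0 is double: the whole
        // expression is evaluated in FP64 and only rounded back at the end.
        return ::sqrt(2.0 * x + y);
    }

    float norm_single (float x, float y)
    {
        // float literal plus the std::sqrt(float) overload keep it in FP32.
        return std::sqrt(2.0f * x + y);
    }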
@ax3l enabled auto-merge (squash) on January 5, 2024 04:34
@ax3l mentioned this pull request on Jan 5, 2024
@ax3l merged commit 71df3bd into ECP-WarpX:development on Jan 5, 2024
15 checks passed