Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evp kernelv2 #278

Merged
merged 20 commits into from
Feb 2, 2019
Merged

Evp kernelv2 #278

merged 20 commits into from
Feb 2, 2019

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented Feb 1, 2019

This is a clean branch associated with #252 from @mhrib. The code is identical to #252. #252 PR is copied here:

#205 updated to present master.

  • Added "evp_1d"+"evp_2d" timer
  • Removed "strocnx,strocny" from stepu subroutine in dyn_evp_shared

New vectorized EVP kernel:

  • vectorized - contiguous vectors.
  • Running OpenMP on the GLOBAL net on one single node ie. reduce communication and avoiding HALO updates to every single block every iteration.
  • Able to run CICE mixed MPI+OpenMP on several nodes. Use Node "my_task==master_task" for EVP iterations:
  • Works for "regular" circular grids - eg. GX3. Ie. do a "circular" halo update.
  • Not tested for tri-pole grids. Doubt that it works. A stop is introduced in the code.
  • Namelist: Shift between "CICE standard" EVP and this EVP via. namelist. Default is evp_kernel_ver=0 , which is "CICE standard" version
  • Use environment "OMP_NUM_THREADS" to select number of threads used by the evp_kernel.
  • There is a natural "overhead" associated with gather/scatter and defining vectors. Therefore it becomes more efficient for "large" setups.

Possible code correction/reduction??: affects "stepu" routines in: evp_kernel1d.F90 AND default code ice_dyn_shared.F90:

  • Is "strocnx,strocny" used at all from subroutine "stepu"? Is it needed here? It is also calculated in subroutine "dyn_finish" short after the EVP-loop. Therefore I have removed the calculation in this EVP_kernel, but it can easily be re-introduced (using slightly more memory) by searching for: "!strocn"

--o--

Namelist:
&dynamics_nml
evp_kernel_ver = 0 ! 0: CICE (default) , 2: kernel_v2

Environment:
OMP_NUM_THREADS

Option: REAL4 internally instead of REAL8:

  • Not implemented as a namelist switch, but can be tried out simply by this "poor-man-option":
    mv evp_kernel1d.F90 evp_kernel1d_r8.F90
    cat evp_kernel1d_r8.F90 | sed s/DBL_KIND/REAL_KIND/g > evp_kernel1d.F90

--o--

  • Developer(s):
    Mads Hvid Ribergaard and Jacob Weismann Poulsen, DMI

  • Please suggest code Pull Request reviewers in the column at right.
    @eclare108213

  • Are the code changes bit for bit, different at roundoff level, or more substantial?
    In most cases it should be "bit-to-bit". Depending on how the code is translated during compilation and which math libs are used.

  • Does this PR create or have dependencies on Icepack or any other models?
    Nothing other than CICE normally have

  • Is the documentation being updated with this PR?
    A few lines added in developers guide: doc/source/developer_guide/dg_dynamics.rst

  • If not, does the documentation need to be updated separately at a later time?

  • Other Relevant Details:

Basic structure
evp_copyin() : gather
evp_kernel() : loop stress/stepu/halo_update
evp_copyout() : scatter

There is a natural "overhead" associated with gather/scatter and defining vectors. Therefore it becomes more efficient for "large" setups.

  • Affected files:
    ./cicecore/cicedynB/dynamics/evp_kernel1d.F90 (The core, new file)
    ./cicecore/cicedynB/dynamics/ice_dyn_evp.F90 (switch between CICE/new kernels)
    ./cicecore/cicedynB/dynamics/ice_dyn_shared.F90 (namelist)
    ./cicecore/cicedynB/general/ice_init.F90 (namelist)
    ./cicecore/cicedynB/infrastructure/comm/mpi/ice_gather_scatter.F90 ()
    ./cicecore/cicedynB/infrastructure/comm/serial/ice_gather_scatter.F90 ()
    (*) Updated gather_scatter_ext to take care of integers + logicals (icetmask,iceumask)

  • Possible OMP issues in these files (see more below):
    ./cicecore/cicedynB/analysis/ice_diagnostics.F90 (2x)
    ./cicecore/cicedynB/general/ice_init.F90
    ./cicecore/drivers/cice/CICE_RunMod.F90
    Update since Fast vectorized EVP kernel #205 and maybe solved (to be tested):
    ./cicecore/cicedynB/analysis/ice_history.F90
    ./cicecore/cicedynB/dynamics/ice_transport_driver.F90 (2x)
    ./cicecore/cicedynB/dynamics/ice_transport_remap.F90
    Update since Fast vectorized EVP kernel #205: OMP -> TCXOMP
    ./cicecore/cicedynB/dynamics/ice_dyn_eap.F90 - TCXOMP
    ./cicecore/cicedynB/dynamics/ice_dyn_evp.F90 - TCXOMP

  • OpenMP Issues ??:
    Maybe OpenMP raises?. I have not tested it carefully, but a previous CICE version shows some raises. I have added a comment line just before the OMPs, that I did comment out last time. But all OMP's stays un-commented in this PR.
    Search for "!MHRI: CHECK THIS OMP"

NOTE: I did only un-comment the OMPs to check its runs smoothly. Ie. this only means, that there is possible a thread-issue in one of the files - not necessary all of them.

@apcraig
Copy link
Contributor Author

apcraig commented Feb 1, 2019

Test results associated with #252 can be found here,
https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks
#aa6de33 is the baseline results.
#aa6de33...+evpk2 are results with evp_kernel_ver=2

Copy link
Contributor

@eclare108213 eclare108213 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @apcraig's tests and description of where this is, I'm willing to approve this for merging. Please create issues to remind us that there is still debugging to do and that the 1D approach needs to be documented. Minor comments:

Are we expecting to implement evp_kernel_ver=1? Is it needed? (This has no bearing on the current PR, I'm just curious.)

I guess I'd prefer the shorter 'evp_kernel' instead of evp_kernel_ver (I don't think 'ver' adds anything to understanding what the flag is about), but I won't insist on changing it. In general, a 'k' at the beginning of a flag means "what kind of", e.g. kdyn is the kind of dynamics, so in this case I might have used kevp_kernel.

@apcraig apcraig merged commit d2cfd6d into CICE-Consortium:master Feb 2, 2019
@apcraig apcraig mentioned this pull request Feb 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants