-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
seabed stress - remove if statements #673
Conversation
…r taubx and tauby is updated on all iterations instead of just the last one. Not used within iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course this is faster! Thanks for refactoring it.
I think it would be good if we could get a sense of which tests are faster, and which are slower, and by what kind of amount. I'm guessing OpenMP cases will be faster. |
Yes it is faster on some of the openmpi test. This is primarily the change in stepu as this has been run as a "standard test". The result is that the change in the seabed stress is only called in 2 test with intel and 2 test with gnu. I have uploaded the results.log |
It looks like cases with alt01, alt03, gx1prod, and seabed all have seabed_stress on. Looking at @TillRasmussen's results, I'm not sure anything conclusive can be said about performance changes. If anything, this change looks a little slower in general. Are these change being done to improve performance? If so, I think we need more evidence that it's useful. If these changes are to improve code readability, flexibility, or some other reason, then that's fine too. Lets make sure we know why we're making the change and agree it makes sense. |
It's hard to tell -- could be a toss-up. A lot of the differences, both + and -, are just in the noise of machine load from one run to the next. Running the entire suite again (including the baseline) could help narrow that uncertainty down. There are a few runs, though, that are more than 20% different and seem to me to be mostly indicating that the new version is faster, but maybe not. These would show up in @apcraig 's green/yellow/red timing colors on the testing page. @TillRasmussen can you upload the test results? A notable exception - what's going on here?
Let's make sure that I'm interpreting the output correctly... the top line here shows the new version timings, and the last line shows the baseline timings, for the timeloop, dynamics and column timers? |
Performance was the motivation. I think that theoretically it should be faster as If statements within do loops should be avoided, especially when they do not depend on i,j etc. There are still if statements within the seabed_stress functions. The variability can be the DMI HPC system, we see op to 50 % differences in the run time in the operational setup primarily due to high load on the hard disk (90%+) and wrong usage of a system like that. If you see the example that @eclare108213 then it is not the dynamics nor the thermodynamics that makes the difference |
For the reference I reran the example that Elizabeth extracted with the following result |
…nt. Moved call to deformation out of loop
A few questions remains: |
Note OpenMP is turned off in the evp subcycling |
…b in stepu, remove if from seabed_stress_LKD
Is this one ready to be merged? Still waiting for reviews from @JFLemieux73 @phil-blain ? |
I don't think it's ready for merging, I'd like to take a closer look and maybe @JFLemieux73 would too. It's not clear to me either if it's still bit for bit ? was the |
At the time of testing (December) it was bit for bit with the main trunk. @apcraig has created a pull request for OMP. Maybe this should be accepted then I can integrate these changes aftwerwards and resubmit the test. |
real (kind=dbl_kind), dimension (nx_block,ny_block), intent(in) :: & | ||
Cw ! ocean-ice neutral drag coefficient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this work @TillRasmussen, I think we are going in the right direction with these changes. I left a few comments.
Also, thanks for crafting separate commits that are easy to review separately, this is very much apprciated.
Commit 3/4 (changed capping from logical to numeric...
) could still be splitted even more I think, as it seems to do several distinct things:
- remove
ksub
fromstepu
's interface and adjust callers - remove
ksub
,tarear
,shear
,divu
,rdg_conv
andrdg_shear
fromstress
's interface and adjust callers, and moving the call todeformations
on frame up. - decrease the
do ksub
loop ending index by 1 inevp
and add additional separate calls tostress
,stepu
, etc after the loop - change
capping
to real inviscous_coeffs_and_rep_pressure
andcalc_zeta_dPr
! calculate seabed stress component for outputs | ||
! only needed on last iteration. | ||
taubx(i,j) = -uvel(i,j)*Cb | ||
tauby(i,j) = -vvel(i,j)*Cb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's faster to compute them each time than having an if
here, then this indeed makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If statements should in general be avoided and the extra calculations are limited.. Timings are not conclusive. I would like to close this issue as it is above. Next step would be to vectorize uvel_init,, vvel_init and Tbu. These are alle "local" to ice_dyn_shared and only used in stepu. With these changes I would like to see if it makes sense to convert cb to a vector and calculate tauby and taubx after the last call to stepu just as well as deformations. An addiotonal benefit is that this align 1d solver and the 2d solver.
! if (hwu < threshold_hw) then | ||
docalc_tbu = max(sign(c1,threshold_hw-hwu),c0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this would be easier to read and understand with the merge
intrinsic (Intel doc, GFortran doc), which is like a ternary operator:
docalc_tbu = merge(c1, c0, hwu < threshold_hw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Will be in a commit a bit later.
strintx , & ! divergence of internal ice stress, x (N/m^2) | ||
strinty , & ! divergence of internal ice stress, y (N/m^2) | ||
strairx , & ! stress on ice by air, x-direction | ||
strairy ! stress on ice by air, y-direction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about those deletions. The computations for Hibler-Bryan stresses are still commented in the subroutine, so if we ever add them back and test them, we would have to add these four arguments back in.
So either we also remove the commented code, or we keep as-is. I do not think that having extra arguments impact performance, as Fortran uses pass-by-reference for arrays, So it should have no impact to have extra arguments passed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that they should be removed or reimplemented, however I dont have an opininion on which is the right to do, however the Hibler-Bryan stresses has been commented out as long as I remember. I would remove the commented code. @eclare108213 any opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave them there, but commented out.
taubx(i,j) = -uvel(i,j)*Cb | ||
tauby(i,j) = -vvel(i,j)*Cb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I thought we had made this change already somewhere...
@@ -1401,42 +1399,33 @@ subroutine viscous_coeffs_and_rep_pressure (strength, tinyarea, & | |||
|
|||
! NOTE: for comp. efficiency 2 x zeta and 2 x eta are used in the code | |||
|
|||
! if (trim(yield_curve) == 'ellipse') then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we keep this commented if
in, @JFLemieux73 plans to eventually add more yield curves, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reinserted it
!----------------------------------------------------------------- | ||
! on last subcycle, save quantities for mechanical redistribution | ||
!----------------------------------------------------------------- | ||
call deformations (nx_block , ny_block , & | ||
icellt(iblk) , & | ||
indxti(:,iblk) , indxtj(:,iblk) , & | ||
uvel(:,:,iblk) , vvel(:,:,iblk) , & | ||
dxt(:,:,iblk) , dyt(:,:,iblk) , & | ||
cxp(:,:,iblk) , cyp(:,:,iblk) , & | ||
cxm(:,:,iblk) , cym(:,:,iblk) , & | ||
tarear(:,:,iblk) , & | ||
shear(:,:,iblk) , divu(:,:,iblk) , & | ||
rdg_conv(:,:,iblk) , rdg_shear(:,:,iblk) ) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's way cleaner to compute the deformations here, good idea. However, can we keep the &
and the commas aligned ?
if (ndte > 1) then | ||
do ksub = 1,ndte-1 ! subcycling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this change is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ndte=1 then some compilers will create warnings and maybe even stop due to a do loop that runs from 1 to 0. The same should be included in the 1d .Now it aborts if ndte < 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be true that the code will stop? Compiler warnings maybe, but setting loop values so a loop won't be executed is done all the time and should be allowed in general. I'm not opposed to the if check, it's not going to affect performance, but I think it's cleaner not being there. Where is the check for ndte<1. It's not in this part of the code, so the if check is needed even less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a strong motivator is to get rid of if checks in loops. I assume this is for performance. My question continues to be whether we have measured a performance benefit with these changes. If not, then I prefer that we not replicate the calls for the last subcycle step in order to add the deformations call outside an if statement. My suspicion is that the if statement in the subcycling loop is not going to affect performance. Getting rid of if statements inside an ij loop, ok maybe, but not at this high a level in the computations. In my opinion, it's easier to maintain if the last iteration is not separated/duplicated. Maybe there are other reasons for separating that last subcycle. But if not, then I still prefer we not do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in principle with getting rid of if checks in loops. They can make a big difference in performance, but it depends on the loop size and things like how long the pipeline is for alternative commands, which probably varies a good bit across hardware/compiler configurations. Maybe it's gotten better with newer generations. The other considerations here are not having duplicative code (which I also agree with) and making close-but-not-quite-duplicated code for various dynamics options as similar as possible. Since the dynamics is typically the most expensive part of the sea ice simulation, and centers seem to be moving toward higher numbers of subcycles for better convergence, then optimizing these loops seems wise. It's hard to judge, though, if the existing timing tests are inconclusive. Question: in the original code, the deformations routine is called at the end, rather than during the loop. Is the argument for moving it up into the loop (or interrupting the loop) that the computed deformations need to be consistent with the velocities just prior to the last iteration? Or is it not better that the deformations be computed from the final velocities? (Maybe I'm not understanding something here?)
Two more things:
|
I will check the error |
3b48d48
to
c833f90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets try to converge quickly whether additional changes are needed so we can get this merged. My main concern continues to be the splitting out of the last subcycle in the loop for "potential" performance reasons. There is little evidence this helps performance, I wouldn't expect it to help performance as the if is at the level of the subcycling (not deeper), and I think it makes the code more difficult to maintain.
if (ndte > 1) then | ||
do ksub = 1,ndte-1 ! subcycling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be true that the code will stop? Compiler warnings maybe, but setting loop values so a loop won't be executed is done all the time and should be allowed in general. I'm not opposed to the if check, it's not going to affect performance, but I think it's cleaner not being there. Where is the check for ndte<1. It's not in this part of the code, so the if check is needed even less.
if (ndte > 1) then | ||
do ksub = 1,ndte-1 ! subcycling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a strong motivator is to get rid of if checks in loops. I assume this is for performance. My question continues to be whether we have measured a performance benefit with these changes. If not, then I prefer that we not replicate the calls for the last subcycle step in order to add the deformations call outside an if statement. My suspicion is that the if statement in the subcycling loop is not going to affect performance. Getting rid of if statements inside an ij loop, ok maybe, but not at this high a level in the computations. In my opinion, it's easier to maintain if the last iteration is not separated/duplicated. Maybe there are other reasons for separating that last subcycle. But if not, then I still prefer we not do it.
The loop splitting can be avoided if the call to deformations is delayed to after the last stepu. |
For info
|
@@ -176,7 +176,7 @@ subroutine init_dyn (dt) | |||
if (trim(coriolis) == 'constant') then | |||
fcor_blk(i,j,iblk) = 1.46e-4_dbl_kind ! Hibler 1979, N. Hem; 1/s | |||
else if (trim(coriolis) == 'zero') then | |||
fcor_blk(i,j,iblk) = 0.0 | |||
fcor_blk(i,j,iblk) = c0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the change in this line the entire reason the code is not BFB for cor=0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that is why I put in the comment. I dont think that I reran a test with coriolis=0.0 after the last commit but just a warning if the standard test complains about bit for bit differences.
All other test have been bit for bit with main after Tony's OMP updates.
.
strintx , & ! divergence of internal ice stress, x (N/m^2) | ||
strinty , & ! divergence of internal ice stress, y (N/m^2) | ||
strairx , & ! stress on ice by air, x-direction | ||
strairy ! stress on ice by air, y-direction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave them there, but commented out.
if (ndte > 1) then | ||
do ksub = 1,ndte-1 ! subcycling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in principle with getting rid of if checks in loops. They can make a big difference in performance, but it depends on the loop size and things like how long the pipeline is for alternative commands, which probably varies a good bit across hardware/compiler configurations. Maybe it's gotten better with newer generations. The other considerations here are not having duplicative code (which I also agree with) and making close-but-not-quite-duplicated code for various dynamics options as similar as possible. Since the dynamics is typically the most expensive part of the sea ice simulation, and centers seem to be moving toward higher numbers of subcycles for better convergence, then optimizing these loops seems wise. It's hard to judge, though, if the existing timing tests are inconclusive. Question: in the original code, the deformations routine is called at the end, rather than during the loop. Is the argument for moving it up into the loop (or interrupting the loop) that the computed deformations need to be consistent with the velocities just prior to the last iteration? Or is it not better that the deformations be computed from the final velocities? (Maybe I'm not understanding something here?)
Deformations: It was a part of the stress routine. Therefore the loop would be |
Or it could have an if in the loop. The cost of stress and stepu are significantly greater than the if check in the subcycling loop. I'm in favor of getting rid of if checks inside deep, expensive loops, but I still suspect this "if" doesn't meet that criteria and never will. If ultimately we plan to refactor this so it's more efficient with memory and 1d/2d arrays and this change is needed for that, maybe this change should be implemented then. |
I'm willing to relent on my point about the duplication of calls and cost of the if statement and approve this PR. Are there other outstanding issues? @TillRasmussen @phil-blain @eclare108213 ? I need to know when it's ready to merge. |
I dont have more changes. |
approved (again) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave it up to you, @apcraig and @eclare108213, to decide. I really do not like the duplicated calls in the EVP loop. I'm not convinced they impact the performance since it's a very "outer" loop.
@@ -906,7 +912,9 @@ subroutine seabed_stress_factor_LKD (nx_block, ny_block, & | |||
hwu = min(hwater(i,j),hwater(i+1,j),hwater(i,j+1),hwater(i+1,j+1)) | |||
|
|||
! if (hwu < threshold_hw) then | |||
docalc_tbu = max(sign(c1,threshold_hw-hwu),c0) | |||
! docalc_tbu = max(sign(c1,threshold_hw-hwu),c0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove this commented line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. I also squashed the last three commits that were review related
… bit for bit if cor=0.0 added legacy comment in ice_dyn_finish
1743719
to
7376807
Compare
It seems that the prevailing sentiment is to keep the calls compact with an if statement in the loop, at least for now. I am leaning that way myself, but am somewhat ambivalent, as noted above. @TillRasmussen one of your main reasons for making this change in the loop is to make it more similar to the 1D evp case - is that right? If so, how much does that case degrade if you do it this way instead? |
I assume that you refer to the placement of the deformation call. Yes converging towards the 1d solver was part of it. The other part was performance but as the work with the 1d solver suggest then I think that the real bottleneck here is the communication calls. I browsed through a number of the test runs and it is often 1/3 of the total run time. Memory issues may be hidden for now in these small test cases with gx3 and gx1. At the same time many of the variables within the evp solver are really local to ice_dyn_evp and ice_dyn_shared and only used in either loops for ucells or tcells. These could with benefit be converted to 1d (or 2d if block structure is maintained). Performance of 1d and multiple calls to deformation My preferred option would be to verify whether deformations could be moved to after the last call to stepu. My second option would be to leave it as is but until I get around to clean-up the 1d solver to include deformation and strain rates then it is not that important and it may very well be true that the difference in performance is not that big. |
Does as-is mean the original coding order, or the way it is in this PR? |
As in the PR. |
Okay, I see. I'm going to make an executive decision, to keep this moving forward: please call deformations within the loop between stress and stepu, with an if statement. That seems to be the closest to the original code, and we can change it later once we all agree on the best way to do it. I do agree that the preferred option would be to verify that deformations could simply be moved outside of the loop, without calling stress and stepu again. I suspect that it can, but let's put that in an issue for a future PR. Maybe it could be tested as part of the 1D cleanup and fixed appropriately in both approaches. |
Thank you @TillRasmussen ! Let us know when you're ready for this to be merged (tested, etc). |
I think that it is fine now. I ran through one test and it was bit for bit. I can also run it through all ~1500 test that are currently setup, however this will be ready in 3-4 hours, thus that would be tomorrow |
Thank you @TillRasmussen for the latest changes. I think it's great to get the deformations call out of stress and to have the current implementation. I was just looking at vp and eap and they seem to handle the deformations call differently (eap doesn't seem to call it at all), so it shouldn't affect the other methods. I also had a quick peak at the 1d solver again. It's quite different structurally. I don't know how important it is to try to make the 1d and 2d more alike. I guess one thing that might be nice would be to find a way to use the same code to compute the underlying science, but I don't think that's going to be possible given the very different memory structure of the two versions. To me, that's the bigger risk, that someone updates the 2d implementation and not the 1d version, so the solutions then diverge. Should we be considering how to address that issue? As a sanity check, I will run a test suite with the latest code on cheyenne to check bit-for-bit. It it passes, I will approve the PR and then merge it (unless anyone else has any concerns and expresses them here). Thanks. |
@apcraig They are not that different as they appear. The issue is with |
* update icepack, rename snwITDrdg to snwitdrdg (CICE-Consortium#658) * Change max_blocks for rake tests on izumi (nothread). (CICE-Consortium#665) * Fix some raketests for izumi * fix some rake tests * Makefile: make Fortran object files depend on their dependency files (CICE-Consortium#667) When 'make' is invoked on the CICE Makefile, the first thing it does is to try to make the included dependency files (*.d) (which are in fact Makefiles themselves) [1], in alphabetical order. The rule to make the dep files have the dependency generator, 'makdep', as a prerequisite, so when processing the first dep file, make notices 'makdep' does not exist and proceeds to build it. If for whatever reason this compilation fails, make will then proceed to the second dep file, notice that it recently tried and failed to build its dependency 'makdep', give up on the second dep file, proceed to the third, and so on. In the end, no dep file is produced. Make then restarts itself and proceeds to build the code, which of course fails catastrophically because the Fortran source files are not compiled in the right order because the dependency files are missing. To avoid that, add a dependency on the dep file to the rules that make the object file out of the Fortran source files. Since old-fashioned suffix rules cannot have their own prerequisites [2], migrate the rules for the Fortran source files to use pattern rules [3] instead. While at it, also migrate the rule for the C source files. With this new dependency, the builds abort early, before trying to compile the Fortran sources, making it easier to understand what has gone wrong. Since we do not use suffix rules anymore, remove the '.SUFFIXES' line that indicates which extension to use suffix rules for (but keep the line that eliminates all default suffix rules). [1] https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html [2] https://www.gnu.org/software/make/manual/html_node/Suffix-Rules.html [3] https://www.gnu.org/software/make/manual/html_node/Pattern-Rules.html#Pattern-Rules * Fix multi-pe advection=none bug (CICE-Consortium#664) * update parsing scripts to improve robustness, fix multi-pe advection=none * Update cice script to improve performance including minor refactoring of parse_namelist and parse_settings to reduce cost and ability to use already setup ice_in file from a prior case in the suite. Added commented out timing ability in cice.setup. Change test default to PEND from FAIL. * fix cice.setup for case * add sedbak implementation to support Mac sed * s/spend/spent * nuopc/cmeps driver updates (CICE-Consortium#668) * add debug_model feature * add required variables and calls for tr_snow * Main namelist debug (CICE-Consortium#671) * Adding method to write erroneous namelist options * Remove erroneous comma in abort_ice for namelist check * Added check for zbgc_nml. I missed that namelist in this file. * Added space and colons for namelist error output * Added space and colons for namelist error output Co-authored-by: David A. Hebert <dhebert@nrlssc.navy.mil> * NUOPC/CMEPS cap updates (CICE-Consortium#670) * updated orbital calculations needed for cesm * fixed problems in updated orbital calculations needed for cesm * update CICE6 to support coupling with UFS * put in changes so that both ufsatm and cesm requirements for potential temperature and density are satisfied * update icepack submodule * Revert "update icepack submodule" This reverts commit e70d1ab. * update comp_ice.backend with temporary ice_timers fix * Fix threading problem in init_bgc * Fix additional OMP problems * changes for coldstart running * Move the forapps directory * remove cesmcoupled ifdefs * Fix logging issues for NUOPC * removal of many cpp-ifdefs * fix compile errors * fixes to get cesm working * fixed white space issue * Add restart_coszen namelist option * Update NUOPC cap to work with latest CICE6 master * nuopc,cmeps or s2s build updates * fixes for dbug_flag * Update nuopc2 to latest CICE master * Fix some merge problems * Fix dbug variable * Manual merge of UFS changes * fixes to get CESM B1850 compset working * refactored ice_prescribed_mod.F90 to work with cdeps rather than the mct data models * Fix use_restart_time * changes for creating masks at runtime * added ice_mesh_mod * implemented area correction factors as option * more cleanup * Fix dragio * Fix mushy bug * updates to nuopc cap to resolve inconsistency between driver inputs and cice namelists * changed error message * added icepack_warnings_flush * updates to get ice categories working * updates to have F compset almost working with cice6 - still problems in polar regions - need to resolve 253K/cice6 versus 273K/cice5 differences * changed tolerance of mesh/grid comparison * added issues raised in PR * Update CESM-CICE sync with new time manager * Add back in latlongrid * Add new advanced snow physics to driver * Fix restart issue with land blocks * Update mesh check in cap * fix scam problems * reintroduced imesh_eps check * Put dragio in the namelist instead * Remove redundant code * Fix some indents Co-authored-by: Mariana Vertenstein <mvertens@ucar.edu> Co-authored-by: apcraig <anthony.p.craig@gmail.com> Co-authored-by: Denise Worthen <denise.worthen@noaa.gov> * Add CESM1_PIO for fill value check (CICE-Consortium#675) * Add CESM1_PIO for fill value check * Revert PIO_FILL_DOUBLE change for now * - Update the namelist read to make the group order flexible. (CICE-Consortium#677) - Remove recent update to namelist read that traps bad lines, it conflicts with flexibility to read groups in random order picked up by NAG. - change print* statements to write(nu_diag,*) * Port to Narwhal and add perf_suite (CICE-Consortium#678) * Add narwhal intel, gnu, cray, aocc Add perf_suite.ts * update narwhal_cray and perf_suite * Update OMP (CICE-Consortium#680) * Add narwhal intel, gnu, cray, aocc Add perf_suite.ts * update narwhal_cray and perf_suite * Review and update OMP implementation - Fix call to timers in block loops - Fix some OMP Private variables - Test OMP Scheduling, add SCHEDULE(runtime) to some OMP loops - Review column and advection OMP implementation - ADD OMP_TIMERS CPP option (temporary) to time threaded sections - Add timer_tmp timers (temporary) - Add omp_suite.ts test suite - Add ability to set OMP_SCHEDULE via options (ompscheds, ompscheds1, ompschedd1) * - Review diagnostics OMP implementation - Add timer_stats namelist to turn on extra timer output information - Add ICE_BFBTYPE and update bit-for-bit comparison logic in scripts - Update qc and logbfb testing - Remove logbfb and qchkf tests, add cmplog, cmplogrest, cmprest set_env files to set ICE_BFBTYPE - Update documentation * Update EVP OMP implementation * - Refactor puny/pi scalars in eap dynamics to improve performance - Update OMP in evp and eap * Clean up * Comment out temporary timers * Update OMP env variables on Narwhal * Update gaffney OMP_STACKSIZE * update OMP_STACKSIZE on cori * Update Onyx OMP_STACKSIZE Update documentation * Update OMP_STACKSIZE on mustang * - Update Tsfc values on land in various places in the code, was affecting testing. Specifically fix upwind advection. - Comment out OMP in ice_dyn_evp_1d_kernel, was producing non bit-for-bit results with different thread counts * updating LICENSE.pdf for 2022 * seabed stress - remove if statements (CICE-Consortium#673) * refactor seabed_stress. Bit for bit * Removed if statement from stepu. Results are binary identical, however taubx and tauby is updated on all iterations instead of just the last one. Not used within iteration * changed capping from logical to numeric in order to remove if statement. Moved call to deformation out of loop * clean dyn_finish, correct intent(inout) to intent(in) for Cw, resse Cb in stepu, remove if from seabed_stress_LKD * Reolve conflicts after updating main * modified environment for Freya to accomodate for additional OMP commands * Requested changes after review. Only changed in seabed stress and not bit for bit if cor=0.0 added legacy comment in ice_dyn_finish * move deformation to subcycling * - Update version and copyright. (CICE-Consortium#691) - Remove gordon and conrad machines. - Add setenv OMP_STACKSIZE commented out in env files - Update Icepack to fc4b809 * add OMP_STACKSIZE for koehr (CICE-Consortium#693) * Update C/CD deformations calls to be consistent with main B changes Update tauxbx, tauxby calculations on C/CD to be consistent with main B changes * Update OpenMP in C/CD implementation Extend omp_suite to include C/CD tests * reconcile recent merge problem * set default value of capping to 0. in vp cases for backwards compatibility * Set capping to 1.0 in vp consistent with evp, changes answers for vp configurations Co-authored-by: David A. Bailey <dbailey@ucar.edu> Co-authored-by: Philippe Blain <philippe.blain@canada.ca> Co-authored-by: Denise Worthen <denise.worthen@noaa.gov> Co-authored-by: daveh150 <david.hebert@nrlssc.navy.mil> Co-authored-by: David A. Hebert <dhebert@nrlssc.navy.mil> Co-authored-by: Mariana Vertenstein <mvertens@ucar.edu> Co-authored-by: Elizabeth Hunke <eclare@lanl.gov> Co-authored-by: TRasmussen <33480590+TillRasmussen@users.noreply.github.com>
For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium,
please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers
PR checklist
[x] Bit for bit
Timings show that some test are faster others are slow compared to current main. Results are bit for bit.
Moved if statements outside of loops within stepu and calculation of seabed stress. the removal within stepu results in taubx and tauby are calculated at all time steps and not just the final.
EDIT: Moved deformation out out stress calculations. Separated last evp iterations in order to only calculate when needed and to avoid if statement