seabed stress - remove if statements #673

TillRasmussen · 2021-12-12T21:57:05Z

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium,
please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers

PR checklist

…r taubx and tauby is updated on all iterations instead of just the last one. Not used within iteration

eclare108213

Of course this is faster! Thanks for refactoring it.

phil-blain · 2021-12-13T17:28:44Z

Timings show that some test are faster others are slow compared to current main.

I think it would be good if we could get a sense of which tests are faster, and which are slower, and by what kind of amount. I'm guessing OpenMP cases will be faster.

TillRasmussen · 2021-12-13T18:02:53Z

Yes it is faster on some of the openmpi test. This is primarily the change in stepu as this has been run as a "standard test". The result is that the change in the seabed stress is only called in 2 test with intel and 2 test with gnu. I have uploaded the results.log
results.log

apcraig · 2021-12-13T19:30:32Z

It looks like cases with alt01, alt03, gx1prod, and seabed all have seabed_stress on. Looking at @TillRasmussen's results, I'm not sure anything conclusive can be said about performance changes. If anything, this change looks a little slower in general. Are these change being done to improve performance? If so, I think we need more evidence that it's useful. If these changes are to improve code readability, flexibility, or some other reason, then that's fine too. Lets make sure we know why we're making the change and agree it makes sense.

eclare108213 · 2021-12-13T21:03:50Z

It's hard to tell -- could be a toss-up. A lot of the differences, both + and -, are just in the noise of machine load from one run to the next. Running the entire suite again (including the baseline) could help narrow that uncertainty down. There are a few runs, though, that are more than 20% different and seem to me to be mostly indicating that the new version is faster, but maybe not. These would show up in @apcraig 's green/yellow/red timing colors on the testing page. @TillRasmussen can you upload the test results?

A notable exception - what's going on here?

PASS freya_intel_decomp_gx3_4x2x25x29x5_sectrobin run 34.68 1.07 1.66
PASS freya_intel_decomp_gx3_4x2x25x29x5_sectrobin test
PASS freya_intel_decomp_gx3_4x2x25x29x5_sectrobin bfbcomp freya_intel_decomp_gx3_4x2x25x29x5_squarepop
PASS freya_intel_decomp_gx3_4x2x25x29x5_sectrobin compare neweapbase 3.97 0.83 1.58

Let's make sure that I'm interpreting the output correctly... the top line here shows the new version timings, and the last line shows the baseline timings, for the timeloop, dynamics and column timers?

TillRasmussen · 2021-12-14T06:35:26Z

Performance was the motivation. I think that theoretically it should be faster as If statements within do loops should be avoided, especially when they do not depend on i,j etc. There are still if statements within the seabed_stress functions.
For the change in stepu the calculation will add additional computations, however not many.
One reason why it has not changed that much is probably that there are still memory issues and other things larger than this.
Another is the relatively small areas we run as test.

The variability can be the DMI HPC system, we see op to 50 % differences in the run time in the operational setup primarily due to high load on the hard disk (90%+) and wrong usage of a system like that. If you see the example that @eclare108213 then it is not the dynamics nor the thermodynamics that makes the difference

TillRasmussen · 2021-12-14T10:53:04Z

For the reference I reran the example that Elizabeth extracted with the following result
PASS freya_intel_decomp_gx3_4x2x25x29x5_sectrobin run 4.69 1.28 1.66
PASS freya_intel_decomp_gx3_4x2x25x29x5_sectrobin test
PASS freya_intel_decomp_gx3_4x2x25x29x5_sectrobin bfbcomp freya_intel_decomp_gx3_4x2x25x29x5_squarepop
PASS freya_intel_decomp_gx3_4x2x25x29x5_sectrobin compare neweapbase 3.97 0.83 1.58

…nt. Moved call to deformation out of loop

TillRasmussen · 2021-12-20T14:13:44Z

A few questions remains:
Deformations calculates 4 arrays at the last iteration of the time stepping. The separation of the last sub cycling iteration and the rest could be eliminated if calculations of deformations is moved to after the last call to stepu. This will cause a small difference in the velocities used for calculation of the deformations but will not influence other results.
Is this acceptable?
Similar change could be done with sea_bed stress .
Can anyone suggest a way to copy the halo_info to halo_info_mask. This would remove the last if statement within the iteration.
Timings can be found in this attachements., The results are still a bit blury.
results.log
A motivation for doing this is to move it to be as similar to the 1d solver without the limiation of no MPI

TillRasmussen · 2021-12-20T19:44:13Z

Timings show that some test are faster others are slow compared to current main.

I think it would be good if we could get a sense of which tests are faster, and which are slower, and by what kind of amount. I'm guessing OpenMP cases will be faster.

Note OpenMP is turned off in the evp subcycling

…b in stepu, remove if from seabed_stress_LKD

eclare108213 · 2022-02-09T01:34:46Z

Is this one ready to be merged? Still waiting for reviews from @JFLemieux73 @phil-blain ?

phil-blain · 2022-02-10T20:51:51Z

I don't think it's ready for merging, I'd like to take a closer look and maybe @JFLemieux73 would too. It's not clear to me either if it's still bit for bit ? was the base_suite run with the lastest commit on this branch against main ?

TillRasmussen · 2022-02-11T09:32:28Z

At the time of testing (December) it was bit for bit with the main trunk.
It ran all the following suites.
first_suite,base_suite,travis_suite,decomp_suite,reprosum_suite,quick_suite

@apcraig has created a pull request for OMP. Maybe this should be accepted then I can integrate these changes aftwerwards and resubmit the test.

phil-blain · 2022-02-18T16:08:23Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

+      real (kind=dbl_kind), dimension (nx_block,ny_block), intent(in) :: &
         Cw                   ! ocean-ice neutral drag coefficient


nice cleanup

phil-blain

Thanks for this work @TillRasmussen, I think we are going in the right direction with these changes. I left a few comments.

Also, thanks for crafting separate commits that are easy to review separately, this is very much apprciated.

Commit 3/4 (changed capping from logical to numeric...) could still be splitted even more I think, as it seems to do several distinct things:

remove ksub from stepu's interface and adjust callers
remove ksub, tarear, shear, divu, rdg_conv and rdg_shear from stress's interface and adjust callers, and moving the call to deformations on frame up.
decrease the do ksub loop ending index by 1 in evp and add additional separate calls to stress, stepu , etc after the loop
change capping to real in viscous_coeffs_and_rep_pressure and calc_zeta_dPr

phil-blain · 2022-02-18T16:09:15Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

+         ! calculate seabed stress component for outputs 
+         ! only needed on last iteration.
+         taubx(i,j) = -uvel(i,j)*Cb
+         tauby(i,j) = -vvel(i,j)*Cb


if it's faster to compute them each time than having an if here, then this indeed makes sense.

If statements should in general be avoided and the extra calculations are limited.. Timings are not conclusive. I would like to close this issue as it is above. Next step would be to vectorize uvel_init,, vvel_init and Tbu. These are alle "local" to ice_dyn_shared and only used in stepu. With these changes I would like to see if it makes sense to convert cb to a vector and calculate tauby and taubx after the last call to stepu just as well as deformations. An addiotonal benefit is that this align 1d solver and the 2d solver.

phil-blain · 2022-02-18T16:17:51Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

+!         if (hwu < threshold_hw) then
+         docalc_tbu = max(sign(c1,threshold_hw-hwu),c0)


maybe this would be easier to read and understand with the merge intrinsic (Intel doc, GFortran doc), which is like a ternary operator:

docalc_tbu = merge(c1, c0, hwu < threshold_hw)

Done. Will be in a commit a bit later.

phil-blain · 2022-02-18T16:41:24Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

-         strintx , & ! divergence of internal ice stress, x (N/m^2)
-         strinty , & ! divergence of internal ice stress, y (N/m^2)
-         strairx , & ! stress on ice by air, x-direction
-         strairy     ! stress on ice by air, y-direction


I'm not sure about those deletions. The computations for Hibler-Bryan stresses are still commented in the subroutine, so if we ever add them back and test them, we would have to add these four arguments back in.

So either we also remove the commented code, or we keep as-is. I do not think that having extra arguments impact performance, as Fortran uses pass-by-reference for arrays, So it should have no impact to have extra arguments passed...

I agree that they should be removed or reimplemented, however I dont have an opininion on which is the right to do, however the Hibler-Bryan stresses has been commented out as long as I remember. I would remove the commented code. @eclare108213 any opinion?

I would leave them there, but commented out.

phil-blain · 2022-02-18T16:47:54Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

+         taubx(i,j) = -uvel(i,j)*Cb
+         tauby(i,j) = -vvel(i,j)*Cb


thanks, I thought we had made this change already somewhere...

phil-blain · 2022-02-18T16:51:48Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

@@ -1401,42 +1399,33 @@ subroutine viscous_coeffs_and_rep_pressure (strength,  tinyarea, &

      ! NOTE: for comp. efficiency 2 x zeta and 2 x eta are used in the code

-!      if (trim(yield_curve) == 'ellipse') then


I would prefer we keep this commented if in, @JFLemieux73 plans to eventually add more yield curves, I think.

I reinserted it

phil-blain · 2022-02-18T16:54:00Z

cicecore/cicedynB/dynamics/ice_dyn_evp.F90

+      !-----------------------------------------------------------------
+      ! on last subcycle, save quantities for mechanical redistribution
+      !-----------------------------------------------------------------
+            call deformations (nx_block  , ny_block  , &
+                               icellt(iblk)    ,             &
+                               indxti(:,iblk)    , indxtj(:,iblk)    , &
+                               uvel(:,:,iblk)      , vvel(:,:,iblk)      , &
+                               dxt(:,:,iblk)       , dyt(:,:,iblk)       , &
+                               cxp(:,:,iblk)       , cyp(:,:,iblk)       , &
+                               cxm(:,:,iblk)       , cym(:,:,iblk)       , &
+                               tarear(:,:,iblk)    ,             &
+                               shear(:,:,iblk)     , divu(:,:,iblk)      , &
+                               rdg_conv(:,:,iblk)  , rdg_shear(:,:,iblk) )
+


I think it's way cleaner to compute the deformations here, good idea. However, can we keep the & and the commas aligned ?

phil-blain · 2022-02-18T16:55:48Z

cicecore/cicedynB/dynamics/ice_dyn_evp.F90

+         if (ndte > 1) then 
+            do ksub = 1,ndte-1       ! subcycling


I don't understand why this change is necessary.

If ndte=1 then some compilers will create warnings and maybe even stop due to a do loop that runs from 1 to 0. The same should be included in the 1d .Now it aborts if ndte < 1

It can't be true that the code will stop? Compiler warnings maybe, but setting loop values so a loop won't be executed is done all the time and should be allowed in general. I'm not opposed to the if check, it's not going to affect performance, but I think it's cleaner not being there. Where is the check for ndte<1. It's not in this part of the code, so the if check is needed even less.

It seems a strong motivator is to get rid of if checks in loops. I assume this is for performance. My question continues to be whether we have measured a performance benefit with these changes. If not, then I prefer that we not replicate the calls for the last subcycle step in order to add the deformations call outside an if statement. My suspicion is that the if statement in the subcycling loop is not going to affect performance. Getting rid of if statements inside an ij loop, ok maybe, but not at this high a level in the computations. In my opinion, it's easier to maintain if the last iteration is not separated/duplicated. Maybe there are other reasons for separating that last subcycle. But if not, then I still prefer we not do it.

I agree in principle with getting rid of if checks in loops. They can make a big difference in performance, but it depends on the loop size and things like how long the pipeline is for alternative commands, which probably varies a good bit across hardware/compiler configurations. Maybe it's gotten better with newer generations. The other considerations here are not having duplicative code (which I also agree with) and making close-but-not-quite-duplicated code for various dynamics options as similar as possible. Since the dynamics is typically the most expensive part of the sea ice simulation, and centers seem to be moving toward higher numbers of subcycles for better convergence, then optimizing these loops seems wise. It's hard to judge, though, if the existing timing tests are inconclusive. Question: in the original code, the deformations routine is called at the end, rather than during the loop. Is the argument for moving it up into the loop (or interrupting the loop) that the computed deformations need to be consistent with the velocities just prior to the last iteration? Or is it not better that the deformations be computed from the final velocities? (Maybe I'm not understanding something here?)

phil-blain · 2022-02-18T18:44:07Z

Two more things:

It would be nice to wrap commits messages at 76 chars or so. Vim/Emacs should do this automatically, if you let Git open them (git commit and not git commit -m "message")
Once feedback is adressed and the merge conflicts resolved, I think it would be great to see updated tests (check that everything is bit 4 bit) and also timings.

TillRasmussen · 2022-02-19T18:42:16Z

I will check the error

apcraig

Lets try to converge quickly whether additional changes are needed so we can get this merged. My main concern continues to be the splitting out of the last subcycle in the loop for "potential" performance reasons. There is little evidence this helps performance, I wouldn't expect it to help performance as the if is at the level of the subcycling (not deeper), and I think it makes the code more difficult to maintain.

apcraig · 2022-02-21T18:03:17Z

cicecore/cicedynB/dynamics/ice_dyn_evp.F90

+         if (ndte > 1) then 
+            do ksub = 1,ndte-1       ! subcycling


It can't be true that the code will stop? Compiler warnings maybe, but setting loop values so a loop won't be executed is done all the time and should be allowed in general. I'm not opposed to the if check, it's not going to affect performance, but I think it's cleaner not being there. Where is the check for ndte<1. It's not in this part of the code, so the if check is needed even less.

apcraig · 2022-02-21T18:06:35Z

cicecore/cicedynB/dynamics/ice_dyn_evp.F90

+         if (ndte > 1) then 
+            do ksub = 1,ndte-1       ! subcycling


It seems a strong motivator is to get rid of if checks in loops. I assume this is for performance. My question continues to be whether we have measured a performance benefit with these changes. If not, then I prefer that we not replicate the calls for the last subcycle step in order to add the deformations call outside an if statement. My suspicion is that the if statement in the subcycling loop is not going to affect performance. Getting rid of if statements inside an ij loop, ok maybe, but not at this high a level in the computations. In my opinion, it's easier to maintain if the last iteration is not separated/duplicated. Maybe there are other reasons for separating that last subcycle. But if not, then I still prefer we not do it.

TillRasmussen · 2022-02-21T19:14:07Z

The loop splitting can be avoided if the call to deformations is delayed to after the last stepu.
This will update u, which I assume result in a minor update.
This will cause minor differences to output (divu and shear) and to ridging in icepack through rdg_conv and rdg_shear. Whether this is critical or not I cannot say, but I would think that it is limited
the if (ndte > 1) then is on line 393.
This was put into the 1d solver and I cannot remember if it was in order to avoid warnings or in order to avoid a crash.

TillRasmussen · 2022-02-21T19:21:29Z

In terms of perfomance then that is the reason for moving if statements away form loops, however this has not helped as much as desired. This indicates other issues that were solved in the 1d solver (memory and communication). These modifications are also steps towards reducing the numnber of 2d arrays within the loop as deformations has been moved out.
This will also improve the maintainance of the 1d solver as the stress_last can be excluded and the convertions between 2d and 1d can be limited.
The test I made on the DMI computer varies in both directions in terms of performance, however especially IO varies on this machine.

TillRasmussen · 2022-02-21T20:32:15Z

For info
results_base.log
For the last commit I changed a line in seabedLKD but nothing in the probabilistic. I compared timings for these just to illustrate the changes runnning more or less the same. Therefore please compare column 2.
freyja-5 CICE/testsuite.neweap190222> diff results.log results_base.log | grep run
< PASS freya_intel_restart_gx1_16x2_debug_gx1apr_seabedLKD_short run 725.68 447.32 144.01

PASS freya_intel_restart_gx1_16x2_debug_gx1apr_seabedLKD_short run 712.58 420.29 147.39
< PASS freya_intel_restart_gx1_15x2_seabedprob run 173.11 31.79 21.84
PASS freya_intel_restart_gx1_15x2_seabedprob run 90.58 17.91 21.45
< PASS freya_intel_smoke_gx1_15x2_reprosum_run10day_seabedprob build
< PASS freya_intel_smoke_gx1_15x2_reprosum_run10day_seabedprob run 260.26 48.42 43.47
COPY freya_intel_smoke_gx1_15x2_reprosum_run10day_seabedprob build
PASS freya_intel_smoke_gx1_15x2_reprosum_run10day_seabedprob run 200.16 35.83 42.05
< PASS freya_intel_smoke_gx1_28x1_cmplogrest_reprosum_run10day_seabedprob_thread build
< PASS freya_intel_smoke_gx1_28x1_cmplogrest_reprosum_run10day_seabedprob_thread run 118.61 8.66 20.47
COPY freya_intel_smoke_gx1_28x1_cmplogrest_reprosum_run10day_seabedprob_thread build
PASS freya_intel_smoke_gx1_28x1_cmplogrest_reprosum_run10day_seabedprob_thread run 96.35 8.58 20.57
< PASS freya_gnu_restart_gx1_16x2_debug_gx1apr_seabedLKD_short run 165.58 46.26 41.10
PASS freya_gnu_restart_gx1_16x2_debug_gx1apr_seabedLKD_short run 132.29 46.11 41.19
< PASS freya_gnu_restart_gx1_15x2_seabedprob run 96.85 6.87 17.43
PASS freya_gnu_restart_gx1_15x2_seabedprob run 67.80 6.87 17.41
< PASS freya_gnu_smoke_gx1_15x2_reprosum_run10day_seabedprob build
< PASS freya_gnu_smoke_gx1_15x2_reprosum_run10day_seabedprob run 178.83 13.79 34.80
COPY freya_gnu_smoke_gx1_15x2_reprosum_run10day_seabedprob build
PASS freya_gnu_smoke_gx1_15x2_reprosum_run10day_seabedprob run 88.72 13.84 34.75
< PASS freya_gnu_smoke_gx1_28x1_cmplogrest_reprosum_run10day_seabedprob_thread build
< PASS freya_gnu_smoke_gx1_28x1_cmplogrest_reprosum_run10day_seabedprob_thread run 143.34 12.84 28.25
COPY freya_gnu_smoke_gx1_28x1_cmplogrest_reprosum_run10day_seabedprob_thread build
PASS freya_gnu_smoke_gx1_28x1_cmplogrest_reprosum_run10day_seabedprob_thread run 79.88 12.89 28.31

eclare108213 · 2022-02-21T20:34:00Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

@@ -176,7 +176,7 @@ subroutine init_dyn (dt)
         if (trim(coriolis) == 'constant') then
            fcor_blk(i,j,iblk) = 1.46e-4_dbl_kind ! Hibler 1979, N. Hem; 1/s
         else if (trim(coriolis) == 'zero') then
-            fcor_blk(i,j,iblk) = 0.0
+            fcor_blk(i,j,iblk) = c0


Is the change in this line the entire reason the code is not BFB for cor=0?

yes that is why I put in the comment. I dont think that I reran a test with coriolis=0.0 after the last commit but just a warning if the standard test complains about bit for bit differences.
All other test have been bit for bit with main after Tony's OMP updates.
.

eclare108213 · 2022-02-21T20:39:19Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

-         strintx , & ! divergence of internal ice stress, x (N/m^2)
-         strinty , & ! divergence of internal ice stress, y (N/m^2)
-         strairx , & ! stress on ice by air, x-direction
-         strairy     ! stress on ice by air, y-direction


I would leave them there, but commented out.

eclare108213 · 2022-02-21T20:58:43Z

cicecore/cicedynB/dynamics/ice_dyn_evp.F90

+         if (ndte > 1) then 
+            do ksub = 1,ndte-1       ! subcycling


I agree in principle with getting rid of if checks in loops. They can make a big difference in performance, but it depends on the loop size and things like how long the pipeline is for alternative commands, which probably varies a good bit across hardware/compiler configurations. Maybe it's gotten better with newer generations. The other considerations here are not having duplicative code (which I also agree with) and making close-but-not-quite-duplicated code for various dynamics options as similar as possible. Since the dynamics is typically the most expensive part of the sea ice simulation, and centers seem to be moving toward higher numbers of subcycles for better convergence, then optimizing these loops seems wise. It's hard to judge, though, if the existing timing tests are inconclusive. Question: in the original code, the deformations routine is called at the end, rather than during the loop. Is the argument for moving it up into the loop (or interrupting the loop) that the computed deformations need to be consistent with the velocities just prior to the last iteration? Or is it not better that the deformations be computed from the final velocities? (Maybe I'm not understanding something here?)

TillRasmussen · 2022-02-21T21:17:24Z

Deformations: It was a part of the stress routine. Therefore the loop would be
ksub=1,ndte
call stress; call deformation; call stepu
end
This version keep the order of calls but do not calculate deformation until last iteration, thus
ksub=1,ndte-1
call stress; call stepu
end
call stress;call deformation;call stepu;
I am not sure if it is important to calculate deformations before or after the last call to stepu. I assume that it is placed where it is as it is calculated on the tcells.
A potential rework would be
ksub=1,ndte
call stress; call stepu
end
call deformation

apcraig · 2022-02-22T00:46:15Z

Deformations: It was a part of the stress routine. Therefore the loop would be ksub=1,ndte call stress; call deformation; call stepu end This version keep the order of calls but do not calculate deformation until last iteration, thus ksub=1,ndte-1 call stress; call stepu end call stress;call deformation;call stepu; I am not sure if it is important to calculate deformations before or after the last call to stepu. I assume that it is placed where it is as it is calculated on the tcells. A potential rework would be ksub=1,ndte call stress; call stepu end call deformation

Or it could have an if in the loop. The cost of stress and stepu are significantly greater than the if check in the subcycling loop. I'm in favor of getting rid of if checks inside deep, expensive loops, but I still suspect this "if" doesn't meet that criteria and never will. If ultimately we plan to refactor this so it's more efficient with memory and 1d/2d arrays and this change is needed for that, maybe this change should be implemented then.

apcraig · 2022-02-22T00:47:47Z

I'm willing to relent on my point about the duplication of calls and cost of the if statement and approve this PR. Are there other outstanding issues? @TillRasmussen @phil-blain @eclare108213 ? I need to know when it's ready to merge.

TillRasmussen · 2022-02-22T07:36:48Z

I dont have more changes.

eclare108213 · 2022-02-22T15:41:40Z

approved (again)

phil-blain

I will leave it up to you, @apcraig and @eclare108213, to decide. I really do not like the duplicated calls in the EVP loop. I'm not convinced they impact the performance since it's a very "outer" loop.

phil-blain · 2022-02-22T15:46:35Z

cicecore/cicedynB/dynamics/ice_dyn_shared.F90

@@ -906,7 +912,9 @@ subroutine seabed_stress_factor_LKD (nx_block, ny_block,         &
         hwu = min(hwater(i,j),hwater(i+1,j),hwater(i,j+1),hwater(i+1,j+1))

 !         if (hwu < threshold_hw) then
-         docalc_tbu = max(sign(c1,threshold_hw-hwu),c0)
+!         docalc_tbu = max(sign(c1,threshold_hw-hwu),c0)


can we remove this commented line?

Removed. I also squashed the last three commits that were review related

… bit for bit if cor=0.0 added legacy comment in ice_dyn_finish

eclare108213 · 2022-02-22T16:11:23Z

It seems that the prevailing sentiment is to keep the calls compact with an if statement in the loop, at least for now. I am leaning that way myself, but am somewhat ambivalent, as noted above. @TillRasmussen one of your main reasons for making this change in the loop is to make it more similar to the 1D evp case - is that right? If so, how much does that case degrade if you do it this way instead?

TillRasmussen · 2022-02-22T17:33:46Z

I assume that you refer to the placement of the deformation call.

Yes converging towards the 1d solver was part of it. The other part was performance but as the work with the 1d solver suggest then I think that the real bottleneck here is the communication calls. I browsed through a number of the test runs and it is often 1/3 of the total run time. Memory issues may be hidden for now in these small test cases with gx3 and gx1.
Moving as much of the diagnostics (deformations) away from the subcyling has the additional benefit that the conversion from 2d to 1d and vice versa would be simpler as eg. divergence is only calculated once outside the loop, thus one could probably skip this transformation for this and most other variables in the 1d loop. This is done in copy_in2d_1d.

At the same time many of the variables within the evp solver are really local to ice_dyn_evp and ice_dyn_shared and only used in either loops for ucells or tcells. These could with benefit be converted to 1d (or 2d if block structure is maintained).

Performance of 1d and multiple calls to deformation
I have not tested it but I am sure that Jacob did so when writing it a few years back. This was why he made the interface structure that essentially separated a version of stress with deformations and one without. It was less obvious what could be moved from stress as the functions strain_rates and deformations did not exist.
I would be very hesitant to reintroduce deformations into the loop of the 1d, however one can always try.

My preferred option would be to verify whether deformations could be moved to after the last call to stepu.

My second option would be to leave it as is but until I get around to clean-up the 1d solver to include deformation and strain rates then it is not that important and it may very well be true that the difference in performance is not that big.

eclare108213 · 2022-02-22T18:17:19Z

My second option would be to leave it as is but until I get around to clean-up the 1d solver to include deformation and strain rates then it is not that important and it may very well be true that the difference in performance is not that big.

Does as-is mean the original coding order, or the way it is in this PR?

TillRasmussen · 2022-02-22T18:25:50Z

As in the PR.
The original code had deformations embedded into stress. I would not like to go back to that.
The 3rd alternative in my opinion is call deformations between stress and stepu.
That is just my preference :)

eclare108213 · 2022-02-22T18:32:00Z

Okay, I see. I'm going to make an executive decision, to keep this moving forward: please call deformations within the loop between stress and stepu, with an if statement. That seems to be the closest to the original code, and we can change it later once we all agree on the best way to do it. I do agree that the preferred option would be to verify that deformations could simply be moved outside of the loop, without calling stress and stepu again. I suspect that it can, but let's put that in an issue for a future PR. Maybe it could be tested as part of the 1D cleanup and fixed appropriately in both approaches.

eclare108213 · 2022-02-22T19:47:02Z

Thank you @TillRasmussen ! Let us know when you're ready for this to be merged (tested, etc).

TillRasmussen · 2022-02-22T19:53:27Z

I think that it is fine now. I ran through one test and it was bit for bit. I can also run it through all ~1500 test that are currently setup, however this will be ready in 3-4 hours, thus that would be tomorrow

apcraig · 2022-02-22T21:42:50Z

Thank you @TillRasmussen for the latest changes. I think it's great to get the deformations call out of stress and to have the current implementation. I was just looking at vp and eap and they seem to handle the deformations call differently (eap doesn't seem to call it at all), so it shouldn't affect the other methods. I also had a quick peak at the 1d solver again. It's quite different structurally. I don't know how important it is to try to make the 1d and 2d more alike. I guess one thing that might be nice would be to find a way to use the same code to compute the underlying science, but I don't think that's going to be possible given the very different memory structure of the two versions. To me, that's the bigger risk, that someone updates the 2d implementation and not the 1d version, so the solutions then diverge. Should we be considering how to address that issue?

As a sanity check, I will run a test suite with the latest code on cheyenne to check bit-for-bit. It it passes, I will approve the PR and then merge it (unless anyone else has any concerns and expresses them here). Thanks.

TillRasmussen · 2022-02-22T22:14:50Z

@apcraig They are not that different as they appear.
eg. Function stress
Geometrical parameters can be calculated as scalars as it is the case in the 1d solver
Function strain_rate could easily be rewritten to accept scalars as input.
viscous_coeffs_and_rep_pressure already do this
stress_iter and stress_last can be merged to one now that deformations are out of stress.

The issue is with
stressp and stressm are 1d/2d
str is split in 8 and it is in 1d/2d
where there is a 1d/2d issue.

apcraig · 2022-02-23T05:36:52Z

Testing looks good. https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#38131f5bf4f5806567bb5b9811c504647042bd66

* update icepack, rename snwITDrdg to snwitdrdg (CICE-Consortium#658) * Change max_blocks for rake tests on izumi (nothread). (CICE-Consortium#665) * Fix some raketests for izumi * fix some rake tests * Makefile: make Fortran object files depend on their dependency files (CICE-Consortium#667) When 'make' is invoked on the CICE Makefile, the first thing it does is to try to make the included dependency files (*.d) (which are in fact Makefiles themselves) [1], in alphabetical order. The rule to make the dep files have the dependency generator, 'makdep', as a prerequisite, so when processing the first dep file, make notices 'makdep' does not exist and proceeds to build it. If for whatever reason this compilation fails, make will then proceed to the second dep file, notice that it recently tried and failed to build its dependency 'makdep', give up on the second dep file, proceed to the third, and so on. In the end, no dep file is produced. Make then restarts itself and proceeds to build the code, which of course fails catastrophically because the Fortran source files are not compiled in the right order because the dependency files are missing. To avoid that, add a dependency on the dep file to the rules that make the object file out of the Fortran source files. Since old-fashioned suffix rules cannot have their own prerequisites [2], migrate the rules for the Fortran source files to use pattern rules [3] instead. While at it, also migrate the rule for the C source files. With this new dependency, the builds abort early, before trying to compile the Fortran sources, making it easier to understand what has gone wrong. Since we do not use suffix rules anymore, remove the '.SUFFIXES' line that indicates which extension to use suffix rules for (but keep the line that eliminates all default suffix rules). [1] https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html [2] https://www.gnu.org/software/make/manual/html_node/Suffix-Rules.html [3] https://www.gnu.org/software/make/manual/html_node/Pattern-Rules.html#Pattern-Rules * Fix multi-pe advection=none bug (CICE-Consortium#664) * update parsing scripts to improve robustness, fix multi-pe advection=none * Update cice script to improve performance including minor refactoring of parse_namelist and parse_settings to reduce cost and ability to use already setup ice_in file from a prior case in the suite. Added commented out timing ability in cice.setup. Change test default to PEND from FAIL. * fix cice.setup for case * add sedbak implementation to support Mac sed * s/spend/spent * nuopc/cmeps driver updates (CICE-Consortium#668) * add debug_model feature * add required variables and calls for tr_snow * Main namelist debug (CICE-Consortium#671) * Adding method to write erroneous namelist options * Remove erroneous comma in abort_ice for namelist check * Added check for zbgc_nml. I missed that namelist in this file. * Added space and colons for namelist error output * Added space and colons for namelist error output Co-authored-by: David A. Hebert <dhebert@nrlssc.navy.mil> * NUOPC/CMEPS cap updates (CICE-Consortium#670) * updated orbital calculations needed for cesm * fixed problems in updated orbital calculations needed for cesm * update CICE6 to support coupling with UFS * put in changes so that both ufsatm and cesm requirements for potential temperature and density are satisfied * update icepack submodule * Revert "update icepack submodule" This reverts commit e70d1ab. * update comp_ice.backend with temporary ice_timers fix * Fix threading problem in init_bgc * Fix additional OMP problems * changes for coldstart running * Move the forapps directory * remove cesmcoupled ifdefs * Fix logging issues for NUOPC * removal of many cpp-ifdefs * fix compile errors * fixes to get cesm working * fixed white space issue * Add restart_coszen namelist option * Update NUOPC cap to work with latest CICE6 master * nuopc,cmeps or s2s build updates * fixes for dbug_flag * Update nuopc2 to latest CICE master * Fix some merge problems * Fix dbug variable * Manual merge of UFS changes * fixes to get CESM B1850 compset working * refactored ice_prescribed_mod.F90 to work with cdeps rather than the mct data models * Fix use_restart_time * changes for creating masks at runtime * added ice_mesh_mod * implemented area correction factors as option * more cleanup * Fix dragio * Fix mushy bug * updates to nuopc cap to resolve inconsistency between driver inputs and cice namelists * changed error message * added icepack_warnings_flush * updates to get ice categories working * updates to have F compset almost working with cice6 - still problems in polar regions - need to resolve 253K/cice6 versus 273K/cice5 differences * changed tolerance of mesh/grid comparison * added issues raised in PR * Update CESM-CICE sync with new time manager * Add back in latlongrid * Add new advanced snow physics to driver * Fix restart issue with land blocks * Update mesh check in cap * fix scam problems * reintroduced imesh_eps check * Put dragio in the namelist instead * Remove redundant code * Fix some indents Co-authored-by: Mariana Vertenstein <mvertens@ucar.edu> Co-authored-by: apcraig <anthony.p.craig@gmail.com> Co-authored-by: Denise Worthen <denise.worthen@noaa.gov> * Add CESM1_PIO for fill value check (CICE-Consortium#675) * Add CESM1_PIO for fill value check * Revert PIO_FILL_DOUBLE change for now * - Update the namelist read to make the group order flexible. (CICE-Consortium#677) - Remove recent update to namelist read that traps bad lines, it conflicts with flexibility to read groups in random order picked up by NAG. - change print* statements to write(nu_diag,*) * Port to Narwhal and add perf_suite (CICE-Consortium#678) * Add narwhal intel, gnu, cray, aocc Add perf_suite.ts * update narwhal_cray and perf_suite * Update OMP (CICE-Consortium#680) * Add narwhal intel, gnu, cray, aocc Add perf_suite.ts * update narwhal_cray and perf_suite * Review and update OMP implementation - Fix call to timers in block loops - Fix some OMP Private variables - Test OMP Scheduling, add SCHEDULE(runtime) to some OMP loops - Review column and advection OMP implementation - ADD OMP_TIMERS CPP option (temporary) to time threaded sections - Add timer_tmp timers (temporary) - Add omp_suite.ts test suite - Add ability to set OMP_SCHEDULE via options (ompscheds, ompscheds1, ompschedd1) * - Review diagnostics OMP implementation - Add timer_stats namelist to turn on extra timer output information - Add ICE_BFBTYPE and update bit-for-bit comparison logic in scripts - Update qc and logbfb testing - Remove logbfb and qchkf tests, add cmplog, cmplogrest, cmprest set_env files to set ICE_BFBTYPE - Update documentation * Update EVP OMP implementation * - Refactor puny/pi scalars in eap dynamics to improve performance - Update OMP in evp and eap * Clean up * Comment out temporary timers * Update OMP env variables on Narwhal * Update gaffney OMP_STACKSIZE * update OMP_STACKSIZE on cori * Update Onyx OMP_STACKSIZE Update documentation * Update OMP_STACKSIZE on mustang * - Update Tsfc values on land in various places in the code, was affecting testing. Specifically fix upwind advection. - Comment out OMP in ice_dyn_evp_1d_kernel, was producing non bit-for-bit results with different thread counts * updating LICENSE.pdf for 2022 * seabed stress - remove if statements (CICE-Consortium#673) * refactor seabed_stress. Bit for bit * Removed if statement from stepu. Results are binary identical, however taubx and tauby is updated on all iterations instead of just the last one. Not used within iteration * changed capping from logical to numeric in order to remove if statement. Moved call to deformation out of loop * clean dyn_finish, correct intent(inout) to intent(in) for Cw, resse Cb in stepu, remove if from seabed_stress_LKD * Reolve conflicts after updating main * modified environment for Freya to accomodate for additional OMP commands * Requested changes after review. Only changed in seabed stress and not bit for bit if cor=0.0 added legacy comment in ice_dyn_finish * move deformation to subcycling * - Update version and copyright. (CICE-Consortium#691) - Remove gordon and conrad machines. - Add setenv OMP_STACKSIZE commented out in env files - Update Icepack to fc4b809 * add OMP_STACKSIZE for koehr (CICE-Consortium#693) * Update C/CD deformations calls to be consistent with main B changes Update tauxbx, tauxby calculations on C/CD to be consistent with main B changes * Update OpenMP in C/CD implementation Extend omp_suite to include C/CD tests * reconcile recent merge problem * set default value of capping to 0. in vp cases for backwards compatibility * Set capping to 1.0 in vp consistent with evp, changes answers for vp configurations Co-authored-by: David A. Bailey <dbailey@ucar.edu> Co-authored-by: Philippe Blain <philippe.blain@canada.ca> Co-authored-by: Denise Worthen <denise.worthen@noaa.gov> Co-authored-by: daveh150 <david.hebert@nrlssc.navy.mil> Co-authored-by: David A. Hebert <dhebert@nrlssc.navy.mil> Co-authored-by: Mariana Vertenstein <mvertens@ucar.edu> Co-authored-by: Elizabeth Hunke <eclare@lanl.gov> Co-authored-by: TRasmussen <33480590+TillRasmussen@users.noreply.github.com>

TillRasmussen added 2 commits December 7, 2021 23:49

refactor seabed_stress. Bit for bit

5c036cc

Removed if statement from stepu. Results are binary identical, howeve…

4ab3476

…r taubx and tauby is updated on all iterations instead of just the last one. Not used within iteration

eclare108213 approved these changes Dec 12, 2021

View reviewed changes

apcraig requested review from JFLemieux73 and phil-blain December 13, 2021 17:07

changed capping from logical to numeric in order to remove if stateme…

0122dd3

…nt. Moved call to deformation out of loop

clean dyn_finish, correct intent(inout) to intent(in) for Cw, resse C…

34b074c

…b in stepu, remove if from seabed_stress_LKD

phil-blain reviewed Feb 18, 2022

View reviewed changes

phil-blain suggested changes Feb 18, 2022

View reviewed changes

Merge branch 'main' into neweap

dbbad0d

TillRasmussen added 2 commits February 21, 2022 18:03

Reolve conflicts after updating main

22893ca

modified environment for Freya to accomodate for additional OMP commands

c833f90

TillRasmussen force-pushed the neweap branch from 3b48d48 to c833f90 Compare February 21, 2022 18:05

apcraig reviewed Feb 21, 2022

View reviewed changes

eclare108213 reviewed Feb 21, 2022

View reviewed changes

phil-blain suggested changes Feb 22, 2022

View reviewed changes

Requested changes after review. Only changed in seabed stress and not…

7376807

… bit for bit if cor=0.0 added legacy comment in ice_dyn_finish

TillRasmussen force-pushed the neweap branch from 1743719 to 7376807 Compare February 22, 2022 16:02

move deformation to subcycling

e312c5b

apcraig merged commit a59fa47 into CICE-Consortium:main Feb 23, 2022

TillRasmussen deleted the neweap branch February 23, 2022 21:50

DeniseWorthen mentioned this pull request May 3, 2022

Update CICE to latest Consortium/main NOAA-EMC/CICE#47

Closed

		real (kind=dbl_kind), dimension (nx_block,ny_block), intent(in) :: &
		Cw ! ocean-ice neutral drag coefficient

		! if (hwu < threshold_hw) then
		docalc_tbu = max(sign(c1,threshold_hw-hwu),c0)

		@@ -1401,42 +1399,33 @@ subroutine viscous_coeffs_and_rep_pressure (strength, tinyarea, &

		! NOTE: for comp. efficiency 2 x zeta and 2 x eta are used in the code

		! if (trim(yield_curve) == 'ellipse') then

seabed stress - remove if statements #673

seabed stress - remove if statements #673

Conversation

TillRasmussen commented Dec 12, 2021 • edited Loading

PR checklist

eclare108213 left a comment

Choose a reason for hiding this comment

phil-blain commented Dec 13, 2021

TillRasmussen commented Dec 13, 2021

apcraig commented Dec 13, 2021

eclare108213 commented Dec 13, 2021

TillRasmussen commented Dec 14, 2021 • edited Loading

TillRasmussen commented Dec 14, 2021

TillRasmussen commented Dec 20, 2021 • edited Loading

TillRasmussen commented Dec 20, 2021

eclare108213 commented Feb 9, 2022

phil-blain commented Feb 10, 2022

TillRasmussen commented Feb 11, 2022

Choose a reason for hiding this comment

phil-blain left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TillRasmussen Feb 18, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

phil-blain commented Feb 18, 2022 • edited Loading

TillRasmussen commented Feb 19, 2022

apcraig left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TillRasmussen commented Feb 21, 2022 • edited Loading

TillRasmussen commented Feb 21, 2022 • edited Loading

TillRasmussen commented Feb 21, 2022

Choose a reason for hiding this comment

TillRasmussen Feb 21, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TillRasmussen commented Feb 21, 2022

apcraig commented Feb 22, 2022

apcraig commented Feb 22, 2022

TillRasmussen commented Feb 22, 2022

eclare108213 commented Feb 22, 2022

phil-blain left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eclare108213 commented Feb 22, 2022

TillRasmussen commented Feb 22, 2022

eclare108213 commented Feb 22, 2022

TillRasmussen commented Feb 22, 2022

eclare108213 commented Feb 22, 2022

eclare108213 commented Feb 22, 2022

TillRasmussen commented Feb 22, 2022

apcraig commented Feb 22, 2022

TillRasmussen commented Feb 22, 2022

apcraig commented Feb 23, 2022

TillRasmussen commented Dec 12, 2021 •

edited

Loading

TillRasmussen commented Dec 14, 2021 •

edited

Loading

TillRasmussen commented Dec 20, 2021 •

edited

Loading

TillRasmussen Feb 18, 2022 •

edited

Loading

phil-blain commented Feb 18, 2022 •

edited

Loading

TillRasmussen commented Feb 21, 2022 •

edited

Loading

TillRasmussen commented Feb 21, 2022 •

edited

Loading

TillRasmussen Feb 21, 2022 •

edited

Loading