dynamics: add implicit VP solver (#491)
* initial coding of ice_dyn_vp.F90
* code (matvec, bvec and residual) now compiles after debugging
* introduced calc_vrel_Cb and calc_visc_coeff subroutines and precond.F90 file
* after debugging...now compiles
* in process of adding precondD
* precondD is in progress
* I finished coding precondD...it compiles
* added calculation for L2norm
* modif to ice_step_mod.F90 for call to imp_solver
* now form complete vectors by alternating u(i,j) and v(i,j) and so on (a small sketch of this layout appears below)
* added simple routine to form a vector
* corrected bug for vectors of size ntot
* added vectors of size ntot to bvec and matvec
* step back...not so easy to form all the vectors for the blocks...
* new subroutine for assembling ntot vectors
* In the process of adding the call to fgmres
* new subroutine to convert vector to the max_blocks arrays
* ice_dyn_vp compiles...I now need to make fgmres.F90 compile...
* added BLAS routines needed by fgmres
* added the fgmres.F90 routine...does not compile yet...
* minor modif to fgmres.F90
* bug in call to arrays_to_vec instead of vec_to_arrays
* Remove fgmres.F90 (MPI version, will re-add GEM version later)

  This file 'include's mpif.h and so breaks serial compilation. I will copy the FGMRES solver from GEM later on and modify it to call the CICE communication routines.

* added serial fgmres for testing
* modified code for serial tests
* after debugging...
* added code to put sol vector in uvel and vvel arrays at the end of fgmres
* modified tinyarea calc for VP (punyVP=2e-9)
* modified location of allocation for some variables...
* small bug corrected in allocating variables
* introduced capping of the viscous coeff following Kreysher
* added a relaxation to uvel,vvel to try to get nonlinear conv
* modif to relaxation calc...minor
* I found a bug in my 1st implementation. The dsig/dx term related to the rep pressure should be on the RHS...in the process of correcting this
* Pr is now calc with zeta
* dPrdx is now in calc_bvec
* removed Pr part from stress_vp calc
* minor modifs to improve print of res norm in fgmres
* added linear conv criterion based on a gamma
* added NL conv criterion
* in the process of adding precond diag
* now forms the vector diagvec
* precond diag is completed and compiles
* diagvec was not allocated...now ok
* diagvec was not deallocated...now ok
* fixed bug in formDiag_step2
* small modif
* small modif...again
* modified formation of diagonal for precond
* just removed a comment
* minor modif to formDiag_step1
* added an option for precond (free drift diag or complete diag)
* minor modif
* minor modif
* removed OLDformDiag_step1
* Ok I think I know what was wrong with the precond diag...I have to use icellu and not icellt
* precond 3 works but not as good as precond 2 (free drift)...removed OLD code
* ok found the problem in step2...now fixed
* added routine to calc deformations for mech redistribution...for the evp this is done for k=ndte in stress...here it is done once we have NL convergence
* changed names of evp_prep1 to dyn_prep1...
* in process of adding gmres as precond to fgmres...
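As an aside on the "alternating u(i,j) and v(i,j)" layout mentioned above: the two velocity components at each ice point are interleaved into a single solution vector of length ntot, which is what routines like 'arrays_to_vec'/'vec_to_arrays' do block by block. A minimal, self-contained Fortran sketch of this packing (illustrative only, not the CICE code; all names are made up):

    program pack_demo
      implicit none
      integer, parameter :: dbl = selected_real_kind(13)
      integer, parameter :: nx = 2, ny = 2
      real(dbl) :: uvel(nx,ny), vvel(nx,ny), sol(2*nx*ny)
      integer :: i, j, k
      uvel = 1.0_dbl
      vvel = 2.0_dbl
      k = 0
      do j = 1, ny
         do i = 1, nx
            sol(k+1) = uvel(i,j)      ! u component of this ice point
            sol(k+2) = vvel(i,j)      ! v component of the same point
            k = k + 2
         end do
      end do
      print '(a,8f5.1)', 'sol = ', sol   ! 1.0 2.0 1.0 2.0 ...
    end program pack_demo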
* matvec is now done in one routine instead of in two steps before
* in the process of adding gmres precond
* finalizing implementation of gmres for precond
* pgmres seems to work...I used it as a solver and it works fine
* minor modif: introduction of maxits_pgmres
* modif to comments...very minor
* something was inconsistent for precond choice...now fixed and removed one precond option (free drift)
* removed krelax...not needed as a more complex method will be added
* in the process of adding the RRE1 acceleration method
* fixed problem in halo update (spherical grid...wraps on itself) before matvec
* cicecore: add puny_dyn variable to initialize tinyarea, remove temporary punyVP variable
* ice_dyn_vp: move solver algorithm and output parameters to namelist

  Also, since 'yield_curve' and 'e_ratio' are used for both VP and EVP dynamics, print their values for both solvers.

* {f,}gmres: use nu_diag for output
* dynamics: correct 2 bugs found using debugging flags

  - eps1 was printed before being initialized in pgmres
  - a loop on iblk was missing around the call to deformations in ice_dyn_vp

* ice_dyn_vp: rename VP namelist variables to be more descriptive

  Also, add a namelist flag to monitor nonlinear convergence and document VP namelist variables in the runtime log.

* pgmres: streamlined monitoring output

  Bring the PGMRES output more in line with the FGMRES and nonlinear solvers.

* ice_dyn_vp: add computation of nonlinear and fixed point residuals
* ice_dyn_vp: remove references to RRE acceleration method

  We will not use this acceleration method, so remove remaining references to it.

* ice_dyn_vp: remove unused 'ulin', 'vlin' variables
* ice_dyn_vp: move Picard iteration to separate subroutine

  Make the code more modular by moving the Picard solver to its own subroutine. Also, rename 'kmax' to 'maxits_nonlin' to bring it in line with 'maxits_{f,p}gmres'.

* ice_dyn_vp: add Anderson acceleration for Picard iteration
* Remove nonlinear convergence logic from fgmresD.F90 (move to picard_solver in ice_dyn_vp.F90)
* Add output of residual norm to fgmresD.F90
* Remove unused variables (fgmresD.F90, pgmres.F90, ice_dyn_vp.F90)
* Fix warning for format 1008 (ice_init.F90)
* Move FGMRES solver to separate subroutine (ice_dyn_vp.F90)
* Remove unused use statements in picard_solver (ice_dyn_vp.F90)
* options: add options files for implicit solver

  - diag_imp sets all monitoring options (nonlin, {f,p}gmres) to on
  - dynanderson sets the implicit solver to 'on' and the nonlinear algorithm to Anderson acceleration
  - dynpicard sets the implicit solver to 'on' and the nonlinear algorithm to Picard iteration
  - run3dt runs the model for three time steps

* ice_dyn_vp: correct residual computations if max_blocks > 1

  The way we compute the nonlinear residual norm does not take into account that `L2norm` is an array of size `max_blocks`. Fix that by summing the squared components and taking the square root (we need to square the components because the subroutine 'calc_L2norm' returns the square root).

* ice_dyn_vp: differentiate fixed point and progress residuals

  For Anderson acceleration, the fixed point residual (g(x) - x) is not the same as the 'progress residual' (u - u_old). Make this distinction clear in the code and in the monitoring output.

* ice_dyn_vp: fix convergence issues in Picard and Anderson solvers

  Both solvers currently converge, but the solution (uvel,vvel) is not qualitatively the same as the EVP solution; velocities are too small.
  For picard_solver, it is due to the fact that the initial residual stored and used to check for convergence is not the initial residual at the start of the nonlinear iteration, but the last residual at the end of the first FGMRES iteration. Since this residual is smaller, convergence is "faster" but the solution is not the same; it is not converged enough in the sense of the nonlinear residual. This is an implementation error that was introduced when the nonlinear convergence logic was removed from the fgmres solver.

  For anderson_solver, it is due to using the fixed point residual norm to check for convergence. Similarly, this residual is smaller than the nonlinear residual and thus leads to a "faster" convergence, but to a different solution (not converged enough in the sense of the nonlinear residual).

  Fix both errors. (A sketch of the corrected convergence check appears after this group of commits.)

* ice_dyn_vp: add (ulin,vlin) to compute vrel based on 2 previous iterates

  Initial tests show that using the mean of the 2 previous iterates to compute vrel accelerates convergence for Picard iteration (this is robust). For Anderson it is sometimes better, sometimes worse, so by default it is set to false in the namelist. This also removes the oscillation produced by Picard iteration, which is in line with results in doi:10.1029/2008JC005017.

* machines: cesium: add LAPACK and BLAS library to Macros
* ice_dyn_vp: add 'subname' variable to each subroutine
* ice_dyn_vp: cleanup 'icepack_warnings_*' calls

  Some calls to Icepack warning interfaces are not needed; remove them. Another call is misplaced; move it to the right place.

* ice_dyn_vp, pgmres: remove unused arguments, variables and 'use' statements
* ice_dyn_vp: refactor 'puny' treatment for implicit solver

  The way 'puny_dyn' is set in the 'cice' driver, and in 'ice_constants' and 'ice_grid', is kind of hacky. Revert those changes. Instead, add an 'init_vp' subroutine that is called from the drivers and that recomputes 'tinyarea' with the new 'puny_vp' value. Call 'init_evp' from 'init_vp', mimicking what is done in 'init_eap'.

* ice_dyn_vp: make 'calc_bvec' use already computed 'vrel'

  Pass the already computed 'vrel' as an argument instead of recomputing it.

* ice_dyn_vp: remove references to fgmres2 subroutine

  We still reference 'fgmres2', an MPI implementation of FGMRES that was never completely integrated with the implicit solver. Remove references to it as well as its arguments.

* dynamics: remove 'BLAS_routines.F90'

  It is a best practice to use the BLAS library natively available on a system, or to install a high performance implementation like ATLAS, OpenBLAS, etc. Remove the file 'BLAS_routines.F90', which contains BLAS routines copied from the reference NETLIB implementation.

* ice_dyn_vp: synchronize 'imp_solver' with ice_dyn_evp::evp

  The 'imp_solver' subroutine is the main driver of the implicit VP solver and was copied from a very old version of the 'evp' EVP driver in ice_dyn_evp.F90. Bring 'imp_solver' in line with changes in 'evp', and move some statements around to keep related statements together.

* dynamics: refactor strain rates and deformation calculations

  Create subroutines for the strain rates and deformations computations, add them to ice_dyn_shared and call them in ice_dyn_evp and ice_dyn_vp. This reduces code duplication.

* dynamics: add sol_fgmres2d from GEM as ice_krylov.F90 (does not compile)

  Copy (verbatim) the 2D FGMRES solver from the GEM atmospheric model as 'ice_krylov.F90'.
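As an aside on the convergence fix described above: the nonlinear (Picard) iteration should stop based on the nonlinear residual measured against the residual of the initial guess, not against the residual left over by the first linear solve, and not based on the fixed point residual. A minimal, self-contained Fortran sketch of this check on a scalar toy problem A(u)*u = b (illustrative only, not the CICE code; all names are made up):

    program picard_demo
      implicit none
      integer, parameter :: dbl = selected_real_kind(13)
      real(dbl) :: u, u_old, b, res, res_init, reltol
      integer :: k, maxits
      b = 1.0_dbl
      reltol = 1.0e-8_dbl
      maxits = 50
      u = 0.5_dbl                      ! initial guess
      res_init = abs(b - visc(u)*u)    ! nonlinear residual of the initial guess
      do k = 1, maxits
         u_old = u
         u = b / visc(u_old)           ! "linear solve" with coefficients frozen at u_old
         res = abs(b - visc(u)*u)      ! nonlinear residual, not the fixed point residual |u - u_old|
         if (res <= reltol*res_init) exit
      end do
      print '(a,i3,a,es12.4)', 'iterations: ', k, '   residual: ', res
    contains
      real(dbl) function visc(uu)
        real(dbl), intent(in) :: uu
        visc = 1.0_dbl + uu*uu         ! toy nonlinear coefficient playing the role of A(u)
      end function visc
    end program picard_demo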
* ice_krylov: adapt FGMRES algo up to inner loop (WIP)
* ice_krylov: adapt FGMRES algo up to preconditioner call (WIP)
* ice_krylov: adapt FGMRES algo up to before CGS (WIP)
* ice_krylov: adapt FGMRES algo up to end of Arnoldi (except 1st GS loop) (WIP)
* ice_krylov: adapt FGMRES algo up to updating of solution (WIP)
* ice_krylov: adapt FGMRES algo up to residual calculation (WIP)
* ice_krylov: correct CICE kinds in function almost_zero
* comm: add 'global_sums' module procedure

  Add a 'global_sums' module procedure that computes the global sum of several scalars held in a 1D array ('vector'). Make use of the existing 'compute_sums_dbl' subroutine.

* ice_krylov: adapt FGMRES including Gram-Schmidt (WIP)
* ice_krylov: add 'pgmres' subroutine (mostly the same as 'fgmres')

  The PGMRES algorithm is the same as using FGMRES with the same preconditioner at each iteration. However, in order to reuse the FGMRES code the 'fgmres' subroutine would have to be a recursive subroutine, which might have performance implications. For now, add it as a separate subroutine.

* ice_krylov: move subroutines to ice_dyn_vp to work around circular deps

  The 'ice_krylov' and 'ice_dyn_vp' modules each 'use' variables from each other, which makes them impossible to compile in a clean build. Transfer the subroutines in 'ice_krylov' to 'ice_dyn_vp'. While at it, update the public interface of module ice_dyn_vp to only expose what is used in other modules. Also, make 'icellu', 'icellt' and 'indxtij' et al. module variables, to reduce subroutine argument counts. In the same spirit, add use statements.

* ice_dyn_vp: add 'subname' to subroutine 'almost_zero' and correct indentation
* ice_dyn_vp: add 'ice_HaloUpdate_vel' subroutine

  Introduce the subroutine 'ice_HaloUpdate_vel', which can be used to do halo updates for vector fields, such as the ice velocity (uvel,vvel).

* ice_dyn_vp: make 'fld2' a module variable

  Reduce subroutine argument counts by making 'fld2' a module variable, and allocate it in 'init_vp'.

* ice_dyn_vp: move 'ice_halo' to module 'use' statement
* ice_dyn_vp: add workspace initialization to 0 and haloupdate before matvec in fgmres (WIP)
* ice_dyn_vp: add choice between classical and modified Gram-Schmidt orthogonalization

  The FGMRES solver from GEM only has classical Gram-Schmidt (CGS), but modified Gram-Schmidt (MGS) is more robust. Preliminary tests show that the convergence of the non-linear solver is seriously degraded when CGS is used instead of MGS inside FGMRES. This was tested by changing MGS to CGS in the Saad FGMRES implementation:

  diff --git a/cicecore/cicedynB/dynamics/fgmresD.F90 b/cicecore/cicedynB/dynamics/fgmresD.F90
  index 5a93e68..c7d49fa 100644
  --- a/cicecore/cicedynB/dynamics/fgmresD.F90
  +++ b/cicecore/cicedynB/dynamics/fgmresD.F90
  @@ -192,12 +192,13 @@ subroutine fgmres (n,im,rhs,sol,i,vv,w,wk1, wk2, &
   !       if (icode .eq. 3) goto 11
         call dcopy (n, wk2, 1, vv(1,i1), 1)  !jfl modification
   !
  -!     modified gram - schmidt...
  +!     classical gram - schmidt...
   !
         do j=1, i
  -         t = ddot(n, vv(1,j), 1, vv(1,i1), 1)  !jfl modification
  -         hh(j,i) = t
  -         call daxpy(n, -t, vv(1,j), 1, vv(1,i1), 1)  !jfl modification
  +         hh(j,i) = ddot(n, vv(1,j), 1, vv(1,i1), 1)  !jfl modification
  +      enddo
  +      do j=1, i
  +         call daxpy(n, -hh(j,i), vv(1,j), 1, vv(1,i1), 1)  !jfl modification
         enddo
         t = sqrt(ddot(n, vv(1,i1), 1, vv(1,i1), 1))  !jfl modification
         hh(i1,i) = t

  Add an 'orthogonalize' subroutine that can do either CGS or MGS to abstract away the orthogonalization procedure.
  Add a namelist parameter 'ortho_type' (defaulting to 'mgs') so that the user can choose at run time whether the FGMRES solver uses CGS or MGS as its orthogonalization method. (A short sketch of the two variants appears after this group of commits.)

  Note: at the moment the 'pgmres' subroutine is still hard-coded to use CGS.

* ice_dyn_vp: change order of arguments for 'precondition' subroutine

  The arguments with 'intent(out)' and 'intent(inout)' are usually listed last in the argument list of subroutines and functions. This is in line with [1]. Change 'precondition' to follow this standard.

  [1] European Standards for Writing and Documenting Exchangeable Fortran 90 Code, https://wiki.c2sm.ethz.ch/pub/COSMO/CosmoRules/europ_sd.pdf

* ice_dyn_vp: fgmres: correctly initialize workspace_[xy] and arnoldi_basis_[xy]

  A previous commit only initialized them in pgmres. This should be squashed with 1c1b5cf (Add workspace initialization to 0 and haloupdate before matvec in fgmres (WIP), 2019-07-24).

* ice_dyn_vp: matvec: change Au,Av from 'intent(out)' to 'intent(inout)'

  According to The Fortran 2003 Handbook [1], the Fortran standard states that arguments with 'intent(out)' are left uninitialized if the subroutine/function does not set them. This is dangerous here since the 'Au', 'Av' arguments are set in 'matvec' only for grid cells where there is ice, so any grid cells with no ice that were previously initialized (e.g. to zero) in the actual arguments could be left with garbage values after the call to 'matvec' (this does not happen with the Intel compiler, but it is still better to follow the standard).

  [1] J. C. Adams, W. S. Brainerd, R. A. Hendrickson, R. E. Maine, J. T. Martin, and B. T. Smith, The Fortran 2003 Handbook. London: Springer London, 2009.

* ice_dyn_vp: clarify descriptions of FGMRES variables

  Use 'Arnoldi' and 'restarts' in addition to 'inner' and 'outer'.

* ice_dyn_vp: add subroutine descriptions and clarify existing ones
* ice_dyn_vp: fix comments in FGMRES algorithm

  The next block does not only compute the new Givens rotation, it also applies it, so tweak the comment to be in line with the code.

* ice_dyn_vp: fgmres: fix misleading check for return

  Using `outiter > maxouter` means that the algorithm would always perform at most `maxouter+1` outer (restart) iterations, since at the end of the `maxouter`-th iteration we would have `maxouter > maxouter`, which is false, so the algorithm would do one last outer iteration before returning. Fix this by using ">=" so that the `maxouter` value is really the maximum number of outer iterations permitted. This error was in the GEM version.

* ice_dyn_vp: fgmres: fix error in the last loop computing the residual

  workspace_[xy] was rewritten at each iteration instead of being incremented. Fix that.

* ice_dyn_vp: change the definition of F(x) from 'A.x - b' to 'b - A.x'

  This makes the result of the 'residual_vec' subroutine conform with the definition of the residual in the FGMRES algorithm (r := b - A.x). This should have no other impact than making the new fgmres subroutine actually work.

* ice_dyn_vp: fgmres: comment out early return (WIP)

  This is different from the legacy FGMRES and thus needs to be commented out to compare both versions of the algorithm.

* ice_dyn_vp: pgmres: synchronize with fgmres (WIP)

  Also remove the unnecessary return statement at the end of fgmres.

* ice_dyn_vp: picard_solver: re-enable legacy solver

  Rename the subroutines fgmres and pgmres in the standalone files with a '_legacy' suffix.
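As an aside on the 'ortho_type' choice introduced above: classical Gram-Schmidt computes all projection coefficients against the original vector and only then subtracts, while modified Gram-Schmidt subtracts each projection immediately, which is more robust in finite precision. A minimal, self-contained Fortran sketch of the two variants (illustrative only, not the CICE 'orthogonalize' subroutine; all names are made up):

    program ortho_demo
      implicit none
      integer, parameter :: dbl = selected_real_kind(13)
      integer, parameter :: n = 4, m = 2
      real(dbl) :: basis(n,m), w(n), h(m)
      basis = 0.0_dbl
      basis(1,1) = 1.0_dbl               ! e_1
      basis(2,2) = 1.0_dbl               ! e_2
      w = [1.0_dbl, 2.0_dbl, 3.0_dbl, 4.0_dbl]
      call orthogonalize('mgs', basis, w, h)
      print '(a,4f8.3)', 'w after MGS: ', w   ! 0.0 0.0 3.0 4.0
    contains
      subroutine orthogonalize(ortho_type, v, w, h)
        character(len=*), intent(in)    :: ortho_type
        real(dbl),        intent(in)    :: v(:,:)   ! orthonormal basis so far
        real(dbl),        intent(inout) :: w(:)     ! new vector to orthogonalize
        real(dbl),        intent(out)   :: h(:)     ! projection coefficients
        integer :: j
        select case (ortho_type)
        case ('cgs')          ! classical: all projections use the original w
           do j = 1, size(v,2)
              h(j) = dot_product(v(:,j), w)
           end do
           do j = 1, size(v,2)
              w = w - h(j)*v(:,j)
           end do
        case ('mgs')          ! modified: each projection uses the updated w
           do j = 1, size(v,2)
              h(j) = dot_product(v(:,j), w)
              w = w - h(j)*v(:,j)
           end do
        end select
      end subroutine orthogonalize
    end program ortho_demo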
* cicecore: remove remaining 'SVN:$Id:' lines
* ice_dyn_vp: remove 'picard_solver' subroutine

  The 'picard_solver' subroutine has been abandoned for quite a while and the Picard algorithm can be used by using the Anderson solver with `im_andacc = 0`, i.e. not saving any residuals. Remove the subroutine and move the logic around the namelist variable 'algo_nonlin' from 'imp_solver' to 'anderson_solver'. This way, the new developments made in 'anderson_solver' (using the FGMRES solver from GEM, choosing the orthogonalization method, the addition of the precondition subroutine) can also be used with `algo_nonlin = 1`.

* ice_dyn_vp: remove legacy FGMRES solver

  The new FGMRES solver adapted from GEM was already verified to give the same results as the legacy implementation. Remove the legacy implementation.

* ice_dyn_vp: pgmres: use Classical Gram-Schmidt

  The legacy PGMRES preconditioner uses Classical Gram-Schmidt for orthogonalization, so in order to compare the new solver with the old one we must use CGS in PGMRES. A following commit will add a namelist variable to control this behavior.

* ice_in: change maxits_{f,p}gmres to be in line with new meaning

  Change the reference namelist to conform with the new meaning of `maxits_{f,p}gmres`, which differs between the old and the new {F,P}GMRES solvers:

  - in the old solvers, 'maxits_*' is the total number of inner (Arnoldi) iterations
  - in the new solvers, 'maxits_*' is given as the 'maxouter' argument, i.e. the total number of outer (restart) iterations, so 'maxits' from the old solvers corresponds to 'maxouter*maxinner'

  This means that in the namelist 'maxits_*' must be set to 1 if it was previously set to the same value as 'im_*' (i.e., do not do any restarts).

* ice_dyn_vp: uniformize naming of namelist parameters for solver tolerances

  'gammaNL', 'gamma' and 'epsprecond' refer respectively to the relative tolerances of the nonlinear solver, the linear solver (FGMRES) and the preconditioner (PGMRES). For consistency, and to use meaningful names in the code, rename them to 'reltol_nonlin', 'reltol_fgmres' and 'reltol_pgmres'.

* ice_dyn_vp: anderson_solver: adapt residual norm computations for MPI

  The norms of the different residuals need to be computed using the squared values of all components across MPI processes. We keep the 'res' and 'fpfunc' vectors as 1D vectors for now, to minimize code changes. This will be revised when the Anderson solver is parallelized.

* ice_dyn_vp: write solver diagnostics only for master task

  Note that care must be taken not to call 'global_sum' inside an `if (my_task == master_task)` construct, as this would create an incorrect communication pattern and ultimately an MPI failure.

* ice_dyn_vp: remove references to EVP

  Remove leftover references from the initial copy of ice_dyn_evp.F90.

* ice_dyn_vp: rename 'puny_vp' to 'min_strain_rate'

  This value really is a minimum strain rate, so name it as such.

* ice_dyn_vp: remove "signature" comments from JF

  These comments do not add value to the code; remove them.

* ice_dyn_vp: write stresses at end of time step

  In the EVP solver, stresses are subcycled using the global 'stress{p,m,12}_[1-4]' arrays, so these arrays contain the subcycled stresses at the end of the EVP iterations. These global arrays are used to write the stresses to the history tape if the stress variables are enabled in the namelist.
  In the VP solver, stresses are not explicitly computed to solve the momentum equation, so they need to be computed separately at the end of the time step so that they can be properly written to the history tape. Use the existing, but unused, 'stress_vp' subroutine to compute the stresses, and call it at the end of the 'imp_solver' driver. Remove unneeded computations from 'stress_vp'. Move the definition of the 'zetaD' array to 'imp_solver' so it can be passed to 'stress_vp'.

* ice_dyn_vp: cleanup 'intent's

  - move 'intent(out)' and 'intent(inout)' arguments to the end of the argument list (ice_HaloUpdate_vel is an exception, for consistency with other ice_HaloUpdate subroutines)
  - move 'intent(in)' before any 'intent(out)' or 'intent(inout)'
  - change 'intent(inout)' to 'intent(out)' if the argument is rewritten inside the subroutine

* dynamics: correct the definition of 'shearing strain rate' in comments

  All occurrences of 'shearing strain rate' as a comment before the computation of the quantities `shear{n,s}{e,w}` define these quantities as

  shearing strain rate = e_12

  However, the correct definition of these quantities is 2*e_12. Bring the code in line with the definition of the shearing strain rate in the documentation (see the 'Internal stress' section in doc/source/science_guide/sg_dynamics.rst), which defines D_S = 2\dot{\epsilon}_{12}. Correct these comments.

* ice_dyn_vp: correctly use 'subname' in 'abort_ice' calls
* ice_dyn_vp: add 'calc_seabed_stress' subroutine

  Add a subroutine to compute the diagnostic seabed stress ('taubx' and 'tauby'), and call it from the 'imp_solver' driver. Note that in EVP this is done in the 'stepu' subroutine in ice_dyn_shared.

* dynamics: move basal stress residual velocity ('u0') to ice_dyn_shared

  Make 'u0' a module variable in ice_dyn_shared, and use it for both EVP and VP without duplicating the hard-coded value in the code.

* ice_dyn_vp: use 'deformations' from ice_dyn_shared

  The 'deformations' subroutine is used in ice_dyn_evp, but not in ice_dyn_vp.

* ice_dyn_vp: simplify 'calc_vrel_Cb' subroutine

  Remove the unneeded variables 'utp', 'vtp' and directly use 'uvel' and 'vvel' instead.

* ice_dyn_vp: rename 'Diag[uv]' to 'diag[xy]' for consistency

  The rest of the code uses 'x' ('y') as a suffix for the x- and y-components of "vector field"-type arrays. Bring 'Diag[uv]' in line with this convention, and lose the capitalization.

* ice_dyn_vp: introduce 'CICE_USE_LAPACK' preprocessor macro

  The 'anderson_solver' subroutine uses calls from the LAPACK library, but these are strictly needed only for the Anderson solver, not for the Picard solver. Introduce a preprocessor macro, 'CICE_USE_LAPACK', that can be used to "preprocess out" any code that uses LAPACK calls. Refactor the code so that the Picard solver works without LAPACK (this is easy since we only have one call to 'dnrm2' to remove, and an alternative implementation already exists, but is commented out). Uncomment it and add an 'ifdef' so that this code is used if 'CICE_USE_LAPACK' is not defined. Also, make the Anderson solver abort if CICE was compiled without LAPACK. These two changes make the model compile if LAPACK is not available. Add an option file 'set_env.lapack' to automatically configure the model compilation upon 'cice.setup' to use LAPACK (note that the Macros and env files for the chosen machine need to be modified to add LAPACK support; at the moment only 'cesium' is supported).
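As an aside on the preprocessor macro introduced above: the only BLAS/LAPACK call strictly needed on the Picard path is a vector 2-norm, which has a plain Fortran equivalent. A minimal, self-contained sketch of the ifdef pattern (illustrative only, not the CICE code; the file would need to be preprocessed, e.g. compiled as a .F90 file, and linked against BLAS when the macro is defined):

    program norm_demo
      implicit none
      integer, parameter :: dbl = selected_real_kind(13)
      integer, parameter :: ntot = 3
      real(dbl) :: fpfunc(ntot), fp_norm
    #ifdef USE_LAPACK
      real(dbl), external :: dnrm2
    #endif
      fpfunc = [3.0_dbl, 0.0_dbl, 4.0_dbl]
    #ifdef USE_LAPACK
      fp_norm = dnrm2(ntot, fpfunc, 1)       ! BLAS 2-norm
    #else
      fp_norm = sqrt(sum(fpfunc**2))         ! equivalent plain Fortran fallback
    #endif
      print '(a,f6.2)', 'residual norm: ', fp_norm   ! 5.00
    end program norm_demo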
* ice_dyn_vp: convert 'algo_nonlin' to string

  Make the reference namelist easier to use by changing the type of the 'algo_nonlin' variable to a string. This way, we can use 'picard' and 'anderson' instead of '1' and '2'.

* ice_dyn_vp: convert 'precond' to string

  Make the namelist easier to use by converting the 'precond' namelist variable to a string. This way we can use 'ident', 'diag' or 'pgmres' instead of '1', '2' or '3'.

* ice_dyn_vp: convert 'monitor_{f,p}gmres' to logical

  Convert the namelist variables 'monitor_{f,p}gmres', which are used as logical values, to actual logicals.

* ice_dyn_vp: remove unimplemented 'fpfunc_andacc' from the namelist

  In the future, we might add a second fixed point function, g_2(x), instead of the regular Picard fixed point function currently implemented,

  g_1(x) = FGMRES(A(x), x0, b(x)) - x

  Remove the namelist flag 'fpfunc_andacc' from the reference namelist, and make the model abort if this flag is used, since only g_1(x) is currently implemented.

* ice_dyn_vp: add input validation for string namelist variables

  Add input validation in ice_init for 'algo_nonlin', 'precond' and 'ortho_type'.

  Currently, the code that resets 'im_andacc' to zero if the Picard solver is chosen (algo_nonlin = 1) is inside the 'anderson_solver' subroutine, which is not clean because 'im_andacc' is used as a dimension for several local variables. These variables would thus have whatever size 'im_andacc' is set to in the namelist, even if the Picard solver is chosen. Fix that by moving the code that sets 'im_andacc' to zero if the Picard solver is chosen (algo_nonlin = 1) to ice_init.

* ice_dyn_vp: abort if Anderson solver is used in parallel

  Since the Anderson solver is not yet parallelized, abort the model if users attempt to run it on more than one proc.

* ice_dyn_vp: reimplement monitor_{f,p}gmres

  The ability to monitor the residual norm of the FGMRES solver and the PGMRES preconditioner was lost when we transitioned to the new solver implementation. Re-add this capability.

* ice_dyn_vp: use 'ice_HaloUpdate_vel' everywhere

  By performing the halo update after creating the halo_info_mask in imp_solver, we can use the ice_HaloUpdate_vel subroutine, increasing code reuse.

* ice_dyn_vp: move ice_HaloUpdate_vel subroutine to ice_dyn_shared

  Move the subroutine to ice_dyn_shared so it can be used in ice_dyn_evp and ice_dyn_eap.

* ice_dyn_evp: use ice_HaloUpdate_vel

  Now that the subroutine is in ice_dyn_shared, use it to increase code reuse.

* ice_dyn_eap: use ice_HaloUpdate_vel

  Now that the subroutine is in ice_dyn_shared, use it to increase code reuse.

* options: set 'use_mean_vrel' in 'dynpicard'

  Preliminary testing suggests convergence is improved when using the mean of the two previous estimates to compute 'vrel' when the Picard solver is used. Set this as the default with option 'dynpicard'.

* conda: add 'liblapack' to environment spec and Macros

  Add the necessary conda package to build CICE with LAPACK, so that the Anderson solver can be tested in the conda environment.

* ice_dyn_vp: initialize 'L2norm' and 'norm_squared' to zero

  If we don't initialize these arrays, using any MPI decomposition for which nblocks /= max_blocks on some process will cause uninitialized values to be summed when computing the local sum for each block, as in

  nlres_norm = sqrt(global_sum(sum(L2norm), distrb_info))

  Initialize them to zero. (A small sketch of the per-block norm logic appears below.)
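As an aside on the per-block residual norms discussed above (and in the earlier max_blocks > 1 fix): each block contributes the norm of its local residual, and the global norm is the square root of the sum of the squared block norms, so entries for unused blocks must be zero. A minimal, self-contained serial Fortran sketch (illustrative only, not the CICE code; all names are made up):

    program blocknorm_demo
      implicit none
      integer, parameter :: dbl = selected_real_kind(13)
      integer, parameter :: max_blocks = 4, nblocks = 2
      real(dbl) :: L2norm(max_blocks), res1(3), res2(3), nlres_norm
      L2norm = 0.0_dbl                      ! entries nblocks+1..max_blocks stay zero
      res1 = [1.0_dbl, 2.0_dbl, 2.0_dbl]    ! residual on block 1
      res2 = [0.0_dbl, 3.0_dbl, 4.0_dbl]    ! residual on block 2
      L2norm(1) = sqrt(sum(res1**2))        ! per-block L2 norm
      L2norm(2) = sqrt(sum(res2**2))
      nlres_norm = sqrt(sum(L2norm**2))     ! square the block norms, sum, take the root
      print '(a,f6.2)', 'nonlinear residual norm: ', nlres_norm   ! sqrt(9+25) = 5.83
    end program blocknorm_demo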
* ice_dyn_vp: initialize 'stPr' and 'zetaD' to zero

  Since 'stPr' and 'zetaD' are initialized in a loop over 'icellt', but are used in loops over 'icellu', it is possible that these latter loops use uninitialized values, since 'iceumask' and 'icetmask' can differ at some grid points because they are not defined using the same criteria (see ice_dyn_shared::dyn_prep1 and ice_dyn_shared::dyn_prep2). To avoid using uninitialized values, initialize the 'stPr' and 'zetaD' arrays to zero, so that any index corresponding to a cell with no ice is set to zero. This mimics what is done for EVP, where the 'str' array is initialized to zero in ice_dyn_evp::stress. (A small sketch of this pattern appears after this group of commits.)

* doc: document implicit solver
* ice_dyn_vp: default 'use_mean_vrel' to 'true'

  Previous testing reveals it is better to set 'use_mean_vrel' to true (i.e., use the mean of the 2 previous nonlinear iterates to compute `vrel`) for the Picard solver, but the picture is less clear for the Anderson solver. Since we are only advertising the Picard solver for now, make this flag default to 'true'.

* ice_in: remove Anderson solver parameters

  Since we are only advertising the Picard solver for now, remove the namelist parameters for the Anderson solver from the reference namelist.

* ice_dyn_vp: add article references
* ice_dyn_vp: remove unused subroutines

  Remove subroutines 'stress_prime_vpOLD', 'matvecOLD' and 'precond_diag'. 'stress_prime_vpOLD' and 'matvecOLD' have been unused since the creation of the 'matvec' subroutine. 'precond_diag' has been unused since we deactivated the legacy FGMRES solver.

* ice_dyn_vp: remove unused local variables
* ice_dyn_vp: calc_bvec: remove unused arguments

  This completes 56d55c7 (ice_dyn_vp: make 'calc_bvec' use already computed 'vrel', 2019-06-10).

* ice_dyn_vp: anderson_solver: remove unused arguments

  This completes 4616628 (ice_dyn_vp: remove legacy FGMRES solver, 2020-02-17).

* ice_dyn_vp: remove unused 'use'-imported variables in subroutines
* ice_dyn_vp: fix trailing whitespace errors
* ice_dyn_vp: uniformize indentation and whitespace
* ice_dyn_vp: put 'intent's on same lines as types
* ice_dyn_vp: use '==' instead of '.eq.'
* ice_dyn_vp: uniformize whitespace in subroutine arguments
* ice_dyn_vp: calc_bvec: remove unneeded arguments
* ice_dyn_vp: clean up and clarify code comments
* ice_dyn_vp: matvec: remove unneeded local variable
* ice_dyn_vp: rename 'stPrtmp' to 'stress_Pr' and 'Dstrtmp' to 'diag_rheo'

  Also rename 'Dstr' to 'Drheo' in formDiag_step[12].

* ice_dyn_vp: rename 'calc_zeta_Pr' to 'calc_zeta_dPr'

  The subroutine 'calc_zeta_Pr' computes the derivatives of the replacement pressure, so rename it to 'calc_zeta_dPr' to bring its name in line with its computation.

* ice_dyn_vp: rename 'ww[xy]' to 'orig_basis_[xy]'

  The arrays 'ww_[xy]' are used to keep the values of the preconditioned vectors, in order to update the solution at the end of the iteration. As such, rename them to 'orig_basis_[xy]', since they form a basis of the solution space and this name describes their purpose better.

* ice_dyn_vp: [fp]gmres: removed unused 'r0' variable and 'conv' argument

  Both of these are leftovers of either the conversion from the legacy FGMRES implementation or of the GEM implementation, but they are unused in our implementation. Remove them.

  Move the declaration of the BLAS functions to their own type declaration to avoid having to deal with the trailing `, &`.
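As an aside on the zero-initialization described above: an array filled only where one mask is set, but read where a different mask is set, must start at zero so that the unfilled entries are harmless. A minimal, self-contained Fortran sketch of this pattern (illustrative only, not the CICE code; all names are made up):

    program maskinit_demo
      implicit none
      integer, parameter :: dbl = selected_real_kind(13)
      integer, parameter :: nx = 3, ny = 3
      real(dbl) :: zetaD(nx,ny)
      logical   :: icetmask(nx,ny), iceumask(nx,ny)
      integer :: i, j
      icetmask = .false.;  icetmask(2,2) = .true.                       ! ice on one T-cell
      iceumask = .false.;  iceumask(2,2) = .true.;  iceumask(3,2) = .true.
      zetaD = 0.0_dbl                               ! the fix: zero everywhere first
      do j = 1, ny
         do i = 1, nx
            if (icetmask(i,j)) zetaD(i,j) = 4.0_dbl  ! fill only where icetmask is set
         end do
      end do
      ! A later loop over iceumask reads zetaD(3,2), which was never filled above;
      ! with the zero initialization it contributes nothing instead of garbage.
      print *, 'zetaD at (3,2) =', zetaD(3,2)
    end program maskinit_demo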
* ice_dyn_vp: make 'maxits_nonlin' default to 4

  Let's use a small number of nonlinear iterations by default, since other models with VP solvers are often used with a small number of iterations. Add an option file 'set_nml.nonlin5000' to set 'maxits_nonlin' to 5000, effectively letting the VP solver iterate until 'reltol_nonlin' is reached.

* ice_dyn_vp: rename 'im_{fgmres,pgmres,andacc}' to 'dim_*'

  Since these three parameters denote dimensions (of the FGMRES, PGMRES and Anderson acceleration solution spaces), name them as such.

* tests: add 'dynpicard' test to base_suite

  Add a test using the VP implicit solver to the base_suite. Let's use the gx3 grid because it is less computationally intensive, and use a 4x1 decomposition to test the solver in parallel.

* dynamics: remove proprietary vector directives
* ice_init: validate implicit solver arguments only if solver is active

  'algo_nonlin', 'precond' and 'ortho_type' are only active if 'kdyn == 3', so wrap the code validating the values of these flags in an if block.

* ice_dyn_vp: rename 'imp_solver' to 'implicit_solver'

  Use a more explicit subroutine name, which additionally does not have any negative connotations.

* doc: add caveat for VP solver (tx1 grid, threading)
* ice_dyn_vp: rename 'CICE_USE_LAPACK' macro to 'USE_LAPACK'

  Bring the name of the new macro 'CICE_USE_LAPACK' more in line with the changes to the CPP macros in 819eedd (Update CPP implementation (#490), 2020-07-31) by renaming it to 'USE_LAPACK'.

* comm: rename 'global_sums' to 'global_allreduce_sum'

  Make the purpose of the 'global_sums' interface, i.e. reducing a distributed variable to a variable of the same shape (MPI_ALLREDUCE), clearer by renaming it to 'global_allreduce_sum'. This makes it clearer that, in contrast to the 'global_sum' interface, the resulting variable has the same shape as the distributed variable.

* dynamics: remove 'ice_HaloUpdate_vel' and introduce '(un)stack_velocity_field'

  The 'ice_HaloUpdate_vel' subroutine feels out of place in 'ice_dyn_shared', and moving it to 'ice_boundary' introduces other problems, including circular dependencies. Remove it, and introduce two simple subroutines, 'stack_velocity_field' and 'unstack_velocity_field', that are responsible for loading the 'uvel' and 'vvel' arrays into the 'fld2' array used for the halo updates. Use these new subroutines in ice_dyn_{evp,eap,vp}.

* dynamics: rename 'init_evp' to 'init_dyn'

  The 'init_evp' subroutine initializes arrays that are used in the EVP, EAP or VP dynamics. Rename it to 'init_dyn' to emphasize that it is not specific to EVP.

* drivers: call 'init_dyn' unconditionally

  'init_dyn' is called in both 'init_eap' and 'init_vp', so it makes more sense for this subroutine to be called unconditionally in the drivers, and then to call 'init_eap' or 'init_vp' if the EAP or VP dynamics are chosen.

Co-authored-by: JFLemieux73 <jean-francois.lemieux@canada.ca>