Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFDL to main, 2023-11-13 #1616

Merged

Conversation

marshallward
Copy link
Collaborator

This PR contains many new features, bugfixes, and overall changes to the MOM6 codebase. In particular, the model now includes a full non-Boussinesq solver alongside the existing Boussinesq and "semi-non-Boussinesq" modes. Other major features include updates to the ice shelf model, improved particle tracer support, improvements to the computational performance of the Zanna-Bolton parameterization, fixes to the internal tide energetics, brine plume mixing, and self-attraction and loading.

It also includes the usual bundle of bug fixes and code refactors.

Non-Boussinesq

Ice Shelf

Particle Tracers

ZB2020

Internal Tide

Brine Plume Mixing

Self-attraction and Loading

Additional Features

Bugfix

Refactor

Testing

Build

Misc

Contributors

marshallward and others added 30 commits July 7, 2023 10:48
The autoconf Python interpreter search was slightly modified to search
for Python even if $PYTHON is set to an empty string.  This is done by
unsetting PYTHON if it is set but empty, then following the usual macro.

This was required since `export PYTHON` in a Makefile will create the
`PYTHON` variable but will assign it no value (i.e. empty string).
This causes issues in some build environments.

The backup `configure~` script was also added to the developer
`ac-clean` cleanup rule.
  Refactored 6 files in the ALE directory to calculate the nominal depth in
thickness units in a single place (it is done in regridding_main now) and pass
it to the various places where it is used, in a preparatory step to modify how
this calculation is done in non-Boussinesq mode.  There are new arguments to
several publicly visible routines, including:

- Add non_depth_H arguments to hybgen_regrid, build_zstar_grid,
  build_sigma_grid, build_rho_grid, build_grid_HyCOM1, build_grid_adaptive,
  build_adapt_column and build_grid_arbitrary

- Add optional zScale arguments to build_zstar_grid and build_grid_HyCOM1

- Add unit_scale_type arguments to regridding_main, ALE_regrid_accelerated and
  ALE_offline_inputs

  Also eliminated an incorrect rescaling GV%Z_to_H facto when calculating the
total column thickness from the layer thicknesses when an ice shelf is used
with a Hycom grid.  This would have caused dimensional consistency testing to
fail.

  Added the new runtime parameters HYBGEN_H_THIN, HYBGEN_FAR_FROM_SURFACE
HYBGEN_FAR_FROM_BOTTOM, and HYBGEN_DENSITY_EPSILON to set previously hard-coded
dimensional parameters used in the Hybgen regridding code and store these
values in new variables in hybgen_regrid_CS.  Two of these are no longer passed
to hybgen_column_regrid as separate parameters.  By default these new runtime
parameters recover the previous hard-coded values.

  Also eliminated an unused block of code in build_rho_column.  Several
comments documenting variables or their units were also added.

  All answers are bitwise identical, but there are 4 new runtime parameters
that would appear in some MOM_parameter_doc files and there are changes to the
arguments to 11 routines.
As described in issue #372, I would like to be able to create restart files that
contain information about the particle location. These files will be written at
the same time as other restart files. I cannot add these calls directly to the
driver, because the driver does not have information about the particle location.
We have added save_MOM6_internal_state as a subroutine in MOM.F90, and
we added calls to this subroutine from each of the drivers. We hope this will
allow for more new packages to write restart files in the future.

Co-authored by Spencer Jones <spencerjones@tamu.edu>
  Added the integer valid_SpV_halo to the thermo_var_ptrs type to indicate
whether the SpV_array has been updated and its valid halo size, to facilitate
error detection and debugging in non-Boussinesq mode.  Tv%valid_SpV_halo is set
to the halo size in calc_derived_thermo or after a halo update is done to
tv%SpV_avg, and it is set to a negative value right after calls that change
temperatures and salinities (such as by ALE remapping) unless there is a call to
calc_derived_thermo.  Tests for the validity of tv%SpV_avg are added to the
routines behind thickness_to_dz, with fatal errors issued if invalid arrays
would be used, but more tests could perhaps be used in any parameterization
routines where tv%SpV_avg is used directly.  Handling the updates to tv%SpV_avg
this way helps to avoid unnecessary calls to calc_derived_thermo, which in turn
has equation of state calls that can be expensive, while also providing
essential verification of new code related to the non-Boussinesq code.  These
tests can probably be commented out or removed for efficiency once there is a
full suite of regression tests for the fully non-Boussinesq mode of MOM6.  In
addition, a new optional debug argument was added to calc_derived_thermo which
can be used to triggers checksums for the variables used to calculate
tv%SpV_avg.  One call to calc_derived_thermo was also added just before the
initialization call to ALE_regrid that will be needed with the next commit, but
does not change answers yet.  All answers are bitwise identical, but there is a
new element in a transparent and widely used type and a new optional argument to
a public interface.
  Use RHO_KV_CONVERT instead of RHO_0 to set the non-Boussinesq version of
GV%m_to_H, so that there is a mechanism for testing the independence of the
fully non-Boussinesq mode from the Boussinesq reference density.  With this
change, GV%Z_to_H is not guaranteed to be equal to (GV%Z_to_m*GV%m_to_H), with
the latter expression preferred when setting parameters. By default the two
parameters are the same, and they will probably only ever differ in testing the
code.  All Boussinesq solutions are bitwise identical, but there are differences
in the description of RHO_KV_CONVERT that will appear in MOM_parameter_doc
files.
  Add new arguments to 7 routines that will be needed for the non-Boussinesq
capability, but do not use them yet, so that there will be fewer cross file
dependencies as the various changes are being reviewed simultaneously.  The
impacted interfaces are MEKE_int, vertvisc_coef, sumSWoverBands, KPP_calculate,
differential_diffuse_T_S, set_BBL_TKE, and apply_sponge In the three
step_MOM_dyn_... routines and in calculateBuoyancyFlux1d, this change includes
calls to thickness_to_dz to calculate the new vertical distance arrays that will
be passed into vertvisc_coef or sumSWoverBands.  The only place where the new
arguments are actually used is in sumSWoverBands and set_opacity where the
changes are particularly simple.  All answers are bitwise identical, but there
are new non-optional arguments to seven publicly visible routines.
* Restore functionality for reading slices from 3d volumes in MOM_io

 - The recent MOM_io modifications in support of FMS2_io accidentally
   removed support for reading on-grid data (same horizontal grid as
   model) k-slices. This is needed in some configurations in the model
   state initialization.

* Add FMS1 interfaces

* Additional patches to enable reading ongrid state initialization data

 - read local 3d volume rather than attempting to slice ongrid
   data vertically.

 - Related bugfixes in MOM_io
- We were reading KV_ML_INVZ2 without logging, then checking for KVML
  and finally logging based on a combination of the two. This had the side
  affect that we get warnings about not using KVML even if KVML was not
  present.
- The fix checks for KVML first, and then changes the default so that
  when KVML=1e-4 is replaced by KV_ML_INVZ2=1e-4 we end up with no
  warnings and KVML can be obsoleted safely.
  Note: this commit alone does not remove all warnings from the MOM6-examples
  suite because we still need to fix the MOM_input that still use KVML
- KVML needs to be unscaled since it is the default for KV_ML_INVZ2
- tc3 used KVML and has been corrected.
  Use the new runtime parameter RHO_PGF_REF instead of RHO_0 to set the
reference density that is subtracted off from the other densities when
calculating the finite volume pressure gradient forces.  Although the answers
are mathematically equivalent for any value of this parameter, a judicious
choice can reduce the impacts of roundoff errors by about 2 orders of
magnitude.  By default, RHO_PGF_REF is set to RHO_0, and all answers are bitwise
identical.  However, there is a new runtime parameter that appears in many of
the MOM_parameter_doc.all files.
The message that a file is being created was issued as a WARNING
when we all agree it should really be a NOTE. Depth_list.nc is
read if it is present to avoid recomputing a sorted list, but the
absence of the file is not an error and does not warrant a warning.

Changes:
- Changed WARNING to NOTE.
- Removed MOM_mesg from imports since it wasn't being used.
The interpolation scheme for state-dependent diagnostic coordinates
was incorrectly registering as the same parameter as the main model.
This meant it was never possible to change the interpolation scheme
from the default (which was not the same as the main model).

Fix registers the generated parameter name which was always computed
but not used. A typical example of the generated parameter is
"DIAG_COORD_INTERP_SCHEME_RHO2".
  Fixed a bug in which wave_speed_init was effectively discarding any values of
mono_N2_depth passed to it via the optional argument mono_N2_depth, but also
changed the default value of RESOLN_N2_FILTER_DEPTH, which was previously being
discarded, to disable the monotonization and replicate the previous results.
There were also clarifying additions made to the description how to disable
RESOLN_N2_FILTER_DEPTH.  This will change some entries in MOM_parameter_doc
files, and it will change solutions in cases that set RESOLN_N2_FILTER_DEPTH to
a non-default value and have parameter settings that use the resolution function
to scale their horizontal mixing.  There are, however, no known active
simulations where the answers are expected to change.
  Revised the calculation of gprime and the coordinate densities (GV%Rlay) in
fully non-Boussinesq mode to use the arithmetic mean of adjacent coordinate
densities in the denominator of the expression for g_prime in place of RHO_0.
Also use LIGHTEST_DENSITY in place of RHO_0 to specify the top-level coordinate
density in certain coordinate modes.  Also made corresponding changes to the
fully non-Boussinesq APE calculation when CALCULATE_APE is true, and eliminated
an incorrect calculation of the layer volumes in non-Boussinesq mode using the
Boussinesq reference density that was never actually being used when
CALCULATE_APE is false.

  This commit will change answers in some fully non-Boussinesq calculations, and
an existing runtime parameter is used and logged in some new cases, changing
the MOM_parameter_doc file in those cases.
  Refactored thickness_diffuse when in non-Boussinesq mode to avoid any
dependencies on the Boussinesq reference density, and to translate the volume
streamfunction into the mass streamfunction using an appropriately defined
in-situ density averaged to the interfaces at velocity points.  This form
follows the suggestions of Appendix A.3.2 of Griffies and Greatbatch (Ocean
Modelling, 2012) when in non-Boussinesq mode.  Thickness_diffuse_full was also
revised to work properly in non-Boussinesq mode (and not depend on the
Boussinesq reference density) when no equation of state is used.

  As a part of these changes, the code now uses thickness-based streamfunctions
and other thickness-based internal calculations in MOM_thickness_diffuse.  For
example, the overturning streamfunctions with this change are now in m3/s in
Boussinesq mode, but kg/s in non-Boussinesq mode.  These changes use a call to
thickness_to_dz to set up a separate variable with the vertical distance across
layers, and in non-Boussinesq mode they use tv%SpV_avg to estimate in situ
densities.  Additional debugging checksums were added to thickness_diffuse.

  The code changes are extensive with 15 new or renamed internal variables, and
changes to the units of 9 other internal variables and 3 arguments to the
private routine streamfn_solver.  After this change, GV%Rho, GV%Z_to_H and
GV%H_to_Z are no longer used in any non-Boussinesq calculations (12 such
instances having been elimated).  Because some calculations have to be redone
with the separate thickness and dz variables, this will be more expensive than
the original version.

  No public interfaces are changed, and all answers are bitwise identical in
Boussinesq or semiBoussinesq mode, but they will change in non-Boussinesq mode
when the isopycnal height diffusion parameterization is used.
* Salt data structures

* First steps at brine plume: pass info from SIS2

* The brine plume parameterization,

- including now passing the dimensional scaling tests.

* Fix problem when running Tidal_bay case with gnu.

* Avoiding visc_rem issues inside land mask.

Tweaking the brine plume code.

* Using the proper MLD in the brine plumes

- it now works better on restart

* Always including MLD in call to applyBoundary...

- I could move it up and make it not optional.

* Adding some OpenMP directives to brine plumes
This commit brings the drifters interface up-to-date with the current version of the drifters package, which requires h (layer thickness) to calculate the vertical movement of particles. The interfaces in the code and in config_src/external are updated to pass this information to the drifters package.
  Pass dt_kappa_smooth to calc_isoneutral_slopes and vert_fill_TS in units of
[H Z ~> m2 or kg m-1] instead of [Z2 ~> m2] for consistency with the units of
other diffusivities in the code and to reduce the depenency on the  Boussinesq
reference density in non-Boussinesq configurations.  In addition to the changes
to the units of these two arguments, there is a new unit_scale_type argument to
vert_fill_TS and MOM_calc_varT and a new verticalGrid_type argument to
MOM_stoch_eos_init.  The units of 4 vertical diffusivities in the control
structures in 4 different modules are also changed accordingly.

  All answers are bitwise identical in Boussinesq mode, but they can change for
some non-Boussinesq configurations.  There are new mandatory arguments to three
publicly visible routines.
  Added a comment justifying the use of a fixed rescaling factor for the
diffusivity used in vert_fill_TS.  All answers and output are identical.
  Added the new public interface find_ustar to extract the friction velocity
from either a forcing type argument, or a mech_forcing_type argument, either
directly or from tau_mag, and in non-Boussinesq mode by using the time-evolving
surface specific volume.  Find_ustar is an overloaded interface to
find_ustar_fluxes or find_ustar_mech_forcing, which are the same but for the
type of one of their arguments.  For now, the subroutines bulkmixedlayer,
mixedlayer_restrajt_OM4, mixedlayer_restrat_Bodner and mixedlayer_restrat_BML
are calling find_ustar to avoid code duplication during the transition to work
in fully non-Boussinesq mode, but it will eventually be used in about another
half dozen other places.

  All Boussinesq answers are bitwise identical, but non-Boussinesq answers will
change and become less dependent on the Boussinesq reference density, and there
is a new publicly visible interface wrapping two subroutines.
  Changed the units of the optional mono_N2_depth argument to wave_speed,
wave_speed_init and wave_speed_set_param in thickness units instead of height
units.  Accordingly, the units of one element each in the diagnostics_CS and
wave_speed_CS and a local variable in VarMix_init are also changed to thickness
units.  The unit descriptions of some comments describing diagnostics were also
amended to also describe the non-Boussinesq versions.  Because this is
essentially just changing when the unit conversion occurs, all answers are
bitwise identical, but there are changes to the units of an optional argument in
3 publicly visible routines.
  Added the new runtime parameter BT_RHO_LINEARIZED to specify the density that
is used to convert total water column thicknesses into mass in non-Boussinesq
mode with linearized options in the barotropic solver or when estimating the
stable barotropic timestep without access to the full baroclinic model state.
The default is set to RHO_0 and answers do not change by default.  This new
parameter is used in non-Boussinesq mode with some options in btcalc and
find_face_areas, when LINEARIZED_BT_CORIOLIS = True or BT_NONLIN_STRESS = False,
and in the unit conversion of the ice strength with dynamic pressure.

  Also cancelled out factors of GV%Z_to_H in MOM_barotropic.F90 to simplify the
code and reduce the dependence on the value of GV%Rho_0 in non-Boussinesq mode.
This involved changing the units of 4 variables in the barotropic_CS type, 3
internal variables in btstep and an internal variable in barotropic_init to use
thickness units.  The rescaled internal variable mass_to_Z was also replaced
with the equivalent GV%RZ_to_H.  There are also 4 new debugging messages.  Also
modified the units of the gtot_est argument to match those of pbce.  There is a
new element in barotropic_CS.

  Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are
bitwise identical in that mode, but in non-Boussinesq mode this conversion
involves multiplication and division by GV%Rho_0, so while all answers are
mathematically equivalent, this change does change answers at roundoff in
non-Boussinesq mode.  Additionally there is a new runtime parameter that will
appear in some MOM_parameter_doc files.
  Use thickness_to_dz to convert layer thicknesses to depths in 4 tracer modules
(DOME_tracer, dye_example, ideal_age_example and nw2_tracers) so that this
conversion is done correctly in non-Boussineq mode, and there is no longer any
dependency on the Boussinesq reference density in that mode.

  This change includes the addition of a thermo_var_ptrs argument to 5 routines
(initialize_DOME_tracer, initialize_dye_tracer, dye_tracer_column_physics
ideal_age_tracer_column_physics and count_BL_layers) and changes to the units of
some internal variables, and the addition of 6 new 2-d or 3-d arrays with the
vertical distance across layers.  An unused param_file_type argument to
initialize_DOME_tracer was also eliminated.

  Comments were also added to describe the units of 5 of the variables in the
ideal age tracer control structure and 7 internal variables in that same module,
and there was some minor cleanup of the formatting cf calls in
tracer_flow_control_init.

  There was some minor refactoring in the ns2_tracers module to use SZK_(GV)
instead of SZK_(G) to declare the vertical extent of some arrays, and the
vertical indexing convention for interfaces in nw2_tracer_dist was revised from
starting at 0 to start at 1 for consistency with all the other code in MOM6.

  Also moved the code to do halo updates for the physical model state variables
and call calc_derived_thermo before calling tracer_flow_control_init, because
some routines there are now using the layer average specific volume to convert
between thicknesses and heights when in non-Boussinesq mode.

  All answers in Boussinesq mode are bitwise identical, but these passive tracer
modules have slightly different answers in non-Boussinesq mode.  There are
changes to the non-optional arguments to 4 public interfaces.
  Changed a recently added OMP directive for plume_flux from private to
firstprivate to reflect how this variable is actually used.  This bug was
introduced with PR #401, but was causing sporadic failures in some of our
pipeline tests with the intel compiler (essentially due to initialized
memory when openMP is used) for subsequent commits.
This patch merges the internal `save_restart` function with the new
`save_MOM6_internal_state` function into a new general MOM restart
function.

It also makes an effort to eliminate `MOM_restart` as a driver
dependency, narrowing the required MOM API for existing and future
drivers.

Also removes the `restart_CSp` argument from `MOM_wave_interface_init`,
since it appeared to be used for nothing.
MOM simulations typically abort of the restart directory (usually
RESTART) are absent.  This patch adds POSIX support for mkdir() and
creates the directory if it is missing.
Using inquire() to check for directory existence is not possible, since
at least one compiler (Intel) does not consider directories to be files.

The inquire call is replaced with a C interface to the POSIX stat()
function.  We do not fully emulate the behavior of stat, but we use its
return value to determine existence of directories.  This provides a
more reliable method for identifying the existence of the directory.

This should resolve many of the observed problems with RESTART creation
in coupled runs.
  Cancelled out factors of GV%Z_to_H in MOM_hor_visc.F90 to simplify the code
and reduce the dependence on the value of GV%Rho_0 in non-Boussinesq mode.  This
involved changing the units of 3 internal variables in horizontal_viscosity and
one element in the hor_visc_CS type to use thickness units or their inverse.
Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are
bitwise identical in that mode, but in non-Boussinesq mode this conversion
involves multiplication and division by GV%Rho_0, so while all answers are
mathematically equivalent, this change does change answers at roundoff in
non-Boussinesq mode.
  Revised the units of 12 vertvisc_type elements to be based on thicknesses, so
that vertical viscosities (in [H Z T-1 ~> m2 s-1 or Pa s]) are stored as dynamic
viscosites when in non-Boussinesq mode, with analogous changes to the diapycanl
diffusivity (now in [H Z T-1 ~> m2 s-1 or kg m-1 s-1]).  Similarly changed the
units of the 2 Rayleigh drag velocity elements (Ray_u and Ray_v) of the
vertvisc_type from vertical velocity units to thickness flux units and to more
accurately reflect the nature of these fields.  The bottom boundary layer TKE
source element (TKE_BBL) was also revised to [H Z2 T-3 ~> m3 s-3 or W m-2].
This commit also adds required changes to the units of the viscosities or
shear-driven diffusivities returned from KPP_calculate, calculate_CVMix_shear,
calculate_CVMix_conv, Calculate_kappa_shear, Calc_kappa_shear_vertex,
calculate_tidal_mixing and calculate_CVMix_tidal.

 Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are
bitwise identical in that mode, but in non-Boussinesq mode this conversion
involves multiplication and division by GV%Rho_0, so while all answers are
mathematically equivalent, this change does change answers at roundoff in
non-Boussinesq mode unless GV%Rho_0 is chosen to be an integer power of 2.
  Rescaled diapycnal diffusivities passed as arguments in non-Boussinesq mode,
to be equivalent to the thermal conductivity divided by the heat capacity,
analogously to the difference between a kinematic viscosity and a dynamic
viscosity, so that the new units are [H Z T-1 ~> m2 s-1 or kg m-1 s-1].

  This includes changing the units of 4 arguments to set_diffusivity;
3 arguments each to calculate_bkgnd_mixing, add_drag_diffusivity,
add_LOTW_BBL_diffusivity, user_change_diff, calculate_tidal_mixing and
add_int_tide_diffusivity; 2 arguments to KPP_calculate, calculate_CVMix_conv,
compute_ddiff_coeffs, differential_diffuse_T_S, entrainment_diffusive,
double_diffusion, add_MLrad_diffusivity, and calculate_CVMix_tidal; and one
argument to energetic_PBL.

  The units of 36 internal variables were also changed, as were a total of
29 elements in various opaque types, including 8 elements in bkgnd_mixing_cs,
2 in diabatic_CC, 3 in tidal_mixing_diags type, 1 in user_change_diff_CS,
9 in set_diffusivity_CS type, and 6 elements in diffusivity_diags.

  Two new internal variables were added, and several redundant GV%H_to_Z
conversion factors were also cancelled out, some using that
GV%H_to_Z*GV%Rho0 = GV%H_to_RZ.

  Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are
bitwise identical in that mode, but in non-Boussinesq mode this conversion
involves multiplication and division by GV%Rho_0, so while all answers are
mathematically equivalent, this change does change answers at roundoff in
non-Boussinesq mode unless GV%Rho_0 is chosen to be an integer power of 2.
  Changed the units for TKE arguments to [H Z2 T-3 ~> m3 s-3 or W m-2] for
find_TKE_to_Kd, add_drag_diffusivity, calculate_tidal_mixing and
add_int_tide_diffusivity, with similar changes to the units of 21 diagnostics
or internal variables in the same routines and in add_LOTW_BBL_diffusivity and
add_MLrad_diffusivity.  Dozens of unit conversion factors were also cancelled
out with these changes, including using that GV%Z_to_H/GV%Rho_0 = GV%RZ_to_H and
that GV%Rho_0*GV%H_to_Z = GV%H_to_RZ for both Boussinesq or non-Boussinesq
configurations.

  Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are
bitwise identical in that mode, but in non-Boussinesq mode this conversion
involves multiplication and division by GV%Rho_0, so while all answers are
mathematically equivalent, this change does change answers at roundoff in
non-Boussinesq mode unless GV%Rho_0 is chosen to be an integer power of 2.
Copy link
Collaborator

@alperaltuntas alperaltuntas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CESM tests are now passing.

@marshallward
Copy link
Collaborator Author

@jiandewang I have finally managed to get access to WCOSS and have completed some run comparisons. I believe the cause is due to fused multiply-adds (FMAs) being enabled on WCOSS.

I discovered that reversing the order of an equation changed answers. This is the diff:

--- a/src/parameterizations/vertical/MOM_vert_friction.F90
+++ b/src/parameterizations/vertical/MOM_vert_friction.F90
@@ -2042,9 +2042,11 @@ subroutine find_coupling_coef(a_cpl, hvel, do_i, h_harm, bbl_thick, kv_bbl, z_i,
       ! These expressions assume that Kv_tot(i,nz+1) = CS%Kv, consistent with
       ! the suppression of turbulent mixing by the presence of a solid boundary.
       if (dhc < bbl_thick(i)) then
-        a_cpl(i,nz+1) = kv_bbl(i) / (I_amax*kv_bbl(i) + (dhc+h_neglect)*GV%H_to_Z)
+        a_cpl(i,nz+1) = kv_bbl(i) / ((dhc+h_neglect)*GV%H_to_Z + I_amax*kv_bbl(i))
       else
-        a_cpl(i,nz+1) = kv_bbl(i) / (I_amax*kv_bbl(i) + (bbl_thick(i)+h_neglect)*GV%H_to_Z)
+        a_cpl(i,nz+1) = kv_bbl(i) / ((bbl_thick(i)+h_neglect)*GV%H_to_Z + I_amax*kv_bbl(i))
       endif
     endif ; enddo
     do K=nz,2,-1 ; do i=is,ie ; if (do_i(i)) then

When I looked at the assembly, I noticed that it was using FMAs to evaluate the expression, which would explain the answer difference.

  113070     .loc    1  2045  is_stmt 1                                       |  113070     .loc    1  2045  is_stmt 1                                      
  113071         vaddsd    %xmm2, %xmm5, %xmm1                           #2045|  113071         vmulsd    %xmm3, %xmm4, %xmm1                           #204
  113073         vmulsd    %xmm9, %xmm1, %xmm2                           #2045|  113073         vaddsd    %xmm2, %xmm5, %xmm2                           #204
  113075         vfmadd231sd %xmm4, %xmm3, %xmm2                         #2045|  113075         vfmadd213sd %xmm1, %xmm9, %xmm2                         #204
  113077         vdivsd    %xmm2, %xmm3, %xmm3                           #2045|  113077         vdivsd    %xmm2, %xmm3, %xmm3                           #204
  113079         vmovsd    %xmm3, -8(%rdi,%r11,8)                        #2045|  113079         vmovsd    %xmm3, -8(%rdi,%r11,8)                        #204
  113081         jmp       ..B8.146      # Prob 100%                     #2045|  113081         jmp       ..B8.146      # Prob 100%                     #204 

Although your template does not turn them on, the Cray wrapper does this implicitly:

$ ftn -craype-verbose
ifort -march=core-avx2 -mtune=core-avx2 

The -march flags will allow FMAs if the hardware supports them. Since fma(a,b,c) is more accurate than a*b + c, it will change answers.

We introduce a lot of changes using dimensional constants, which could easily push the compiler to use or remove existing FMAs. Although most of the time these should not have any effect, there are plenty of situations where it can make a difference.

I repeated the runs with the -no-fma flag, and did not detect any answer changes, so this is almost certainly the problem. But it should also be tested in the UFS suite.


As for how to solve this, I can only see two options:

  • We could hunt for every single scenario where an FMA could have been affected, and try to replicate the old operation with parentheses.

  • EMC could modify its regression tests to disable FMAs. Unfortunately, this will also probably slow down the model.

GFDL does not use FMAs because of this problem, and also suffer from this performance loss.

Neither of these options is good. But I don't think we have a choice. We might need to discuss this at a future MOM meeting.

As for long term solutions, we may need to do more aggressive FMA/no-FMA testing before submitting PRs for review.

@awallcraft
Copy link
Collaborator

None of the intel compiler templates at mom-ocean/mkmf/templates include -no-fma. They do typically include -fp-model source and -ftz.

For HYCOM, we always use -fp-model precise -no-fma -ftz, because otherwise we don't get bit for bit reproducability on different numbers of MPI tasks (due to the loop prologue or epilogue after vectorization being different for different tile sizes).

@marshallward
Copy link
Collaborator Author

marshallward commented Feb 8, 2024

None of the intel compiler templates at mom-ocean/mkmf/templates include -no-fma. They do typically include -fp-model source and -ftz.

Whether or not the ftn wrapper include -march=core-avx2 will depend on the system. For the systems which have it, we set -march=core-avx-i which overrides -march=core-avx2 and disables FMAs. I looked through our output and did not see any FMA instructions.

(See: https://github.com/mom-ocean/mkmf/blob/3fe7c00f63cd6d07aa10a1c0a02069cc14067c85/templates/ncrc5-intel-classic.mk#L94-L103)

WCOSS was also using -fp-model source so it would seem that it does not inhibit FMAs.

@jiandewang
Copy link
Collaborator

yes in UFS we have fp-model but for wcoss2 we also have march=core-avx2
https://github.com/ufs-community/ufs-weather-model/blob/develop/cmake/Intel.cmake#L22-L28

@mathomp4
Copy link
Collaborator

mathomp4 commented Feb 8, 2024

I just took a look at GEOS and we are...confused, I guess. 😄

For GNU, we generally don't specify so I guess GNU can do what it wants. Well, other than some not-well-tested Aggressive build options where FMA was turned off per a suggestion from the mailing list.

For Intel, we do have it on, though it would be interesting to see if it matters. Our Intel flags are a melange of historical things. Essentially we run with:

-O3 -g -march=core-avx2 -fma -align array32byte -traceback -assume realloc_lhs 
-qopt-report0 -align all -fno-alias -fpe3 
-fp-model fast -fp-model source -fp-model consistent -ftz 
-assume noold_maxminloc -diag-disable 10121 -align dcommons -fPIC -qopenmp

so who knows what is actually on and off by the end. But we've found through trial-and-error this gives us okay performance and the reproducibility between chips we want.

@sanAkel
Copy link
Collaborator

sanAkel commented Feb 8, 2024

so who knows what is actually on and off by the end. But we've found through trial-and-error this gives us okay performance and the reproducibility between chips we want.

I'm not aware of the exact criteria for reproducibility between chips of MOM6. Does it include:

  1. Intel
  2. AMD
  3. Apple silicon
  4. ...?

@marshallward
Copy link
Collaborator Author

@mathomp4 with those flags, you are almost certainly using FMAs. I see nothing that would explicitly disable them. I can't speak to the degree of reproducibility, but that is a decision for GEOS.

@sanAkel I don't think we have good information on this topic. We saw answer changes from Intel to AMD, but this turned out to be related to different math libraries, not the chips.

I think we could say a lot more on the topic of reproducibility across hardware and compiler settings, but it might be too much a digression for a PR which has already languished for a very long time. We could start up a new Discussions topic?

@sanAkel
Copy link
Collaborator

sanAkel commented Feb 8, 2024

We could start up a new Discussions topic?

👍

@jiandewang
Copy link
Collaborator

jiandewang commented Feb 13, 2024

in UFS we just added regional coupled run (fv3+MOM6+WAVE) half week ago. Testing from HAFS group (@binli2337) show results changes. John Stephen (@JohnSteffen-NOAA) also reported answer chages in his MOM6 standalone tests. Coupled runs are done on HERA, Bin Li is testing on wcoss2 now . Standalone test is done on ORION (which is similar to HERA). I have put the run log files on GAEA /gpfs/f5/nggps_emc/scratch/Jiande.Wang/For-Marshall/HAFS. Diff on the run log (with DEBUG on) shows difference starts from MOM_initialize_state. Both coupled and standalone runs show similar diff results.

Copy link
Collaborator

@abozec abozec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COAPS approves.

@jiandewang
Copy link
Collaborator

@marshallward clue for you from some testing for the OBC problem on my side:
H3.3c 3650339 New treatment of ice shelf boundaries (#467) ! equal to H3.3b
H3.3b e2deaec *Patches for nonBous_OBCs to prevent blocking !answer changed!
H3.3a1 06bc001 +Add segment%dZtot ! job hanging forever
H3.3a 54b46f6 +Non-Boussinesq Flather open boundary conditions !same answer as the 1st commit of this PR

here Hxxx is my testing ID.

  • H3.3a has the same answer as the first commit of this PR
  • H3.3b has different answer as H3.3a
  • H3.3c has the same answer as H3.3b
  • H3.3a1 job hanged and timed out

so the problem is likely happened at H3.3b

@marshallward
Copy link
Collaborator Author

@jiandewang Thanks for identifying the relevant commits, it helped a lot.

I went over them with @MJHarrison-GFDL and it looks like H3.3b was a bugfix to H3.3a1, which fixed the hanging bug. It also seems plausible that T and S would be updated differently, causing the answer change. It looks like a bug fix to us (which @Hallberg-NOAA can confirm when he gets back).

I don't think anyone should expect the same degree of reproducibility from OBC-based experiments. Although many people have started using OBCs in their experiments, it should be considered a feature under development. If things have reached the point where known bugs need to be preserved, then I think we need to have that conversation very soon.

@jiandewang
Copy link
Collaborator

@marshallward thanks for the explanation. A "keep-bug" flag is not really needed as long as we understand the reason for the answer change. Let's wait for the confirmation from Bob and I will explain the fact to our regional group people

@sanAkel
Copy link
Collaborator

sanAkel commented Feb 14, 2024

I don't think anyone should expect the same degree of reproducibility from OBC-based experiments. Although many people have started using OBCs in their experiments, it should be considered a feature under development. If things have reached the point where known bugs need to be preserved, then I think we need to have that conversation very soon.

We do not use OBCs, if that's all were to change, there is no need for me to re-run my tests.

This patch modifies select calculations of PR#1616 in order to preserve bit
reproducibility when FMA optimization is enabled.  We add parentheses and
reorder terms in selected expressions which either direct or suppress FMAs,
ensuring equivalence with the previous release.

We address two specific equations in the PR.  The first is associated
with vertical friction coupling coupling coefficient.  The diff is shown
below.

-  a_cpl(i,K) = Kv_tot(i,K) / (h_shear*GV%H_to_Z + I_amax*Kv_tot(i,K))
+  a_cpl(i,K) = Kv_tot(i,K) / (h_shear + I_amax*Kv_tot(i,K))

The denominator is of the form `a*b + c*d`.  A compiler may favor an FMA
of the form `a*b + (c*d)`.  However, the modified equation is of form
which favors the `a + c*d` FMA.  Each form gives different results in
the final bits.

We resolve this by expliciting wrapping the RHS in parentheses:

  a_cpl(i,K) = Kv_tot(i,K) / (h_shear + (I_amax*Kv_tot(i,K)))

Although this disables the FMA, it produces the same bit-equivalent
answer as the original expression.

----

The second equation for TKE due to kappa shear is shown below.

-  tke_src = dz_Int(K) *(((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - &
-                              (TKE(k) - q0)*TKE_decay(k)) - &
+  tke_src = h_Int(K) * (dz_h_Int(K)*((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - &
+                              (TKE(k) - q0)*TKE_decay(k)) - &
    ...

The outer equation was of the form `b + c` but is promoted to `a*b + c`,
transforming it to an FMA.

We resolve this by suppressing this FMA optimization:

  tke_src = h_Int(K) * ((dz_h_Int(K) * ((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K))) - &
                         (TKE(k) - q0)*TKE_decay(k)) - &
        ...

----

The following two changes are intended to be the smallest modification
which preserves answers for known testing on target compilers.  It does
not encompass all equation changes in this PR.  If needed, we could
extend these changes to similar modifications of PR#1616.

We do not expect to support bit reproducibility when FMAs are enabled.
But this is an ongoing conversation, and the rules around FMAs should be
expected to change as we learn more and agree on rules of
reproducibility.
@marshallward
Copy link
Collaborator Author

As discussed at the MOM6 dev call, we have identified two changes which caused regressions in EMC testing, and the latest patch fixes these regressions on both WCOSS and Gaea.

In all likelihood, this is also the source of regressions in past PRs from GFDL and NCAR, and it may be worthwhile to resubmit these changes.

Although this PR may contain other potential regressions in FMA-enabled runs, there was a general consensus to accept these fixes and move forward with the PR.

To other partners: I won't reset the approvals, but let us know if you have any objections to these changes. (They should not change answers, but it can't hurt to verify.)

Copy link
Collaborator

@jiandewang jiandewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finally we reached this stage. Thanks for @marshallward and all others work.

@marshallward marshallward merged commit 2ab885e into mom-ocean:main Feb 28, 2024
12 checks passed
@marshallward marshallward deleted the dev-gfdl-main-candidate-2023-11-13 branch February 28, 2024 15:59
@marshallward
Copy link
Collaborator Author

Thanks @jiandewang for helping to sort out a longstanding mystery, and to the others for their patience.

Now for a short break before the next PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.