GFDL to main, 2023-11-13 #1616

marshallward · 2023-11-13T21:55:47Z

This PR contains many new features, bugfixes, and overall changes to the MOM6 codebase. In particular, the model now includes a full non-Boussinesq solver alongside the existing Boussinesq and "semi-non-Boussinesq" modes. Other major features include updates to the ice shelf model, improved particle tracer support, improvements to the computational performance of the Zanna-Bolton parameterization, fixes to the internal tide energetics, brine plume mixing, and self-attraction and loading.

It also includes the usual bundle of bug fixes and code refactors.

Non-Boussinesq

Ice Shelf

Particle Tracers

ZB2020

Acceleration of Zanna-Bolton-2020 parameterization and new features required for NW2 NOAA-GFDL/MOM6#484: (@Pperezhogin)

Internal Tide

Brine Plume Mixing

Self-attraction and Loading

Separate self-attraction and loading as a new module NOAA-GFDL/MOM6#382: (@herrwang0)

Additional Features

Bugfix

Refactor

Testing

Build

Misc

Contributors

The autoconf Python interpreter search was slightly modified to search for Python even if $PYTHON is set to an empty string. This is done by unsetting PYTHON if it is set but empty, then following the usual macro. This was required since `export PYTHON` in a Makefile will create the `PYTHON` variable but will assign it no value (i.e. empty string). This causes issues in some build environments. The backup `configure~` script was also added to the developer `ac-clean` cleanup rule.

Refactored 6 files in the ALE directory to calculate the nominal depth in thickness units in a single place (it is done in regridding_main now) and pass it to the various places where it is used, in a preparatory step to modify how this calculation is done in non-Boussinesq mode. There are new arguments to several publicly visible routines, including:
- Add non_depth_H arguments to hybgen_regrid, build_zstar_grid, build_sigma_grid, build_rho_grid, build_grid_HyCOM1, build_grid_adaptive, build_adapt_column and build_grid_arbitrary
- Add optional zScale arguments to build_zstar_grid and build_grid_HyCOM1
- Add unit_scale_type arguments to regridding_main, ALE_regrid_accelerated and ALE_offline_inputs
Also eliminated an incorrect GV%Z_to_H rescaling factor when calculating the total column thickness from the layer thicknesses when an ice shelf is used with a Hycom grid. This would have caused dimensional consistency testing to fail. Added the new runtime parameters HYBGEN_H_THIN, HYBGEN_FAR_FROM_SURFACE, HYBGEN_FAR_FROM_BOTTOM, and HYBGEN_DENSITY_EPSILON to set previously hard-coded dimensional parameters used in the Hybgen regridding code and store these values in new variables in hybgen_regrid_CS. Two of these are no longer passed to hybgen_column_regrid as separate parameters. By default these new runtime parameters recover the previous hard-coded values. Also eliminated an unused block of code in build_rho_column. Several comments documenting variables or their units were also added. All answers are bitwise identical, but there are 4 new runtime parameters that would appear in some MOM_parameter_doc files and there are changes to the arguments to 11 routines.

As described in issue #372, I would like to be able to create restart files that contain information about the particle location. These files will be written at the same time as other restart files. I cannot add these calls directly to the driver, because the driver does not have information about the particle location. We have added save_MOM6_internal_state as a subroutine in MOM.F90, and we added calls to this subroutine from each of the drivers. We hope this will allow for more new packages to write restart files in the future. Co-authored by Spencer Jones <spencerjones@tamu.edu>

Added the integer valid_SpV_halo to the thermo_var_ptrs type to indicate whether the SpV_array has been updated and its valid halo size, to facilitate error detection and debugging in non-Boussinesq mode. Tv%valid_SpV_halo is set to the halo size in calc_derived_thermo or after a halo update is done to tv%SpV_avg, and it is set to a negative value right after calls that change temperatures and salinities (such as by ALE remapping) unless there is a call to calc_derived_thermo. Tests for the validity of tv%SpV_avg are added to the routines behind thickness_to_dz, with fatal errors issued if invalid arrays would be used, but more tests could perhaps be used in any parameterization routines where tv%SpV_avg is used directly. Handling the updates to tv%SpV_avg this way helps to avoid unnecessary calls to calc_derived_thermo, which in turn has equation of state calls that can be expensive, while also providing essential verification of new code related to the non-Boussinesq code. These tests can probably be commented out or removed for efficiency once there is a full suite of regression tests for the fully non-Boussinesq mode of MOM6. In addition, a new optional debug argument was added to calc_derived_thermo which can be used to trigger checksums for the variables used to calculate tv%SpV_avg. One call to calc_derived_thermo was also added just before the initialization call to ALE_regrid that will be needed with the next commit, but does not change answers yet. All answers are bitwise identical, but there is a new element in a transparent and widely used type and a new optional argument to a public interface.
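The validity-flag pattern described in this commit can be sketched in a few lines. This is only an illustration in Python, not the MOM6 Fortran: the `ThermoVars` class, the `invalidate_spv` helper and the list-based columns are hypothetical stand-ins, and only the names `valid_SpV_halo`, `calc_derived_thermo` and `thickness_to_dz` come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class ThermoVars:
    """Hypothetical stand-in for thermo_var_ptrs; only the fields needed here."""
    spv_avg: list = field(default_factory=list)  # layer-averaged specific volume [m3 kg-1]
    valid_spv_halo: int = -1                     # negative means spv_avg is stale


def calc_derived_thermo(tv, densities, halo):
    """Recompute the specific volume and record the halo width over which it is valid."""
    tv.spv_avg = [1.0 / rho for rho in densities]
    tv.valid_spv_halo = halo


def invalidate_spv(tv):
    """Hypothetical helper: call after anything that changes T or S (e.g. remapping)."""
    tv.valid_spv_halo = -1


def thickness_to_dz(h, tv, halo=0):
    """Convert mass-based thicknesses [kg m-2] to vertical distances [m]."""
    if tv.valid_spv_halo < halo:
        # Mirrors the fatal error described in the commit message.
        raise RuntimeError("thickness_to_dz called with an invalid or too-narrow SpV_avg")
    return [h_k * spv_k for h_k, spv_k in zip(h, tv.spv_avg)]
```

The point of the design is that the expensive recomputation only happens when a caller asks for it, while a stale array is caught at the point of use rather than silently producing wrong distances.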

Use RHO_KV_CONVERT instead of RHO_0 to set the non-Boussinesq version of GV%m_to_H, so that there is a mechanism for testing the independence of the fully non-Boussinesq mode from the Boussinesq reference density. With this change, GV%Z_to_H is not guaranteed to be equal to (GV%Z_to_m*GV%m_to_H), with the latter expression preferred when setting parameters. By default the two parameters are the same, and they will probably only ever differ in testing the code. All Boussinesq solutions are bitwise identical, but there are differences in the description of RHO_KV_CONVERT that will appear in MOM_parameter_doc files.

Add new arguments to 7 routines that will be needed for the non-Boussinesq capability, but do not use them yet, so that there will be fewer cross file dependencies as the various changes are being reviewed simultaneously. The impacted interfaces are MEKE_int, vertvisc_coef, sumSWoverBands, KPP_calculate, differential_diffuse_T_S, set_BBL_TKE, and apply_sponge. In the three step_MOM_dyn_... routines and in calculateBuoyancyFlux1d, this change includes calls to thickness_to_dz to calculate the new vertical distance arrays that will be passed into vertvisc_coef or sumSWoverBands. The only place where the new arguments are actually used is in sumSWoverBands and set_opacity where the changes are particularly simple. All answers are bitwise identical, but there are new non-optional arguments to seven publicly visible routines.

* Restore functionality for reading slices from 3d volumes in MOM_io
  - The recent MOM_io modifications in support of FMS2_io accidentally removed support for reading on-grid data (same horizontal grid as model) k-slices. This is needed in some configurations in the model state initialization.
* Add FMS1 interfaces
* Additional patches to enable reading ongrid state initialization data
  - read local 3d volume rather than attempting to slice ongrid data vertically.
  - Related bugfixes in MOM_io

- We were reading KV_ML_INVZ2 without logging, then checking for KVML and finally logging based on a combination of the two. This had the side effect that we get warnings about not using KVML even if KVML was not present.
- The fix checks for KVML first, and then changes the default so that when KVML=1e-4 is replaced by KV_ML_INVZ2=1e-4 we end up with no warnings and KVML can be obsoleted safely. Note: this commit alone does not remove all warnings from the MOM6-examples suite because we still need to fix the MOM_input files that still use KVML
- KVML needs to be unscaled since it is the default for KV_ML_INVZ2
- tc3 used KVML and has been corrected.
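The read order described above can be sketched as follows. This is a hedged illustration only: the dict-based `params`, the `warnings` list, the fallback default of 0.0 and the warning text are all assumptions made for the example, not the MOM6 parameter-reading API or its actual messages.

```python
def get_kv_ml_invz2(params, warnings):
    """Return the mixed-layer viscosity, checking the legacy KVML name first.

    `params` is a plain dict standing in for a parsed MOM_input, and `warnings`
    collects messages; both are hypothetical simplifications.
    """
    kvml = params.get("KVML", -1.0)           # negative means "KVML was not set"
    default = kvml if kvml >= 0.0 else 0.0    # assumed fallback default
    if kvml >= 0.0:
        # Warn only when the legacy parameter is actually present.
        warnings.append("KVML is deprecated; set KV_ML_INVZ2 instead.")
    return params.get("KV_ML_INVZ2", default)
```

With this ordering, an input that sets only `KV_ML_INVZ2` produces no warning, while an input that still sets `KVML` keeps its old value and is told to migrate.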

Use the new runtime parameter RHO_PGF_REF instead of RHO_0 to set the reference density that is subtracted off from the other densities when calculating the finite volume pressure gradient forces. Although the answers are mathematically equivalent for any value of this parameter, a judicious choice can reduce the impacts of roundoff errors by about 2 orders of magnitude. By default, RHO_PGF_REF is set to RHO_0, and all answers are bitwise identical. However, there is a new runtime parameter that appears in many of the MOM_parameter_doc.all files.
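Why a reference density affects roundoff can be seen in a deliberately contrived example. The sketch below is not the finite volume pressure gradient code; `column_difference` and all of the values are hypothetical, and it contrasts a reference of zero with a reference close to the data purely to expose the mechanism.

```python
def column_difference(rho_a, rho_b, h, rho_ref=0.0):
    """Difference of the thickness-weighted density sums of two columns.

    An explicit left-to-right loop is used so that the rounding is that of
    plain floating-point addition (Python's built-in sum() may compensate).
    """
    total_a = 0.0
    total_b = 0.0
    for ra, rb, hk in zip(rho_a, rho_b, h):
        total_a = total_a + (ra - rho_ref) * hk
        total_b = total_b + (rb - rho_ref) * hk
    return total_a - total_b


# Contrived values: column A is denser than column B by one ulp of 1030.
RHO_0 = 1030.0
rho_a = [RHO_0 + 2.0**-42] * 3
rho_b = [RHO_0] * 3
h = [1.0, 1.0, 1.0]

exact = 3 * 2.0**-42                                   # true difference
no_ref = column_difference(rho_a, rho_b, h)            # reference density of zero
with_ref = column_difference(rho_a, rho_b, h, RHO_0)   # reference close to the data

print(abs(no_ref - exact), abs(with_ref - exact))
```

Without a reference, the running sums are of order 3000 and the small density contrast is partly rounded away; with the reference subtracted first, the sums are small enough that the contrast survives exactly.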

The message that a file is being created was issued as a WARNING when we all agree it should really be a NOTE. Depth_list.nc is read if it is present to avoid recomputing a sorted list, but the absence of the file is not an error and does not warrant a warning. Changes:
- Changed WARNING to NOTE.
- Removed MOM_mesg from imports since it wasn't being used.

The interpolation scheme for state-dependent diagnostic coordinates was incorrectly registering as the same parameter as the main model. This meant it was never possible to change the interpolation scheme from the default (which was not the same as the main model). Fix registers the generated parameter name which was always computed but not used. A typical example of the generated parameter is "DIAG_COORD_INTERP_SCHEME_RHO2".

Fixed a bug in which wave_speed_init was effectively discarding any values of mono_N2_depth passed to it via the optional argument mono_N2_depth, but also changed the default value of RESOLN_N2_FILTER_DEPTH, which was previously being discarded, to disable the monotonization and replicate the previous results. There were also clarifying additions made to the description of how to disable RESOLN_N2_FILTER_DEPTH. This will change some entries in MOM_parameter_doc files, and it will change solutions in cases that set RESOLN_N2_FILTER_DEPTH to a non-default value and have parameter settings that use the resolution function to scale their horizontal mixing. There are, however, no known active simulations where the answers are expected to change.

Revised the calculation of gprime and the coordinate densities (GV%Rlay) in fully non-Boussinesq mode to use the arithmetic mean of adjacent coordinate densities in the denominator of the expression for g_prime in place of RHO_0. Also use LIGHTEST_DENSITY in place of RHO_0 to specify the top-level coordinate density in certain coordinate modes. Also made corresponding changes to the fully non-Boussinesq APE calculation when CALCULATE_APE is true, and eliminated an incorrect calculation of the layer volumes in non-Boussinesq mode using the Boussinesq reference density that was never actually being used when CALCULATE_APE is false. This commit will change answers in some fully non-Boussinesq calculations, and an existing runtime parameter is used and logged in some new cases, changing the MOM_parameter_doc file in those cases.

Refactored thickness_diffuse when in non-Boussinesq mode to avoid any dependencies on the Boussinesq reference density, and to translate the volume streamfunction into the mass streamfunction using an appropriately defined in-situ density averaged to the interfaces at velocity points. This form follows the suggestions of Appendix A.3.2 of Griffies and Greatbatch (Ocean Modelling, 2012) when in non-Boussinesq mode. Thickness_diffuse_full was also revised to work properly in non-Boussinesq mode (and not depend on the Boussinesq reference density) when no equation of state is used. As a part of these changes, the code now uses thickness-based streamfunctions and other thickness-based internal calculations in MOM_thickness_diffuse. For example, the overturning streamfunctions with this change are now in m3/s in Boussinesq mode, but kg/s in non-Boussinesq mode. These changes use a call to thickness_to_dz to set up a separate variable with the vertical distance across layers, and in non-Boussinesq mode they use tv%SpV_avg to estimate in situ densities. Additional debugging checksums were added to thickness_diffuse. The code changes are extensive with 15 new or renamed internal variables, and changes to the units of 9 other internal variables and 3 arguments to the private routine streamfn_solver. After this change, GV%Rho, GV%Z_to_H and GV%H_to_Z are no longer used in any non-Boussinesq calculations (12 such instances having been eliminated). Because some calculations have to be redone with the separate thickness and dz variables, this will be more expensive than the original version. No public interfaces are changed, and all answers are bitwise identical in Boussinesq or semiBoussinesq mode, but they will change in non-Boussinesq mode when the isopycnal height diffusion parameterization is used.

* Salt data structures
* First steps at brine plume: pass info from SIS2
* The brine plume parameterization,
  - including now passing the dimensional scaling tests.
* Fix problem when running Tidal_bay case with gnu.
* Avoiding visc_rem issues inside land mask. Tweaking the brine plume code.
* Using the proper MLD in the brine plumes
  - it now works better on restart
* Always including MLD in call to applyBoundary...
  - I could move it up and make it not optional.
* Adding some OpenMP directives to brine plumes

This commit brings the drifters interface up-to-date with the current version of the drifters package, which requires h (layer thickness) to calculate the vertical movement of particles. The interfaces in the code and in config_src/external are updated to pass this information to the drifters package.

Pass dt_kappa_smooth to calc_isoneutral_slopes and vert_fill_TS in units of [H Z ~> m2 or kg m-1] instead of [Z2 ~> m2] for consistency with the units of other diffusivities in the code and to reduce the dependency on the Boussinesq reference density in non-Boussinesq configurations. In addition to the changes to the units of these two arguments, there is a new unit_scale_type argument to vert_fill_TS and MOM_calc_varT and a new verticalGrid_type argument to MOM_stoch_eos_init. The units of 4 vertical diffusivities in the control structures in 4 different modules are also changed accordingly. All answers are bitwise identical in Boussinesq mode, but they can change for some non-Boussinesq configurations. There are new mandatory arguments to three publicly visible routines.

Added a comment justifying the use of a fixed rescaling factor for the diffusivity used in vert_fill_TS. All answers and output are identical.

Added the new public interface find_ustar to extract the friction velocity from either a forcing type argument, or a mech_forcing_type argument, either directly or from tau_mag, and in non-Boussinesq mode by using the time-evolving surface specific volume. Find_ustar is an overloaded interface to find_ustar_fluxes or find_ustar_mech_forcing, which are the same but for the type of one of their arguments. For now, the subroutines bulkmixedlayer, mixedlayer_restrat_OM4, mixedlayer_restrat_Bodner and mixedlayer_restrat_BML are calling find_ustar to avoid code duplication during the transition to work in fully non-Boussinesq mode, but it will eventually be used in about another half dozen other places. All Boussinesq answers are bitwise identical, but non-Boussinesq answers will change and become less dependent on the Boussinesq reference density, and there is a new publicly visible interface wrapping two subroutines.
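The relationship between stress, density and friction velocity that find_ustar encapsulates can be sketched as below. This is an illustrative Python function, not the Fortran interface: the argument names are assumptions, the two type-specific routines are collapsed into one, and the case where the forcing already carries ustar directly is omitted.

```python
import math


def find_ustar(tau_mag, rho_0, surface_spv=None):
    """Friction velocity [m s-1] from the stress magnitude tau_mag [Pa].

    rho_0       -- Boussinesq reference density [kg m-3]
    surface_spv -- surface specific volume [m3 kg-1]; supplying it mimics the
                   non-Boussinesq branch, which does not depend on rho_0.
    """
    if surface_spv is None:
        return math.sqrt(tau_mag / rho_0)
    return math.sqrt(tau_mag * surface_spv)
```

For example, `find_ustar(0.1, 1000.0)` is approximately 0.01 m s-1, while passing a surface specific volume gives a result that is the same for any value of `rho_0`.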

Changed the units of the optional mono_N2_depth argument to wave_speed, wave_speed_init and wave_speed_set_param to thickness units instead of height units. Accordingly, the units of one element each in the diagnostics_CS and wave_speed_CS and a local variable in VarMix_init are also changed to thickness units. The unit descriptions of some comments describing diagnostics were also amended to describe the non-Boussinesq versions. Because this is essentially just changing when the unit conversion occurs, all answers are bitwise identical, but there are changes to the units of an optional argument in 3 publicly visible routines.

Added the new runtime parameter BT_RHO_LINEARIZED to specify the density that is used to convert total water column thicknesses into mass in non-Boussinesq mode with linearized options in the barotropic solver or when estimating the stable barotropic timestep without access to the full baroclinic model state. The default is set to RHO_0 and answers do not change by default. This new parameter is used in non-Boussinesq mode with some options in btcalc and find_face_areas, when LINEARIZED_BT_CORIOLIS = True or BT_NONLIN_STRESS = False, and in the unit conversion of the ice strength with dynamic pressure. Also cancelled out factors of GV%Z_to_H in MOM_barotropic.F90 to simplify the code and reduce the dependence on the value of GV%Rho_0 in non-Boussinesq mode. This involved changing the units of 4 variables in the barotropic_CS type, 3 internal variables in btstep and an internal variable in barotropic_init to use thickness units. The rescaled internal variable mass_to_Z was also replaced with the equivalent GV%RZ_to_H. There are also 4 new debugging messages. Also modified the units of the gtot_est argument to match those of pbce. There is a new element in barotropic_CS. Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are bitwise identical in that mode, but in non-Boussinesq mode this conversion involves multiplication and division by GV%Rho_0, so while all answers are mathematically equivalent, this change does change answers at roundoff in non-Boussinesq mode. Additionally there is a new runtime parameter that will appear in some MOM_parameter_doc files.

Use thickness_to_dz to convert layer thicknesses to depths in 4 tracer modules (DOME_tracer, dye_example, ideal_age_example and nw2_tracers) so that this conversion is done correctly in non-Boussinesq mode, and there is no longer any dependency on the Boussinesq reference density in that mode. This change includes the addition of a thermo_var_ptrs argument to 5 routines (initialize_DOME_tracer, initialize_dye_tracer, dye_tracer_column_physics, ideal_age_tracer_column_physics and count_BL_layers) and changes to the units of some internal variables, and the addition of 6 new 2-d or 3-d arrays with the vertical distance across layers. An unused param_file_type argument to initialize_DOME_tracer was also eliminated. Comments were also added to describe the units of 5 of the variables in the ideal age tracer control structure and 7 internal variables in that same module, and there was some minor cleanup of the formatting of calls in tracer_flow_control_init. There was some minor refactoring in the nw2_tracers module to use SZK_(GV) instead of SZK_(G) to declare the vertical extent of some arrays, and the vertical indexing convention for interfaces in nw2_tracer_dist was revised from starting at 0 to start at 1 for consistency with all the other code in MOM6. Also moved the code to do halo updates for the physical model state variables and call calc_derived_thermo before calling tracer_flow_control_init, because some routines there are now using the layer average specific volume to convert between thicknesses and heights when in non-Boussinesq mode. All answers in Boussinesq mode are bitwise identical, but these passive tracer modules have slightly different answers in non-Boussinesq mode. There are changes to the non-optional arguments to 4 public interfaces.

Changed a recently added OMP directive for plume_flux from private to firstprivate to reflect how this variable is actually used. This bug was introduced with PR #401, and was causing sporadic failures in some of our pipeline tests with the intel compiler (essentially due to uninitialized memory when openMP is used) for subsequent commits.

This patch merges the internal `save_restart` function with the new `save_MOM6_internal_state` function into a new general MOM restart function. It also makes an effort to eliminate `MOM_restart` as a driver dependency, narrowing the required MOM API for existing and future drivers. Also removes the `restart_CSp` argument from `MOM_wave_interface_init`, since it appeared to be used for nothing.

MOM simulations typically abort if the restart directory (usually RESTART) is absent. This patch adds POSIX support for mkdir() and creates the directory if it is missing.

Using inquire() to check for directory existence is not possible, since at least one compiler (Intel) does not consider directories to be files. The inquire call is replaced with a C interface to the POSIX stat() function. We do not fully emulate the behavior of stat, but we use its return value to determine existence of directories. This provides a more reliable method for identifying the existence of the directory. This should resolve many of the observed problems with RESTART creation in coupled runs.
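The check-then-create logic of these two patches can be sketched with Python's wrappers around the same POSIX calls. This is not the Fortran/C interface added to MOM6; the function names are hypothetical, and the explicit directory test via `S_ISDIR` is an assumption that goes slightly beyond the commit message, which only says the return value of stat() is used.

```python
import os
import stat


def is_dir(path):
    """Return True if `path` exists and is a directory, using stat()."""
    try:
        return stat.S_ISDIR(os.stat(path).st_mode)
    except OSError:
        # stat() failed, so treat the directory as absent.
        return False


def ensure_restart_dir(path="RESTART"):
    """Create the restart directory if it is missing; return True if it was created."""
    if is_dir(path):
        return False
    os.mkdir(path, 0o755)
    return True
```

Calling `ensure_restart_dir()` before the first restart write is then harmless when the directory already exists.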

Cancelled out factors of GV%Z_to_H in MOM_hor_visc.F90 to simplify the code and reduce the dependence on the value of GV%Rho_0 in non-Boussinesq mode. This involved changing the units of 3 internal variables in horizontal_viscosity and one element in the hor_visc_CS type to use thickness units or their inverse. Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are bitwise identical in that mode, but in non-Boussinesq mode this conversion involves multiplication and division by GV%Rho_0, so while all answers are mathematically equivalent, this change does change answers at roundoff in non-Boussinesq mode.
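The argument about powers of 2 can be checked numerically. The sketch below is a generic floating-point experiment, not MOM6 code: the sample values, the factor 2**10 and the density 1035.0 are arbitrary choices for illustration.

```python
def count_roundtrip_mismatches(factor, values):
    """Count values x for which (x * factor) * (1 / factor) differs from x."""
    inverse = 1.0 / factor
    return sum(1 for x in values if (x * factor) * inverse != x)


# Sample thickness-like values; the spacing is arbitrary.
values = [0.001 * n for n in range(1, 10001)]

pow2_mismatches = count_roundtrip_mismatches(2.0**10, values)   # always exact
rho0_mismatches = count_roundtrip_mismatches(1035.0, values)    # usually not exact

print(pow2_mismatches, rho0_mismatches)
```

Scaling by a power of 2 only changes the exponent, so cancelling such a factor cannot change any bits. A factor such as a reference density generally introduces a rounding in both the multiplication and the reciprocal, which is why cancelling it is expected to change answers at roundoff.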

Revised the units of 12 vertvisc_type elements to be based on thicknesses, so that vertical viscosities (in [H Z T-1 ~> m2 s-1 or Pa s]) are stored as dynamic viscosities when in non-Boussinesq mode, with analogous changes to the diapycnal diffusivity (now in [H Z T-1 ~> m2 s-1 or kg m-1 s-1]). Similarly changed the units of the 2 Rayleigh drag velocity elements (Ray_u and Ray_v) of the vertvisc_type from vertical velocity units to thickness flux units and to more accurately reflect the nature of these fields. The bottom boundary layer TKE source element (TKE_BBL) was also revised to [H Z2 T-3 ~> m3 s-3 or W m-2]. This commit also adds required changes to the units of the viscosities or shear-driven diffusivities returned from KPP_calculate, calculate_CVMix_shear, calculate_CVMix_conv, Calculate_kappa_shear, Calc_kappa_shear_vertex, calculate_tidal_mixing and calculate_CVMix_tidal. Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are bitwise identical in that mode, but in non-Boussinesq mode this conversion involves multiplication and division by GV%Rho_0, so while all answers are mathematically equivalent, this change does change answers at roundoff in non-Boussinesq mode unless GV%Rho_0 is chosen to be an integer power of 2.

Rescaled diapycnal diffusivities passed as arguments in non-Boussinesq mode, to be equivalent to the thermal conductivity divided by the heat capacity, analogously to the difference between a kinematic viscosity and a dynamic viscosity, so that the new units are [H Z T-1 ~> m2 s-1 or kg m-1 s-1]. This includes changing the units of 4 arguments to set_diffusivity; 3 arguments each to calculate_bkgnd_mixing, add_drag_diffusivity, add_LOTW_BBL_diffusivity, user_change_diff, calculate_tidal_mixing and add_int_tide_diffusivity; 2 arguments to KPP_calculate, calculate_CVMix_conv, compute_ddiff_coeffs, differential_diffuse_T_S, entrainment_diffusive, double_diffusion, add_MLrad_diffusivity, and calculate_CVMix_tidal; and one argument to energetic_PBL. The units of 36 internal variables were also changed, as were a total of 29 elements in various opaque types, including 8 elements in bkgnd_mixing_cs, 2 in diabatic_CC, 3 in tidal_mixing_diags type, 1 in user_change_diff_CS, 9 in set_diffusivity_CS type, and 6 elements in diffusivity_diags. Two new internal variables were added, and several redundant GV%H_to_Z conversion factors were also cancelled out, some using that GV%H_to_Z*GV%Rho0 = GV%H_to_RZ. Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are bitwise identical in that mode, but in non-Boussinesq mode this conversion involves multiplication and division by GV%Rho_0, so while all answers are mathematically equivalent, this change does change answers at roundoff in non-Boussinesq mode unless GV%Rho_0 is chosen to be an integer power of 2.

Changed the units for TKE arguments to [H Z2 T-3 ~> m3 s-3 or W m-2] for find_TKE_to_Kd, add_drag_diffusivity, calculate_tidal_mixing and add_int_tide_diffusivity, with similar changes to the units of 21 diagnostics or internal variables in the same routines and in add_LOTW_BBL_diffusivity and add_MLrad_diffusivity. Dozens of unit conversion factors were also cancelled out with these changes, including using that GV%Z_to_H/GV%Rho_0 = GV%RZ_to_H and that GV%Rho_0*GV%H_to_Z = GV%H_to_RZ for both Boussinesq or non-Boussinesq configurations. Because GV%Z_to_H is an exact power of 2 in Boussinesq mode, all answers are bitwise identical in that mode, but in non-Boussinesq mode this conversion involves multiplication and division by GV%Rho_0, so while all answers are mathematically equivalent, this change does change answers at roundoff in non-Boussinesq mode unless GV%Rho_0 is chosen to be an integer power of 2.

alperaltuntas

CESM tests are now passing.

marshallward · 2024-02-07T23:57:19Z

@jiandewang I have finally managed to get access to WCOSS and have completed some run comparisons. I believe the cause is due to fused multiply-adds (FMAs) being enabled on WCOSS.

I discovered that reversing the order of an equation changed answers. This is the diff:

--- a/src/parameterizations/vertical/MOM_vert_friction.F90
+++ b/src/parameterizations/vertical/MOM_vert_friction.F90
@@ -2042,9 +2042,11 @@ subroutine find_coupling_coef(a_cpl, hvel, do_i, h_harm, bbl_thick, kv_bbl, z_i,
       ! These expressions assume that Kv_tot(i,nz+1) = CS%Kv, consistent with
       ! the suppression of turbulent mixing by the presence of a solid boundary.
       if (dhc < bbl_thick(i)) then
-        a_cpl(i,nz+1) = kv_bbl(i) / (I_amax*kv_bbl(i) + (dhc+h_neglect)*GV%H_to_Z)
+        a_cpl(i,nz+1) = kv_bbl(i) / ((dhc+h_neglect)*GV%H_to_Z + I_amax*kv_bbl(i))
       else
-        a_cpl(i,nz+1) = kv_bbl(i) / (I_amax*kv_bbl(i) + (bbl_thick(i)+h_neglect)*GV%H_to_Z)
+        a_cpl(i,nz+1) = kv_bbl(i) / ((bbl_thick(i)+h_neglect)*GV%H_to_Z + I_amax*kv_bbl(i))
       endif
     endif ; enddo
     do K=nz,2,-1 ; do i=is,ie ; if (do_i(i)) then

When I looked at the assembly, I noticed that it was using FMAs to evaluate the expression, which would explain the answer difference.

  113070     .loc    1  2045  is_stmt 1                                       |  113070     .loc    1  2045  is_stmt 1                                      
  113071         vaddsd    %xmm2, %xmm5, %xmm1                           #2045|  113071         vmulsd    %xmm3, %xmm4, %xmm1                           #204
  113073         vmulsd    %xmm9, %xmm1, %xmm2                           #2045|  113073         vaddsd    %xmm2, %xmm5, %xmm2                           #204
  113075         vfmadd231sd %xmm4, %xmm3, %xmm2                         #2045|  113075         vfmadd213sd %xmm1, %xmm9, %xmm2                         #204
  113077         vdivsd    %xmm2, %xmm3, %xmm3                           #2045|  113077         vdivsd    %xmm2, %xmm3, %xmm3                           #204
  113079         vmovsd    %xmm3, -8(%rdi,%r11,8)                        #2045|  113079         vmovsd    %xmm3, -8(%rdi,%r11,8)                        #204
  113081         jmp       ..B8.146      # Prob 100%                     #2045|  113081         jmp       ..B8.146      # Prob 100%                     #204
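One reading of the listing is that each ordering fuses a different product into the final add, with the other product rounded first. That effect can be imitated in Python with contrived operands. The `fma` helper below emulates a fused multiply-add with exact rational arithmetic; it and the values are illustrative assumptions, not what the compiler or the model actually uses.

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulate a fused multiply-add: a*b + c evaluated exactly, then rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Contrived operands for p*q + r*s, chosen so the rounding is easy to follow.
p = q = 1.0 + 2.0**-30
r = -(1.0 - 2.0**-30)
s = 1.0 - 2.0**-30

unfused = p * q + r * s          # both products rounded before the add
fuse_first = fma(p, q, r * s)    # r*s rounded, p*q kept exact inside the FMA
fuse_second = fma(r, s, p * q)   # p*q rounded, r*s kept exact inside the FMA

print(unfused, fuse_first, fuse_second)
```

All three results differ in their last bits, so swapping the two terms of a sum of products can change the answer whenever the compiler is free to emit FMAs.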

Although your template does not turn them on, the Cray wrapper does this implicitly:

$ ftn -craype-verbose
ifort -march=core-avx2 -mtune=core-avx2

The -march flags will allow FMAs if the hardware supports them. Since fma(a,b,c) is more accurate than a*b + c, it will change answers.
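The single rounding of an FMA can also be seen in isolation. In this sketch the hypothetical `fused_multiply_add` helper emulates the instruction using exact rational arithmetic, and the operands are contrived so that the separately rounded product loses exactly the bits that the fused form keeps.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulated FMA: compute a*b + c exactly and round to double only once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

separate = a * b + c                   # a*b rounds to 1.0 first, so this is 0.0
fused = fused_multiply_add(a, b, c)    # keeps the exact product 1 - 2**-60

print(separate, fused)
```

The fused result is closer to the true value, but it is a different number, which is all that is needed to break bitwise reproducibility against a build without FMAs.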

We introduce a lot of changes using dimensional constants, which could easily push the compiler to use or remove existing FMAs. Although most of the time these should not have any effect, there are plenty of situations where it can make a difference.

I repeated the runs with the -no-fma flag, and did not detect any answer changes, so this is almost certainly the problem. But it should also be tested in the UFS suite.

As for how to solve this, I can only see two options:

We could hunt for every single scenario where an FMA could have been affected, and try to replicate the old operation with parentheses.
EMC could modify its regression tests to disable FMAs. Unfortunately, this will also probably slow down the model.

GFDL does not use FMAs because of this problem, and also suffers from this performance loss.

Neither of these options is good. But I don't think we have a choice. We might need to discuss this at a future MOM meeting.

As for long term solutions, we may need to do more aggressive FMA/no-FMA testing before submitting PRs for review.

awallcraft · 2024-02-08T15:10:45Z

None of the intel compiler templates at mom-ocean/mkmf/templates include -no-fma. They do typically include -fp-model source and -ftz.

For HYCOM, we always use -fp-model precise -no-fma -ftz, because otherwise we don't get bit for bit reproducability on different numbers of MPI tasks (due to the loop prologue or epilogue after vectorization being different for different tile sizes).

marshallward · 2024-02-08T15:28:31Z

> None of the intel compiler templates at mom-ocean/mkmf/templates include -no-fma. They do typically include -fp-model source and -ftz.

Whether or not the ftn wrapper includes -march=core-avx2 will depend on the system. For the systems which have it, we set -march=core-avx-i which overrides -march=core-avx2 and disables FMAs. I looked through our output and did not see any FMA instructions.

(See: https://github.com/mom-ocean/mkmf/blob/3fe7c00f63cd6d07aa10a1c0a02069cc14067c85/templates/ncrc5-intel-classic.mk#L94-L103)

WCOSS was also using -fp-model source so it would seem that it does not inhibit FMAs.

jiandewang · 2024-02-08T16:12:51Z

yes in UFS we have fp-model but for wcoss2 we also have march=core-avx2
https://github.com/ufs-community/ufs-weather-model/blob/develop/cmake/Intel.cmake#L22-L28

mathomp4 · 2024-02-08T19:07:04Z

I just took a look at GEOS and we are...confused, I guess. 😄

For GNU, we generally don't specify so I guess GNU can do what it wants. Well, other than some not-well-tested Aggressive build options where FMA was turned off per a suggestion from the mailing list.

For Intel, we do have it on, though it would be interesting to see if it matters. Our Intel flags are a melange of historical things. Essentially we run with:

-O3 -g -march=core-avx2 -fma -align array32byte -traceback -assume realloc_lhs 
-qopt-report0 -align all -fno-alias -fpe3 
-fp-model fast -fp-model source -fp-model consistent -ftz 
-assume noold_maxminloc -diag-disable 10121 -align dcommons -fPIC -qopenmp

so who knows what is actually on and off by the end. But we've found through trial-and-error this gives us okay performance and the reproducibility between chips we want.

sanAkel · 2024-02-08T19:25:13Z

> so who knows what is actually on and off by the end. But we've found through trial-and-error this gives us okay performance and the reproducibility between chips we want.

I'm not aware of the exact criteria for reproducibility between chips of MOM6. Does it include:

Intel
AMD
Apple silicon
...?

marshallward · 2024-02-08T19:29:54Z

@mathomp4 with those flags, you are almost certainly using FMAs. I see nothing that would explicitly disable them. I can't speak to the degree of reproducibility, but that is a decision for GEOS.

@sanAkel I don't think we have good information on this topic. We saw answer changes from Intel to AMD, but this turned out to be related to different math libraries, not the chips.

I think we could say a lot more on the topic of reproducibility across hardware and compiler settings, but it might be too much of a digression for a PR which has already languished for a very long time. We could start up a new Discussions topic?

sanAkel · 2024-02-08T20:13:27Z

> We could start up a new Discussions topic?

👍

jiandewang · 2024-02-13T19:35:48Z

In UFS we added a regional coupled run (fv3+MOM6+WAVE) about half a week ago. Testing by the HAFS group (@binli2337) shows that results change. John Steffen (@JohnSteffen-NOAA) also reported answer changes in his MOM6 standalone tests. The coupled runs are done on HERA, and Bin Li is now testing on wcoss2. The standalone test is done on ORION (which is similar to HERA). I have put the run log files on GAEA at /gpfs/f5/nggps_emc/scratch/Jiande.Wang/For-Marshall/HAFS. A diff of the run logs (with DEBUG on) shows that the differences start at MOM_initialize_state. Both coupled and standalone runs show similar diff results.
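For readers who want to repeat this kind of comparison, here is a minimal Python sketch that reports the first line at which two debug logs diverge. The log lines are made up for illustration and are not real MOM6 output; `first_difference` is a hypothetical helper, not part of any MOM6 or UFS tooling.

```python
from itertools import zip_longest


def first_difference(lines_a, lines_b):
    """Return (line_number, a_line, b_line) for the first mismatch, or None."""
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter log with None
    for number, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a != line_b:
            return number, line_a, line_b
    return None


# Made-up log excerpts; real logs would be read from files instead.
baseline = [
    "initialize_MOM: start",
    "MOM_initialize_state: h checksum 1A2B3C",
    "step_MOM: u checksum 0F0F0F",
]
candidate = [
    "initialize_MOM: start",
    "MOM_initialize_state: h checksum 1A2B3D",
    "step_MOM: u checksum 0F0F10",
]

result = first_difference(baseline, candidate)
print(result[0], result[1])
```

Only the earliest mismatch is reported, because every later difference may simply be a consequence of the first one.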

abozec

COAPS approves.

jiandewang · 2024-02-14T15:07:33Z

@marshallward clue for you from some testing for the OBC problem on my side:
H3.3c 3650339 New treatment of ice shelf boundaries (#467) ! equal to H3.3b
H3.3b e2deaec *Patches for nonBous_OBCs to prevent blocking !answer changed!
H3.3a1 06bc001 +Add segment%dZtot ! job hanging forever
H3.3a 54b46f6 +Non-Boussinesq Flather open boundary conditions !same answer as the 1st commit of this PR

here Hxxx is my testing ID.

H3.3a has the same answer as the first commit of this PR
H3.3b has a different answer from H3.3a
H3.3c has the same answer as H3.3b
H3.3a1 job hung and timed out

so the problem most likely happened at H3.3b

marshallward · 2024-02-14T17:40:04Z

@jiandewang Thanks for identifying the relevant commits, it helped a lot.

I went over them with @MJHarrison-GFDL and it looks like H3.3b was a bugfix to H3.3a1, which fixed the hanging bug. It also seems plausible that T and S would be updated differently, causing the answer change. It looks like a bug fix to us (which @Hallberg-NOAA can confirm when he gets back).

I don't think anyone should expect the same degree of reproducibility from OBC-based experiments. Although many people have started using OBCs in their experiments, it should be considered a feature under development. If things have reached the point where known bugs need to be preserved, then I think we need to have that conversation very soon.

jiandewang · 2024-02-14T18:34:55Z

@marshallward thanks for the explanation. A "keep-bug" flag is not really needed as long as we understand the reason for the answer change. Let's wait for confirmation from Bob, and I will explain this to our regional group.

sanAkel · 2024-02-14T22:15:42Z

> I don't think anyone should expect the same degree of reproducibility from OBC-based experiments. Although many people have started using OBCs in their experiments, it should be considered a feature under development. If things have reached the point where known bugs need to be preserved, then I think we need to have that conversation very soon.

We do not use OBCs; if that is all that changed, there is no need for me to re-run my tests.

This patch modifies select calculations of PR#1616 in order to preserve bit reproducibility when FMA optimization is enabled. We add parentheses and reorder terms in selected expressions which either direct or suppress FMAs, ensuring equivalence with the previous release.

We address two specific equations in the PR. The first is associated with the vertical friction coupling coefficient. The diff is shown below.

    - a_cpl(i,K) = Kv_tot(i,K) / (h_shear*GV%H_to_Z + I_amax*Kv_tot(i,K))
    + a_cpl(i,K) = Kv_tot(i,K) / (h_shear + I_amax*Kv_tot(i,K))

The original denominator is of the form `a*b + c*d`. A compiler may favor an FMA of the form `a*b + (c*d)`. However, the modified equation is of a form which favors the `a + c*d` FMA. The two forms can give different results in the final bits. We resolve this by explicitly wrapping the product in parentheses:

    a_cpl(i,K) = Kv_tot(i,K) / (h_shear + (I_amax*Kv_tot(i,K)))

Although this disables the FMA, it produces the same bit-equivalent answer as the original expression.

----

The second equation, for TKE due to kappa shear, is shown below.

    - tke_src = dz_Int(K) *(((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - &
    -                       (TKE(k) - q0)*TKE_decay(k)) - &
    + tke_src = h_Int(K) * (dz_h_Int(K)*((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - &
    +                       (TKE(k) - q0)*TKE_decay(k)) - &
      ...

The outer equation was of the form `b + c` but is promoted to `a*b + c`, transforming it to an FMA. We resolve this by suppressing this FMA optimization:

    tke_src = h_Int(K) * ((dz_h_Int(K) * ((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K))) - &
                          (TKE(k) - q0)*TKE_decay(k)) - &
      ...

----

These two changes are intended to be the smallest modification which preserves answers for known testing on target compilers. They do not encompass all equation changes in this PR. If needed, we could extend these changes to similar modifications of PR#1616.

We do not expect to support bit reproducibility when FMAs are enabled, but this is an ongoing conversation, and the rules around FMAs should be expected to change as we learn more and agree on rules of reproducibility.
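The effect described for the first equation can be reproduced outside the model. The following is a minimal Python sketch, not MOM6 code: it emulates a fused multiply-add with exact rational arithmetic (one rounding at the end) and compares the three forms of the denominator. The numeric values are chosen purely to expose the rounding difference, and the scaling factor `b` is assumed to be exactly 1.0; the source does not state the value of `GV%H_to_Z`, so that is an assumption made for illustration.

```python
from fractions import Fraction


def fma(x, y, z):
    """Emulate fused multiply-add: x*y + z evaluated exactly, rounded once."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


# Illustrative values only (not taken from MOM6).
a = 3 * 2.0**-55      # stands in for h_shear
b = 1.0               # stands in for the scaling factor; assumed exactly 1
c = d = 1 + 2.0**-27  # stand in for I_amax and Kv_tot

old_fused = fma(a, b, c * d)  # a*b + (c*d): FMA the old expression may get
new_fused = fma(c, d, a)      # a + c*d: FMA the new expression may get
new_paren = a + (c * d)       # parentheses make the product round first

print(old_fused == new_paren)  # True
print(new_fused == new_paren)  # False
print(new_fused - new_paren)   # 2.220446049250313e-16, one unit in the last place
```

Under these assumptions the parenthesized form matches the old fused result because `a*b` is exact when `b` is 1, so both round the product `c*d` before the addition. The second equation involves the same mechanism in the other direction: a plain difference `b - c` became `a*b - c`, which a compiler may fuse unless the product is parenthesized.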

marshallward · 2024-02-27T16:54:22Z

As discussed at the MOM6 dev call, we have identified two changes which caused regressions in EMC testing, and the latest patch fixes these regressions on both WCOSS and Gaea.

In all likelihood, this is also the source of regressions in past PRs from GFDL and NCAR, and it may be worthwhile to resubmit these changes.

Although this PR may contain other potential regressions in FMA-enabled runs, there was a general consensus to accept these fixes and move forward with the PR.

To other partners: I won't reset the approvals, but let us know if you have any objections to these changes. (They should not change answers, but it can't hurt to verify.)

jiandewang

Finally we reached this stage. Thanks to @marshallward and all the others for their work.

marshallward · 2024-02-28T15:59:32Z

Thanks @jiandewang for helping to sort out a longstanding mystery, and to the others for their patience.

Now for a short break before the next PR!

marshallward and others added 30 commits July 7, 2023 10:48

Add comment justifying rescaling in vert_fill_TS

e465b1f

Added a comment justifying the use of a fixed rescaling factor for the diffusivity used in vert_fill_TS. All answers and output are identical.

Create restart directory if absent

3d9190c

MOM simulations typically abort if the restart directory (usually RESTART) is absent. This patch adds POSIX support for mkdir() and creates the directory if it is missing.
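The behavior described in this commit, create the directory only when it is missing, can be sketched in Python as follows. This is not the MOM6 implementation, which is Fortran calling POSIX mkdir(); `ensure_dir` is a hypothetical helper, and the temporary working directory is used only to keep the example self-contained.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` if absent. Return True if created, False if it existed."""
    try:
        os.mkdir(path)  # single-level create, like POSIX mkdir()
        return True
    except FileExistsError:
        if not os.path.isdir(path):
            raise  # the name exists but is not a directory
        return False


with tempfile.TemporaryDirectory() as workdir:
    restart_dir = os.path.join(workdir, "RESTART")
    first_call = ensure_dir(restart_dir)   # directory is missing, so create it
    second_call = ensure_dir(restart_dir)  # already present, so do nothing
    exists = os.path.isdir(restart_dir)

print(first_call, second_call, exists)  # True False True
```

Attempting the create and handling the "already exists" case avoids a separate existence check, which could race with another process creating the same directory.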

raphaeldussin mentioned this pull request Dec 4, 2023

change target to pointer and check for association NOAA-GFDL/MOM6#530

Merged

change target to pointer and check for association

40134ed

alperaltuntas approved these changes Dec 9, 2023

jiandewang mentioned this pull request Jan 7, 2024

GFDL 20231113 candidate PR failure on wcoss2 NOAA-EMC/MOM6#125

Closed

sanAkel mentioned this pull request Feb 13, 2024

Maintenance/Update MOM6 GEOS-ESM/GEOS_OceanGridComp#64

Merged

sanAkel approved these changes Feb 13, 2024

abozec approved these changes Feb 13, 2024

jiandewang approved these changes Feb 27, 2024

marshallward merged commit 2ab885e into mom-ocean:main Feb 28, 2024
12 checks passed

marshallward deleted the dev-gfdl-main-candidate-2023-11-13 branch February 28, 2024 15:59

This was referenced Mar 1, 2024

update to main 20240228 commit NOAA-EMC/MOM6#127

Merged

update MOM6 to its main repository 20240228 commit ufs-community/ufs-weather-model#2165

Closed

update MOM6 to its main repo. 20240228 updating ufs-community/ufs-weather-model#2166

Merged

mathomp4 mentioned this pull request Mar 6, 2024

GEOSgcm Coupled Model Failing at NAS GEOS-ESM/GEOSgcm#766

Open
