Move to contrib installation of spack-stack on Jet #2878

Open
wants to merge 28 commits into
base: develop

InnocentSouopgui-NOAA
Contributor

@InnocentSouopgui-NOAA InnocentSouopgui-NOAA commented Aug 29, 2024

Description

Migrates Global Workflow to use contrib installation of spack-stack on Jet.
Following the failure of the /lfs4 storage on Jet, the spack-stack installation moved to /contrib.
All software relying on spack-stack on Jet needs to be updated.

Resolves #2841
Refs NOAA-EMC/gfs-utils#78
Refs NOAA-EMC/GSI#786
Refs NOAA-EMC/GSI-Monitor#143
Refs NOAA-EMC/GSI-utils#51
Refs ufs-community/UFS_UTILS#977

Type of change

  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

How has this been tested?

Example:

  • Clone and build on Jet

  • Cycled experiments (48+ hours) at resolutions

    • 96/48 on kjet
    • 192/96 on kjet
    • 384/192 on kjet
  • Forecast only experiment (48+ hours) at resolutions

    • 48
    • 96
    • 192
    • 384

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

@InnocentSouopgui-NOAA
Contributor Author

I am getting errors of the form below in the forecast steps (tasks gdasfcst_seg0 and all enkfgdasfcst_mem###).
This happens while running C192/C96 on Jet. It occurred in the very first cycle, so not a single cycle completed.

@DavidHuber-NOAA

21: Warn_K=   6 (i,j)=   87   12 (lon,lat)=123.209 -43.765 VA = 264.64157
21:      K=   5    338.73022
21:      K=   7    217.40834
21: Warn_K=   6 (i,j)=   88   13 (lon,lat)=121.423 -43.705 VA = 250.69765
21:      K=   5    297.46832
21:      K=   7    213.49741
21: Warn_K=   6 (i,j)=   84   16 (lon,lat)=122.124 -46.976 VA = 251.07759
21:      K=   5    256.00562
21:      K=   7    241.60887
 0: PASS: fcstRUN phase 2, n_atmsteps =               27 time is         1.264142
 5:
 5: FATAL from PE     5: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability
 5:
13:
13: FATAL from PE    13: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability
13:
21:
21: FATAL from PE    21: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability
21:
13: Image              PC                Routine            Line        Source
13: ufs_model.x        00000000086C51A7  Unknown               Unknown  Unknown
13: ufs_model.x        00000000078AD1B9  mpp_mod_mp_mpp_er          72  mpp_util_mpi.inc
13: ufs_model.x        0000000007B61AB6  mpp_efp_mod_mp_mp         195  mpp_efp.F90
13: ufs_model.x        0000000007AB2D99  mpp_domains_mod_m         143  mpp_global_sum.fh
13: ufs_model.x        0000000003E9EB6A  fv_grid_utils_mod        3077  fv_grid_utils.F90
13: ufs_model.x        0000000003F2ED3E  fv_mapz_mod_mp_la         794  fv_mapz.F90
13: libiomp5.so        0000146B48A6CBB3  __kmp_invoke_micr     Unknown  Unknown
13: libiomp5.so        0000146B489E8FAC  __kmp_fork_call       Unknown  Unknown
13: libiomp5.so        0000146B489AACB5  __kmpc_fork_call      Unknown  Unknown
13: ufs_model.x        0000000003F2A129  fv_mapz_mod_mp_la         683  fv_mapz.F90
13: ufs_model.x        0000000003E2EE61  fv_dynamics_mod_m         771  fv_dynamics.F90
13: ufs_model.x        0000000003C9236C  atmosphere_mod_mp         688  atmosphere.F90
13: ufs_model.x        0000000003A3490D  atmos_model_mod_m         879  atmos_model.F90
13: ufs_model.x        00000000035F688C  module_fcst_grid_        1335  module_fcst_grid_comp.F90
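
For anyone triaging a similar log, here is a minimal sketch (not part of this PR) that collects the FATAL messages by PE. It assumes the log lines keep the `rank: FATAL from PE n: message` layout shown above; the function name `fatal_messages_by_pe` and the `sample` text are illustrative only.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "13: FATAL from PE    13: NaN in input field of mpp_reproducing_sum(_2d), ..."
FATAL_RE = re.compile(r"^\s*(\d+):\s*FATAL from PE\s+(\d+):\s*(.*)$")


def fatal_messages_by_pe(log_text):
    """Map each PE number to the distinct FATAL messages it reported."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        match = FATAL_RE.match(line)
        if match:
            pe = int(match.group(2))
            message = match.group(3).strip()
            if message not in found[pe]:
                found[pe].append(message)
    return dict(found)


# A short excerpt in the same layout as the log above.
sample = """\
 0: PASS: fcstRUN phase 2, n_atmsteps =               27 time is         1.264142
 5:
 5: FATAL from PE     5: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability
13: FATAL from PE    13: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability
"""

by_pe = fatal_messages_by_pe(sample)
for pe in sorted(by_pe):
    print(pe, by_pe[pe])
```

Grouping by PE makes it easier to see whether every failing rank reports the same message, as is the case in the excerpt above.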

@InnocentSouopgui-NOAA
Contributor Author

It seems to be just a bad day

@InnocentSouopgui-NOAA
Contributor Author

I built initial conditions for other days, and the experiment cycled smoothly.

@InnocentSouopgui-NOAA InnocentSouopgui-NOAA marked this pull request as ready for review September 5, 2024 19:57
@RussTreadon-NOAA
Contributor

GSI PR #787 has been merged into GSI develop. Done at 9f44c87.

The sorc/gsi_enkf.fd hash in InnocentSouopgui-NOAA:migration-jet-contrib must be updated to 9f44c87 to bring these changes into g-w.

@InnocentSouopgui-NOAA InnocentSouopgui-NOAA marked this pull request as draft September 6, 2024 16:13
@InnocentSouopgui-NOAA
Contributor Author

A check is failing with the following message:

fatal: Fetched in submodule path 'ufs_utils.fd', but it did not contain 0426bf793051530794ec8f182e04f5cf129d0a90. Direct fetching of that commit failed.

How can I diagnose what is going on?

@DavidHuber-NOAA
Contributor

@InnocentSouopgui-NOAA I suspect that the hash you are pointing to is for your own branch. Update the hash instead to ufs-community/UFS_UTILS@06eec5b.
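
One way to check this kind of failure locally is sketched below, under stated assumptions: the helper names are hypothetical, and the second function simply wraps `git branch -r --contains`, so it only sees remotes already fetched into the submodule clone (run `git fetch --all` there first).

```python
import re
import subprocess

# Pulls the submodule path and commit hash out of git's error text.
MISSING_RE = re.compile(
    r"Fetched in submodule path '([^']+)', but it did not contain ([0-9a-f]{7,40})"
)


def parse_missing_commit(message):
    """Return (submodule_path, sha) from the error text, or None if absent."""
    match = MISSING_RE.search(message)
    return (match.group(1), match.group(2)) if match else None


def remote_branches_containing(repo_dir, sha):
    """List remote-tracking branches in repo_dir that contain sha.

    Returns an empty list when git reports an error, for example when
    the commit is unknown to this clone.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "branch", "-r", "--contains", sha],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


error = (
    "fatal: Fetched in submodule path 'ufs_utils.fd', but it did not "
    "contain 0426bf793051530794ec8f182e04f5cf129d0a90. "
    "Direct fetching of that commit failed."
)
print(parse_missing_commit(error))
```

If none of the branches returned by `remote_branches_containing` belong to the remote that the submodule URL points at, the recorded hash most likely lives only on a fork, which matches the diagnosis above.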

@InnocentSouopgui-NOAA
Contributor Author

> @InnocentSouopgui-NOAA I suspect that the hash you are pointing to is for your own branch. Update the hash instead to ufs-community/UFS_UTILS@06eec5b.

Thanks @DavidHuber-NOAA, that was the issue.

@DavidHuber-NOAA
Contributor

@InnocentSouopgui-NOAA Just a heads up, there is a bug in the newest GSI-utils that will cause the gdasanalcalc job to fail when performing GDASApp analyses (i.e. the C96C48_ufs_hybatmDA CI test) as noted in #2819 (comment).

@InnocentSouopgui-NOAA
Contributor Author

@DavidHuber-NOAA, should we delay this PR for that bug?
I updated all the hashes and am doing a final test at resolution C96/48 before getting this out for review.

@InnocentSouopgui-NOAA InnocentSouopgui-NOAA marked this pull request as ready for review September 10, 2024 17:56
@DavidHuber-NOAA
Contributor

I think a delay would be preferred, unfortunately, as this will break development for GDASApp. That said, I think we can go ahead and review it.

@InnocentSouopgui-NOAA
Contributor Author

Everything is ready for review.

ush/forecast_predet.sh (outdated, resolved)
sorc/gdas.cd (outdated, resolved)
@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Sep 17, 2024
@DavidHuber-NOAA
Contributor

@InnocentSouopgui-NOAA The gsi-utils bug has been fixed. You should be able to update the hash now.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS labels Sep 17, 2024
Member

@KateFriedman-NOAA KateFriedman-NOAA left a comment

Updates look good, thanks @InnocentSouopgui-NOAA ! FYI I am updating the obsproc, prepobs and fit2obs versions in PR #2903 so those external pieces will be taken care of by the time this goes in.

sorc/gsi_utils.fd (outdated, resolved)
Contributor

@DavidHuber-NOAA DavidHuber-NOAA left a comment

Changes look good. Thanks for the work @InnocentSouopgui-NOAA!

Development

Successfully merging this pull request may close these issues.

Migrate Jet to /lfs5
5 participants