Update CICE for advanced snow physics, history file timestamping and history file precision #757

Merged
merged 139 commits into from
Sep 20, 2021

DeniseWorthen
Collaborator

@DeniseWorthen DeniseWorthen commented Aug 20, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR
    are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

Implements advanced snow physics in Icepack and CICE. Fixes a history file issue that occurs when the initial time is not at hour 0. The fix is needed for the update to the coupled RT tests, which use an initial time of hour=6, because the initial hour affects the history file timestamp.

Changes are required in ice_in to accommodate both the advanced snow physics and a renaming of the evp algorithm.

Issue(s) addressed

See CICE issue #37 for a more complete description of changes.

Testing

All baselines for tests using CICE will change, even with the advanced snow physics option turned off.

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

Dependencies

  • waiting on CICE PR #38
  • waiting on IcePack PR #6

DeniseWorthen and others added 30 commits March 27, 2021 12:30
This reverts commit 7b826d4.
Collaborator

@climbfuji climbfuji left a comment

I cannot comment on the CICE changes, but the changes in this PR are straightforward. Hopefully the Gaea issues will be addressed soon by the sysadmins so that we can reduce the wallclock time for compile jobs and run jobs again.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: BL
Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/716696085/20210915134513/ufs-weather-model
Please manually delete: /scratch1/NCEPDEV/stmp2/emc.nemspara/FV3_RT/rt_5602
Baseline creation and move successful
Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/716696085/20210915145627/ufs-weather-model
Please manually delete: /scratch1/NCEPDEV/stmp2/emc.nemspara/FV3_RT/rt_19780
Test cpld_bmark_wave_v16_p7b 014 failed failed
Test cpld_bmark_wave_v16_p7b 014 failed in run_test failed
Test hafs_regional_atm_ocn 086 failed failed
Test hafs_regional_atm_ocn 086 failed in run_test failed
Please make changes and add the following label back:
hera-intel-BL

DeniseWorthen and others added 2 commits September 15, 2021 16:30
* two tests timed out and were run manually. The hafs_regional_atm_ocn
test passed with atmf006.nc using "ALT CHECK". The previous commit passed
atmf006.nc with direct comparison
@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: gaea
Compiler: intel
Job: BL
Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/716696085/20210915133006/ufs-weather-model
Please manually delete: /lustre/f2/scratch/emc.nemspara/FV3_RT/rt_36997
Test regional_control 024 failed failed
Test regional_control 024 failed in run_test failed
Please make changes and add the following label back:
gaea-intel-BL

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: gaea
Compiler: intel
Job: RT
Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/716696085/20210915193006/ufs-weather-model
Please manually delete: /lustre/f2/scratch/emc.nemspara/FV3_RT/rt_33977
Test regional_restart 036 failed in check_result failed
Test regional_restart 036 failed in run_test failed
Test control_stochy 028 failed failed
Test control_stochy 028 failed in run_test failed
Please make changes and add the following label back:
gaea-intel-RT

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Sep 16, 2021

I think the failed Gaea test revealed a problem with how the regional_control and regional_restart tests are set up and/or run. This is what I think happened:

The rt.conf has:

RUN     | regional_control                                                |                                         | fv3 |
RUN     | regional_restart                                                |                                         | fv3 | regional_control

So regional_restart is both run during baseline creation and is dependent on regional_control. The regional_restart test uses fv3_regional_restart as its CNTL_DIR.

The Gaea Auto-BL reported a failure for regional_control, so I re-ran that test. But the regional_restart test was not run by Auto-BL because regional_control itself failed. As a result, the BL created by Auto-BL is missing the regional_restart baseline, yet only regional_control was reported as a failed test.

When Auto-BL reported that regional_control failed, I created a baseline for that test manually and moved it to the BL directory.

When Auto-RT reported that regional_restart and control_stochy had both failed, I assumed both were from time-outs and set up a manual test for regional_control, regional_restart and control_stochy. I ran both the control and restart for regional because I assumed restart was dependent on control.

But in fact the regional_restart failed because the baseline itself was missing.

I can go back and create the regional_restart baseline manually, but I think we need to resolve how the control/restart test is being run for the regional.
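
The failure mode described above can be illustrated with a minimal sketch. This is not the actual rt.sh or Auto-BL logic; it only assumes that the last `|`-separated field of a `RUN` line names the test it depends on, as the two rt.conf lines quoted above suggest. The function names `parse_rt_conf` and `skipped_by_failures` are hypothetical and exist only for this illustration.

```python
# Hypothetical sketch: which tests silently never run when a dependency fails?
# Assumption: in an rt.conf RUN line, the fifth '|'-separated field is the
# name of the test this test depends on (empty if there is no dependency).

RT_CONF = """\
RUN     | regional_control    |    | fv3 |
RUN     | regional_restart    |    | fv3 | regional_control
"""


def parse_rt_conf(text):
    """Return {test_name: dependency_or_None} for every RUN line."""
    deps = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if fields[0] != "RUN":
            continue  # ignore anything that is not a RUN entry
        name = fields[1]
        dep = fields[4] if len(fields) > 4 and fields[4] else None
        deps[name] = dep
    return deps


def skipped_by_failures(deps, failed):
    """Tests that cannot run because a test they depend on failed or was skipped."""
    skipped = set()
    changed = True
    while changed:  # repeat until no new skipped tests are found
        changed = False
        for name, dep in deps.items():
            if name in skipped or dep is None:
                continue
            if dep in failed or dep in skipped:
                skipped.add(name)
                changed = True
    return skipped


deps = parse_rt_conf(RT_CONF)
missing = skipped_by_failures(deps, failed={"regional_control"})
print(sorted(missing))  # baselines that would be absent but not reported as failed
```

With regional_control marked as failed, the sketch flags regional_restart as a test whose baseline would be missing even though it never appears in the failure report. Reporting such skipped dependents alongside the failed tests would be one way to make this case visible.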

* regional_control/restart and control_stochy/restart were run
manually
* all jobs ran and passed, but main log file was not created.
* created a substitute log from the individual rt_*log files
@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: orion
Compiler: intel
Job: BL
Repo location: /work/noaa/nems/emc.nemspara/autort/pr/716696085/20210915090008/ufs-weather-model
Please manually delete: /work/noaa/stmp/bcurtis/stmp/bcurtis/FV3_RT/rt_321097
Test control_c192 015 failed failed
Test control_c192 015 failed in run_test failed
Test control_wrtGauss_netcdf_parallel 013 failed failed
Test control_wrtGauss_netcdf_parallel 013 failed in run_test failed
Test control_ca 019 failed failed
Test control_ca 019 failed in run_test failed
Test control_stochy 018 failed failed
Test control_stochy 018 failed in run_test failed
Test control_rrtmgp 035 failed failed
Test control_rrtmgp 035 failed in run_test failed
Test control_csawmg 036 failed failed
Test control_csawmg 036 failed in run_test failed
Test control_csawmgt 037 failed failed
Test control_csawmgt 037 failed in run_test failed
Test control_flake 038 failed failed
Test control_flake 038 failed in run_test failed
Test control_ugwpv1 039 failed failed
Test control_ugwpv1 039 failed in run_test failed
Test control_ras 040 failed failed
Test control_ras 040 failed in run_test failed
Please make changes and add the following label back:
orion-intel-BL

@DeniseWorthen
Collaborator Author

All the Orion failures were time-outs. I can retrieve all but three from my attempt at a manual BL creation. I'll need to get the flake, ugwpv1 and ras BLs created and then still run the verify step.

@DeniseWorthen DeniseWorthen merged commit 2b2e861 into ufs-community:develop Sep 20, 2021
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
* Add @gspetro-NOAA, @natalie-perlin, and @EdwardSnyder-NOAA to CODEOWNERS so they are notified of all PRs and can review them.

* Remove duplicates in CODEOWNERS; remove users who will no longer be working with the repo.
Labels
Baseline Updates: Current baselines will be updated.
Waiting for Reviews: The PR is waiting for reviews from associated component PR's.

5 participants