Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete issue #607 (Optimization of multiscale regional runs) #670

Merged
merged 62 commits into from
Jan 26, 2024

Conversation

jderber-NOAA
Copy link
Contributor

@jderber-NOAA jderber-NOAA commented Dec 14, 2023

Description

This issue was intended to speed up regional multiscale runs. The initial parts of these changes were intended to speed up the initial processing of the observations. Unfortunately, I only able to complete changes to the reading of the observations.

The changes include the addition of the deallocates for many of the allocates in the reading routines, a restructuring of the thinning process and the addition of some new options for input pmot values.

The restructuring of the thinning process was to set flags for the data that was thinned and flags for data that was qc'ed in the read routine. Then based on the input pmot value the data was compressed to only retain the desired data. The pmot values are:

pmot = 0. Pass on all observations that fail the setup quality control, but not those that are thinned.
pmot = 1. Pass on all observations that fail the setup quality control and the obs that are thinned.
pmot = 2. Pass on only the observations that do not fail the quality control and are not thinned.
pmot = 3. Pass on only the observations that do not fail the quality control plus the obs that are thinned.

The pmot = 0 and 1 options are the same was in the read_satwnd routine before, but the change has been included for all routines that thin the input data using the convthin routines. The intent was to allow an option to minimize the data that is passed on to the setup routines if desired. This is similar to what is done for the global AVN runs as compared to the FNL runs, except is applicable to individual observation types.

The routines for which the above changes were made to are:

read_dbz_nc.f90 (pmot_dbz hardwired to 2 in obsmod)
read_fl_hdob.f90
read_goesimgr_skycover.f90
read_l2bufr_mod.f90
read_prepbufr.f90
read_radar.f90 (pmot_vr hardwired to 0 in obsmod)
read_radar_wind_ascii.f90 (pmot_vr hardwired to 0 in obsmod)
read_rapidscat.f90
read_satmar.f90
read_satwnd.f90
read_sfcwnd.f90
read_wcpbufr.f90

The values for pmot are defined as real numbers in the convinfo file (except where noted above).

In addition some other minor changes were made:

1. Changes to parallelization of control2state routines.
2. The combination of ensctl2state and ensctl2state_ad into a single ensctl2state module with the inclusion of a ensctl2state_set routine to define flags once. This is done in a way that is consistent with what is done with control2state above and allows the flags to be calculated once rather than repeatedly every time the routines are called.
3. Deallocating arrays in read_* routines that were left allocated. (removed considerable memory)
4. Making some of the muse flags in setup routines dependent on qc mark as well as usage flag.
5. Removing repeated print lines in windht after 10 prints. Number of times it would have been printed is then output (kcount). Removes lots of output for one regression test.
6. Removal of code that does nothing.
7. Some small changes to parallelization to improve efficiency.
8. Inclusion of some if-then-else's instead of repeated if-then's
9. Some reuse of arrays rather than defining new arrays.
10. Printing full double precision digits for the global norms.
11. Some minor changes to comments and indentation to make them more correct and consistent.

Please see issue #607 for more details.
Fixes #607

DUE DATE for merger of this PR into develop is 1/25/2024 (six weeks after PR creation).

Type of change

Please delete options that are not relevant.

  • [x ] Bug fix (non-breaking change which fixes an issue)
  • [x ] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

These changes have been tested within the context of the global runs to ensure that they reproduce the current results.

Also, the regression ctests were run. When running the ctests it was noted that identical results were not found (minor round off level differences). After extensive debugging it was found that some of the reading routines (read_radar, read_fl_hdob and read_radar_wind_ascii.f90 were not properly implemented to ensure that the observation order was retained. In addition errors were found in the convthin and convthin_time routines. The convthin and convthin_time routines probably did not impact the results. All of the necessary changes are in Hera in /scratch1/NCEPDEV/da/John.Derber/converge4/fixdev.

Note after the control and new code produced ctest results that passed of all tests, the new code was modified to slightly change the order of the observations. The changes were reorder the observations so that when convinfo entries for the data were available the data was ordered such that the first obs are those which do not have a convinfo entry and then the observations are ordered in the order of the convinfo entries. Within each of these groups, the order of the reading of the data is retained. This reordering results in some roundoff changes to the regression tests.

A final run of the ctests on Hera resulted in three of the regression tests not passing, but for those three regression tests only round-off differences were found.

ctest results:

1/7 Test #3: rrfs_3denvar_glbens .............. Passed 668.02 sec
2/7 Test #4: netcdf_fv3_regional .............. Passed 967.17 sec
3/7 Test #7: global_enkf ...................... Passed 996.21 sec
4/7 Test #6: hafs_3denvar_hybens ..............***Failed 1288.89 sec
5/7 Test #5: hafs_4denvar_glbens ..............***Failed 1465.38 sec
6/7 Test #2: rtma ............................. Passed 1630.76 sec
7/7 Test #1: global_4denvar ...................***Failed 1746.12 sec

57% tests passed, 3 tests failed out of 7

For the failed regression tests, the differences are very small.

For global_4denvar the difference over the first 5 iterations is:

New:
< Initial gradient norm = 1.700751272515123674E+03
< cost,grad,step,b,step? = 1 0 6.584168406980355503E+05 1.700751272515123674E+03 1.057429134927263981E+00 0.000000000000000000E+00 good
< cost,grad,step,b,step? = 1 1 6.547802752899467014E+05 2.113894264916129487E+03 1.927405067994359289E+00 1.302727127739899959E+00 good
< cost,grad,step,b,step? = 1 2 6.461427496709105326E+05 1.349310521414032110E+03 2.811042534667704373E+00 1.212753185604616535E+00 good
< cost,grad,step,b,step? = 1 3 6.308338990252234507E+05 1.492308135350379871E+03 3.826902315539039368E+00 4.828500088351426189E-01 good
< cost,grad,step,b,step? = 1 4 6.206079687548581278E+05 1.732872322582532661E+03 2.853763847749684945E+00 1.567453853347285486E+00 good
< cost,grad,step,b,step? = 1 5 6.086367270158508327E+05 2.052318345404916272E+03 2.200825030950578132E+00 1.123654808458361520E+00 good

Control:

Initial gradient norm = 1.700751272515123446E+03
cost,grad,step,b,step? = 1 0 6.584168406980355503E+05 1.700751272515123446E+03 1.057429134927265535E+00 0.000000000000000000E+00 good
cost,grad,step,b,step? = 1 1 6.547802752899467014E+05 2.113894264916132670E+03 1.927405067994362176E+00 1.302727127739904844E+00 good
cost,grad,step,b,step? = 1 2 6.461427496709105326E+05 1.349310521414032564E+03 2.811042534667710591E+00 1.212753185604620310E+00 good
cost,grad,step,b,step? = 1 3 6.308338990252234507E+05 1.492308135350387374E+03 3.826902315539030930E+00 4.828500088351431740E-01 good
cost,grad,step,b,step? = 1 4 6.206079687548580114E+05 1.732872322582532206E+03 2.853763847749785310E+00 1.567453853347265946E+00 good
cost,grad,step,b,step? = 1 5 6.086367270158507163E+05 2.052318345404855336E+03 2.200825030951230499E+00 1.123654808458337984E+00 good

The penalties are identical for the first 3 iterations and the rest of the differences are only in the last few digits.
hafs_3denvar_hybens:

Control:
< cost,grad,step,b,step? = 1 0 1.484488764136420796E+05 5.220586731636382865E+03 3.266670310728393423E-01 0.000000000000000000E+00 good
< cost,grad,step,b,step? = 1 1 1.395457213798956072E+05 3.987278541138146920E+03 1.020272669622300121E+00 5.833302794603654196E-01 good
< cost,grad,step,b,step? = 1 2 1.214543780598063022E+05 2.282966557825236123E+03 1.175497927720148050E+00 3.579527393674070690E-01 good
< cost,grad,step,b,step? = 1 3 1.148424922169799393E+05 2.138015369099894997E+03 9.466815484382745671E-01 8.870869467349256077E-01 good
< cost,grad,step,b,step? = 1 4 1.101613679136846185E+05 3.525710978798607357E+03 7.850637926717303205E-01 2.702293955513371593E+00 good
< cost,grad,step,b,step? = 1 5 1.004694635052513186E+05 1.314894254315365288E+03 4.282579905121543185E+00 1.420689505641469230E-01 good

New:

cost,grad,step,b,step? = 1 0 1.484488764136420796E+05 5.220586731636382865E+03 3.266670310728393978E-01 0.000000000000000000E+00 good
cost,grad,step,b,step? = 1 1 1.395457213798956072E+05 3.987278541138147830E+03 1.020272669622299677E+00 5.833302794603657526E-01 good
cost,grad,step,b,step? = 1 2 1.214543780598063022E+05 2.282966557825236578E+03 1.175497927720150937E+00 3.579527393674070135E-01 good
cost,grad,step,b,step? = 1 3 1.148424922169799247E+05 2.138015369099901818E+03 9.466815484382588020E-01 8.870869467349332682E-01 good
cost,grad,step,b,step? = 1 4 1.101613679136846331E+05 3.525710978798565066E+03 7.850637926717397574E-01 2.702293955513275225E+00 good
cost,grad,step,b,step? = 1 5 1.004694635052513186E+05 1.314894254315372791E+03 4.282579905120070585E+00 1.420689505641908323E-01 good

The penalties are the same in iterations 1,2 and 5. Other differences are small

hafs_4denvar_glbens:

Control:

cost,grad,step,b,step? = 1 0 1.098322171332156431E+05 3.099058770311941316E+03 8.407846553134071810E-01 0.000000000000000000E+00 good
< cost,grad,step,b,step? = 1 1 1.017571823539604084E+05 1.946982757490602580E+03 2.124115380886761439E+00 3.946976915343667347E-01 good
< cost,grad,step,b,step? = 1 2 9.196514028700758354E+04 1.924285054946333275E+03 2.935424561242293784E+00 9.958096626362779036E-01 good
< cost,grad,step,b,step? = 1 3 8.060823346426019270E+04 1.393999917628704225E+03 2.990452695380884762E+00 6.239418852657658832E-01 good
< cost,grad,step,b,step? = 1 4 7.450440407439092814E+04 2.221145790520224182E+03 1.438639498276132356E+00 2.537376742307893895E+00 good
< cost,grad,step,b,step? = 1 5 6.737636465893953573E+04 1.031614475569622755E+03 6.726333681219393235E+00 2.239613119996544666E-01 good


New:< cost,grad,step,b,step? = 1 0 1.098322171332156431E+05 3.099058770311941316E+03 8.407846553134072920E-01 0.000000000000000000E+00 good

cost,grad,step,b,step? = 1 1 1.017571823539604084E+05 1.946982757490602353E+03 2.124115380886762772E+00 3.946976915343667902E-01 good
cost,grad,step,b,step? = 1 2 9.196514028700758354E+04 1.924285054946334185E+03 2.935424561242303110E+00 9.958096626362797910E-01 good
cost,grad,step,b,step? = 1 3 8.060823346426019270E+04 1.393999917628710364E+03 2.990452695380865666E+00 6.239418852657728776E-01 good
cost,grad,step,b,step? = 1 4 7.450440407439094270E+04 2.221145790520220999E+03 1.438639498276145234E+00 2.537376742307857480E+00 good
cost,grad,step,b,step? = 1 5 6.737636465893956483E+04 1.031614475569631850E+03 6.726333681219014871E+00 2.239613119996727575E-01 good

All values (grad,step,b) are the same in iteration 0. The penalties are the same in 0,1,2, and 3. Other differences are small.

Since the code was tested with the same order of observations and produced no differences and the only differences are in the final digits, I conclude that the differences are only round-off due to different ordering of observations from the read_ routines.

@RussTreadon-NOAA RussTreadon-NOAA self-assigned this Jan 20, 2024
@TingLei-NOAA
Copy link
Contributor

An update : the failure with hafs regression test when gsi is compiled in debug mode was from the undefined toff in read_radar_l2rw_novadqc. The issue existing in the current EMC GSI and already have a PR for fix https://github.com/NOAA-EMC/GSI/pull/679/files#diff-9237af61f8a4787143a6ef94bfdefcd7a764f8f986e54370beffb92b3f08d2d3.

@TingLei-NOAA
Copy link
Contributor

An update : when EnKF is also compiled with debug mode, the current EMC GSI (and, hence, including this PR) would fail the regression test global_enkf . There are two places in letkf.f90. The first is easy to find why and fix.

call mpi_reduce(tend-tbegin,tmean,1,mpi_real8,mpi_sum,0,mpi_comm_world,ierr)
tmean = tmean/numproc

(and the similar lines for tmean), tmean is only valid on rank 0 after mpi_reduce, but is used in the following line on every mpi rank.

The second issue is more tricky. The error message is :

41: forrtl: error (75): floating point exception
 41: Image              PC                Routine            Line        Source
 41: enkf.x             0000000004B5152B  Unknown               Unknown  Unknown
 41: libpthread-2.17.s  00002B95C4972630  Unknown               Unknown  Unknown
 41: enkf.x             000000000053D86D  letkf_mp_letkf_up         281  letkf.f90
 41: libiomp5.so        00002B95C4CBEBB3  __kmp_invoke_micr     Unknown  Unknown
 41: libiomp5.so        00002B95C4C3AFAC  __kmp_fork_call       Unknown  Unknown
 41: libiomp5.so        00002B95C4BFCCB5  __kmpc_fork_call      Unknown  Unknown
 41: enkf.x             000000000053701E  letkf_mp_letkf_up         281  letkf.f90

The line 281 is the beginning line of a large block of openmp directives :

!$omp parallel do schedule(dynamic) default(none) private(npt,nob,nobsl, &
!$omp                  nobsl2,ngrd1,corrlength,ens_tmp,coslat, &
!$omp                  nf,vdist,obens,indxassim,indxob,maxdfs, &
!$omp                  nn,hxens,wts_ensmean,dfs,rdiag,dep,rloc,i, &
....

I haven't identified the culprit for this error.
Since this issue is from EMC GSI, it is not specific to this PR and just an FYI here since it is found when I ran regression tests with debug mode GSI.

Copy link
Collaborator

@shoyokota shoyokota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all modifications! I approved now.

@shoyokota
Copy link
Collaborator

I did the Orion tests as follows.

Test project /home/syokota/da/gsi/jderber-NOAA/optimize6/build
    Start 1: global_4denvar
1/7 Test #1: global_4denvar ...................***Failed  1807.53 sec
    Start 2: rtma
2/7 Test #2: rtma .............................   Passed  1150.92 sec
    Start 3: rrfs_3denvar_glbens
3/7 Test #3: rrfs_3denvar_glbens ..............   Passed  668.53 sec
    Start 4: netcdf_fv3_regional
4/7 Test #4: netcdf_fv3_regional ..............   Passed  304.54 sec
    Start 5: hafs_4denvar_glbens
5/7 Test #5: hafs_4denvar_glbens ..............***Failed  1938.86 sec
    Start 6: hafs_3denvar_hybens
6/7 Test #6: hafs_3denvar_hybens ..............***Failed  1342.42 sec
    Start 7: global_enkf
7/7 Test #7: global_enkf ......................   Passed  752.57 sec

57% tests passed, 3 tests failed out of 7

Total Test time (real) = 7965.40 sec

The following tests FAILED:
          1 - global_4denvar (Failed)
          5 - hafs_4denvar_glbens (Failed)
          6 - hafs_3denvar_hybens (Failed)
Errors while running CTest

Non-reproducibility in global_4denvar, hafs_4denvar_glbens, and hafs_3denvar_hybens is as we expected. There are no fatal issues.

@CatherineThomas-NOAA
Copy link
Collaborator

I wonder if there is any improvement with an AVN run where the diagnostic files are already reduced.

I ran some more full resolution tests on WCOSS2 at b7bdbc4, this time for the early dump. Here are the timings:

Control | pmot=0  | pmot=2

1474.49 | 1432.36 | 1439.71
1471.59 | 1459.48 | 1430.03
1473.16 | 1440.87 | 1451.98
1470.20 | 1435.78 | 1436.15
1473.01 | 1436.50 | 1435.40
   
1472.49 | 1441.00 | 1438.65

All PR tests are faster than all control tests.

My findings for the cost function are similar to my previous full resolution tests. Cost functions are identical within each test (i.e. all pmot=0 are the same). The pmot=0 and pmot=2 cases have identical cost functions. The difference between the control and pmot initial cost functions are identical to 7 digits, which is more different than the regression tests:

Control: 1   0  1.592669394534466555E+06  4.103124635798992131E+03  2.612806160403545075E+00 
PR:      1   0  1.592669436005641473E+06  4.103124485022790395E+03  2.612806511958422728E+00 

@jderber-NOAA
Copy link
Contributor Author

Sho, thanks for the approval.

@jderber-NOAA
Copy link
Contributor Author

Cathy,

Thanks for running this experiment! Glad you got positive results! But a bit confused why you got different initial penalties. What observation types produced a different initial penalty (e.g., radiances, wind, etc.)? Also, can you send me the diff of the output diagnostic files (fort.2**) for the observation type for the first outer iteration?

Thanks!

@RussTreadon-NOAA
Copy link
Contributor

The due date for completion of this PR is Thursday, 1/25/2024. @shoyokota has approved the PR. @TingLei-NOAA , what outstanding questions or issues do you need answered or resolved before you can complete your review?

@TingLei-NOAA
Copy link
Contributor

@RussTreadon-NOAA As reported previously, a few GSI/EnKF failures in the regression tests when GSI/EnKF are built with debug mode had been investigated . It was finally concluded they are from EMC GSI .( Some already have fix in other PR and, one in EnKF has a simple fix and another one with the letkf need more efforts ) and shouldn't affect this PR. What I am going to do is to check the rrfs comparison with this PR and then I will do a quick reviewing of the current codes. That is my plan and still can't guarantee I could make it before your due date.

@RussTreadon-NOAA
Copy link
Contributor

Thank you, @TingLei-NOAA , for the update. By what date do you anticipate completing your review? Knowing this helps the GSI Handling Review team plan their work as well as help @jderber-NOAA arrange his schedule.

@TingLei-NOAA
Copy link
Contributor

@RussTreadon-NOAA Thanks for all your efforts. Just hope you also noticed there is a possible reproducibility in the global case @CatherineThomas-NOAA provided . On my side, I expect I could finish all my actions in one week if everything is going well.
I really appreciate John's master 's effort for this large scale GSI optimization and response in this PR ' (Thanks also to Sho's rather complete reviews) and hope I could help finish this PR as good as I can .

@RussTreadon-NOAA
Copy link
Contributor

@TingLei-NOAA , as mentioned last Friday (1/19/2024) would you prefer to be removed as a reviewer from this PR?

This allows you to continue your tests without needing to work outside of your normal workday schedule. You have a lot on your plate and we don't want you burning the candle at both ends to complete this PR by due date.

@TingLei-NOAA
Copy link
Contributor

@RussTreadon-NOAA Sure. Please go ahead. I will give updates on my verification results for this PR.

@CatherineThomas-NOAA
Copy link
Collaborator

@jderber-NOAA

The initial difference is in the radiances:
Control:

     J term                                     J
excess   moisture            3.8458219840846913E-01
surface pressure             1.1982398576591007E+04
temperature                  8.8339292197175426E+04
wind                         1.9503807419532203E+05
moisture                     8.2165047114701410E+03
sst                          3.3361437026688859E+03
ozone                        1.8475954283744115E+04
gps bending angle            2.5125135570621365E+05
radiance                     1.0160292865790830E+06
-----------------------------------------------------
 J Global                    1.5926693945344666E+06

PR:

     J term                                     J
excess   moisture            3.8458219840846913E-01
surface pressure             1.1982398576591007E+04
temperature                  8.8339292197175426E+04
wind                         1.9503807419532203E+05
moisture                     8.2165047114701410E+03
sst                          3.3361437026688859E+03
ozone                        1.8475954283744115E+04
gps bending angle            2.5125135570621365E+05
radiance                     1.0160293280502578E+06
 -----------------------------------------------------
 J Global                    1.5926694360056415E+06

My original tests have been scrubbed, but I'll rerun a single case of each today and copy over the fort files.

@jderber-NOAA
Copy link
Contributor Author

jderber-NOAA commented Jan 23, 2024 via email

@jderber-NOAA
Copy link
Contributor Author

Cathy,

Will run my full resolution tests again. Could you send your script (or one of your outputs) to me (or put it on Hera) so that I can make sure I am using the same satellite data types?

John

@TingLei-NOAA
Copy link
Contributor

An update : the current PR produced the identical results, in terms of stats in the stdout, compared with the EMC GSI Develop branch in a RRFS parallel case using obs including sat and dbz.

@CatherineThomas-NOAA
Copy link
Collaborator

@jderber-NOAA @TingLei-NOAA
I got to the root of the initial cost function differences. @RussTreadon-NOAA ran several cases on WCOSS2 as well and found identical cost functions. I also found another cycle that had identical cost functions with my same setup. We explored some of the differences between our configurations but the problem was with my executables. My clone of develop contained the Dec 20 update to setuprad.f90 while my clone of John's branch did not. Russ checked out a newer version of John's branch for his tests, which contained John's merge from develop.

My new tests using the latest version of the PR branch now produces identical initial cost functions.

@jderber-NOAA
Copy link
Contributor Author

jderber-NOAA commented Jan 24, 2024 via email

Copy link
Contributor

@RussTreadon-NOAA RussTreadon-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Merge pending final peer review.

@RussTreadon-NOAA
Copy link
Contributor

Thank you @shoyokota and @CatherineThomas-NOAA for your approvals. I'll coordinate with the Handling Review team to merge this PR into develop.

@jderber-NOAA
Copy link
Contributor Author

jderber-NOAA commented Jan 26, 2024 via email

@RussTreadon-NOAA RussTreadon-NOAA merged commit 8ed034f into NOAA-EMC:develop Jan 26, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimization of multiscale regional runs
6 participants