Reimplementation of bridge/multifile feature for data_override in main #1408

nikizadehgfdl · 2023-11-14T01:26:31Z

Should replace PR implementation of forcing multifile feature #1214
Raphael Dussin implementation of bridge/multifile feature for data_override
This commit is basically Raph's branch raphaeldussin:feature_data_override_multifile merged with the main branch of FMS in PR implementation of forcing multifile feature #1214
Due to restructure of time_interp and data_override dirs in FMS2023.03 I could not (did not know how to) do this merge via git and had to extract the mods from that PR and apply them to the main branch.
Does not change answers.

Description
This updates enables using multiple data source files for data_override for one field. This is necessary for example when the data is contained in annual files (JRA or CORE datasets) and model time is at the start or end of one model year. Data_override needs the data at the end or start of the adjacent year to interpolate. Currently this is a limitation of data_override and the source data have to be specially prepared for GFDL models by a process called padding (prepend the last few time levels of the previous year and append the first few levels of the next year).
This updates enables data_override to work without such padding by specifying multiple files in the data_table like:

"ATM", "p_surf",  "psl", "./INPUT/JRA_psl_ym1.nc:./INPUT/JRA_psl_yy1.nc:./INPUT/JRA_psl_yp1.nc",     "bilinear",   1.0

where JRA_psl_ym1.nc is data for previous model year
JRA_psl_yy1.nc is data for current model year
JRA_psl_yp1.nc is data for next model year

How Has This Been Tested?
This has been tested in ocean-ice models with JRA dataset and reproduces same answers.

Checklist:

[* ] My code follows the style guidelines of this project
[ *] I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
[ *] My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules
New check tests, if applicable, are included
make distcheck passes

- Raphael Dussin implementation of bridge/multifile feature for data_override - This commit is basically Raphs branch raphaeldussin:feature_data_override_multifile merged with the main branch of FMS in PR NOAA-GFDL#1214 - Due to restructure of time_interp and data_override dirs in FMS2023.03 I could not (did not know how to) do this merge via git and had to extract the mods from that PR and apply them to the main branch. - Does not change answers.

nikizadehgfdl · 2023-11-14T01:34:08Z

@raphaeldussin this is what I have for the "merge" of your branch to main. I made this PR just so you and others could review what is involved, how you use it is up to you.
Because of the time_interp and data_override refactoring in FMS2023.03 I don't know how to compare this branch with yours via git. However, it yields the same answers for your OM4p25_ERA5_spinup_v4 experiments as your branch and does not change answers or timings for OMIP_JRA.

nikizadehgfdl · 2023-11-14T16:00:55Z

This PR did not have a merge conflict yesterday! Is the conflict because of the keywords change for "data" ?

raphaeldussin · 2023-11-14T17:00:48Z

thanks a lot @nikizadehgfdl ! I'll have a look tonight. The "data" keyword was one of the requested changes in my PR since it is also a Fortran construct.

uramirez8707 · 2023-11-16T13:17:47Z

@raphaeldussin Thanks for implementing this and thanks @nikizadehgfdl for solving the merge conflicts (sorry you have to do it again)

The code updates look good to me.
@raphaeldussin Do you get any difference in performance between using this new bridging method and the regular method? Particularly at initialization?

raphaeldussin · 2023-11-16T15:49:39Z

Regarding the performance at initialization, I found no extra cost. I've shared with @nikizadehgfdl my xml and job output so he could confirm my findings.

bensonr

I may have missed it, but Is it required that all files in the multi-file list have the same spatial dimensions? If it is required, is there a check to ensure?

bensonr · 2023-11-16T16:36:32Z

data_override/include/data_override.inc

+            data_entry%multifile = .true.
+            index_1col = index(data_entry%file_name, ":")
+            index_2col = index(data_entry%file_name(index_1col+1:), ":")
+            if (index_2col .eq. 0) call mpp_error(FATAL, "data_override: must provide either 1 or 3 forcing files")


Why are we limited to 1 or 3 files?

2 files poses the question of "should it go before or after?". Also most of the time we need to provide a duplicate of the very first record of the timeserie for time interpolation (unless first record falls exactly at midnight of Jan 1st) at the beginning of the run (and similar for the end) so we always end up padding at the beginning and end of all years

bensonr · 2023-11-16T16:38:37Z

data_override/include/data_override.inc

+               data_entry%multifile = .true.
+               index_1col = index(data_entry%file_name, ":")
+               index_2col = index(data_entry%file_name(index_1col+1:), ":")
+               data_entry%prev_file_name = data_entry%file_name(1:index_1col-1)
+               data_entry%next_file_name = data_entry%file_name(index_1col+index_2col+1:)
+               ! once previous/next files are filled in, overwrite current
+               data_entry%file_name = data_entry%file_name(index_1col+1:index_1col+index_2col-1)


The other code blocks for parsing the multiple files has a test for 1 or 3 forcing files. Does this code block need the same?

we could test for 2 occurences of the : separator between files to assure there are either 1 or 3 files but not 2. This would bulletproof the code I agree, good catch!

bensonr · 2023-11-16T16:39:08Z

data_override/include/data_override.inc

+            index_1col = index(data_entry%file_name, ":")
+            index_2col = index(data_entry%file_name(index_1col+1:), ":")
+            data_entry%multifile = .true.
+            if (index_2col .eq. 0) call mpp_error(FATAL, "data_override: must provide either 1 or 3 forcing files")


Same as above comment regarding the number of files allowed.

nikizadehgfdl · 2023-11-17T16:45:10Z

Regarding the performance at initialization, I found no extra cost. I've shared with @nikizadehgfdl my xml and job output so he could confirm my findings.

I looked at the clocks for the same 1 month ocean-ice experiment forced with JRA data with and without the multi-file-bridge data_override (apples to apples). The Ocean and Ice performance were the same. The nice surprise was that the data-override/flux-exchange actually benefited from the multi-file scheme and the whole model sped up by ~6%.

clock              multi-file(secs)     single-file(secs)       
Main loop                      857   911
  A-L: sfc_boundary_layer       24    62
  A-L: flux_down_from_atmos     20    42

I don't fully understand this since the test was done for the month of October which only one forcing file is involved.

- githubCI complains: 1143 | #include "time_interp_external2_bridge_r8.fh" Fatal Error: time_interp_external2_bridge_r4.fh: No such file or directory

nikizadehgfdl · 2023-11-21T18:25:52Z

I am done with the merging and fixing the CI tests. The one that fails (Test coupler build) has nothing to do with this PR.
@raphaeldussin could you take it from here addressing the reviewers comments? We'd better push this through while there is no merge conflict. Thanks.

raphaeldussin · 2023-11-21T18:32:26Z

@nikizadehgfdl many thanks for this! What is the preferred method here? I can work off your branch and PR to it if that's ok with you? Then this PR can be merged in main and we abandon my original PR. sounds good to you?

…ile_updated revised PR, adresses comments

nikizadehgfdl · 2023-11-27T20:56:14Z

@raphaeldussin made the mods per reviews of PR #1214 which is going to be closed and replaced by this one. @bensonr @thomas-robinson @rem1776 , please see if your concerns have been addressed. Thanks.

…tation

uramirez8707 · 2023-11-28T16:42:36Z

@raphaeldussin I implemented this in the yaml version of the data table and updated the documentation in nikizadehgfdl#3

Let me know if you are okay with it or would like any changes.

I think this is the last thing left to do.

multi file data override yaml

thomas-robinson

Any new variables need to be documented with doxygen-style comments
There are a lot of new big if blocks in data_override.inc. I stopped adding the comment that they should be documented and labeled because there are so many. Please label the large if blocks to make reading the code easier in the future and add comments to explain what the if block is doing. I think it would also be helpful to add to the function/subroutine descriptions
Is it possible to put the if blocks into a function and then call the function? Are these if blocks copy-and-pastes? They seem similar, and a function or subroutine would help reduce the size of this code.
I may have missed it, but I don't think that the term "bridge" is ever defined or discussed anywhere in the code documentation. "Bridge" is used extensively through the documentation, so I think there needs to be a clear, longer explaination of what is happening and what "bridge" means.

thomas-robinson · 2023-12-04T12:47:24Z

data_override/include/data_override.inc

+   character(len=512) :: prev_file_name   !< name of netCDF data file for previous segment
+   character(len=512) :: next_file_name   !< name of netCDF data file for next segment


Can these be allocatable? If they aren't always used, then this is a waste of memory.

thomas-robinson · 2023-12-04T12:57:06Z

data_override/include/data_override.inc

+  type(time_type)    :: first_record, last_record
  character(len=512) :: filename !< file containing source data
+  character(len=512) :: prevfilename !< file containing source data for bridge
+  character(len=512) :: nextfilename !< file containing source data for bridge
  character(len=128) :: fieldname !< fieldname used in the data file
  integer :: index1 !< field index in data_table
+  integer            :: dims(4), prev_dims(4), next_dims(4)
  integer :: id_time !< index for time interp in override array
+  integer            :: id_time_prev=-1 !index for time interp in override array
+  integer            :: id_time_next=-1 !index for time interp in override array
  integer :: curr_position !< position of the field currently processed in override_array
  integer :: i
  real(FMS_DATA_OVERRIDE_KIND_) :: factor
+  logical :: multifile


All of these new variables need doxygen documentation

thomas-robinson · 2023-12-04T12:57:48Z

data_override/include/data_override.inc

@@ -695,8 +781,66 @@ subroutine DATA_OVERRIDE_0D_(gridname,fieldname_code,data_out,time,override,data
     id_time = override_array(curr_position)%t_index
  endif !if curr_position < 0

+
+  if (multifile) then


This if-block should have documentation

thomas-robinson · 2023-12-04T12:58:36Z

data_override/include/data_override.inc

@@ -695,8 +781,66 @@ subroutine DATA_OVERRIDE_0D_(gridname,fieldname_code,data_out,time,override,data
     id_time = override_array(curr_position)%t_index
  endif !if curr_position < 0

+
+  if (multifile) then


Labeling this if block would make the code easier to read

thomas-robinson · 2023-12-04T12:58:53Z

data_override/include/data_override.inc

+  first_record = data_table(index1)%time_records(1)
+  last_record = data_table(index1)%time_records(dims(4))
+
+  if (multifile) then   ! bridging between files


labeling this if block would make the code easier to read

thomas-robinson · 2023-12-04T14:08:35Z

data_override/include/data_override.inc

-              is_in=is_in,ie_in=ie_in,js_in=js_in,je_in=je_in,window_id=window_id)
-              return_data = return_data*factor
+
+           if (multifile) then  ! bridging between files, see previous blocks for comments


This if statement should be labeled and documented

thomas-robinson · 2023-12-04T14:08:43Z

data_override/include/data_override.inc

-              mask_out   =mask_out, &
-              is_in=is_in,ie_in=ie_in,js_in=js_in,je_in=je_in,window_id=window_id)
+
+           if (multifile) then   ! bridging between files, see previous blocks for comments


This if statement should be labeled and documented

thomas-robinson · 2023-12-04T14:12:54Z

time_interp/include/time_interp_external2_bridge.inc

+      integer, intent(in) :: index1, index2
+      type(time_type), intent(in) :: time
+      real(FMS_TI_KIND_), dimension(:,:), intent(inout) :: data_in
+      integer, intent(in), optional :: interp
+      logical, intent(in), optional :: verbose
+      type(horiz_interp_type),intent(in), optional :: horz_interp
+      logical, dimension(:,:), intent(out), optional :: mask_out !< set to true where output data is valid
+      integer,                  intent(in), optional :: is_in, ie_in, js_in, je_in
+      integer,                  intent(in), optional :: window_id
+
+      real(FMS_TI_KIND_), dimension(size(data_in,1), size(data_in,2), 1) :: data_out
+      logical, dimension(size(data_in,1), size(data_in,2), 1) :: mask3d


This is a new subroutine, so these variables should all be documented in doxygen-style

thomas-robinson · 2023-12-04T14:14:12Z

time_interp/include/time_interp_external2_bridge.inc

+      integer,                    intent(in) :: index1,index2 !< index of external field from previous call
+                                                                            !! to init_external_field
+      type(time_type),            intent(in) :: time !< target time for data
+      real(FMS_TI_KIND_), dimension(:,:,:),  intent(inout) :: time_data !< global or local data array
+      integer,                    intent(in), optional :: interp
+      logical,                    intent(in), optional :: verbose !< flag for debugging
+      type(horiz_interp_type),    intent(in), optional :: horz_interp
+      logical, dimension(:,:,:), intent(out), optional :: mask_out !< set to true where output data is valid
+      integer,                    intent(in), optional :: is_in, ie_in, js_in, je_in
+      integer,                    intent(in), optional :: window_id
+
+      type(time_type) :: time1, time2
+      integer :: dims1(4), dims2(4)
+      integer :: first_rec, last_rec
+      integer :: nx, ny, nz, interp_method, t1, t2
+      integer :: i1, i2, isc, iec, jsc, jec, mod_time
+      integer :: isc1, iec1, jsc1, jec1, isc2, iec2, jsc2, jec2
+      integer :: yy, mm, dd, hh, min, ss
+      character(len=256) :: err_msg, filename
+
+      integer :: isw, iew, jsw, jew, nxw, nyw
+          ! these are boundaries of the updated portion of the "data" argument
+          ! they are calculated using sizes of the "data" and isc,iec,jsc,jsc
+          ! fileds from respective input field, to center the updated portion
+          ! in the output array
+
+      real(FMS_TI_KIND_) :: w1,w2
+      logical :: verb
+      character(len=16) :: message1, message2
+      integer, parameter :: kindl = FMS_TI_KIND_


These variables need doxygen-style documentation

thomas-robinson · 2023-12-04T14:15:13Z

time_interp/include/time_interp_external2_bridge.inc

+      integer, intent(in) :: index1, index2
+      type(time_type), intent(in) :: time
+      real(FMS_TI_KIND_), intent(inout) :: time_data
+      logical, intent(in), optional :: verbose
+
+      integer :: t1, t2
+      integer :: i1, i2, mod_time
+      integer :: yy, mm, dd, hh, min, ss
+      character(len=256) :: err_msg, filename
+      type(time_type) :: time1, time2
+      integer :: dims1(4), dims2(4)
+
+      real(FMS_TI_KIND_) :: w1,w2
+      logical :: verb
+      integer, parameter :: kindl = FMS_TI_KIND_


These variables need doxygen-style documentation

thomas-robinson · 2023-12-04T14:26:57Z

Is it possible to add a unit test to test the functionality?

mlee03 · 2023-12-04T15:40:32Z

time_interp/include/time_interp_external2_bridge.inc

+      time1 = loaded_fields(index1)%time(t1)
+      time2 = loaded_fields(index2)%time(t2)
+      w2= (time - time1) // (time2 - time1)
+      w1 = 1.0-w2


mlee03 · 2023-12-04T15:42:05Z

time_interp/include/time_interp_external2_bridge.inc

+      time1 = loaded_fields(index1)%time(t1)
+      time2 = loaded_fields(index2)%time(t2)
+      w2= (time - time1) // (time2 - time1)
+      w1 = 1.0-w2


time_interp/include/time_interp_external2_bridge.inc

…ile_updated correction from PR review

raphaeldussin · 2023-12-12T15:37:34Z

@thomas-robinson @mlee03 I think the changes address most of your concerns. Additionally:

I agree that the number of if blocks is not ideal. One could move this into a new function but since each block calls time_interp with different set of optional arguments there is little we could do to reduce the code bloat.
I have added more comments and a better explanation about the "bridging" process in data_override. I'm happy to add some more documentation now or later if you can point me to the appropriate place to do so.

thomas-robinson

Thank you for your updates. I think this code is much easier to read now.

nikizadehgfdl · 2023-12-19T21:27:07Z

data_override/include/data_override.inc

+    endif if_time2
+  else ! standard behavior
+     if ((time<first_record) .or. (time>last_record)) then
+       call mpp_error(WARNING, &


This WARNING is swamping the stdout for regression tests that were/are working fine. There is something wrong with this logic.

nikizadehgfdl requested review from thomas-robinson, bensonr and rem1776 as code owners November 14, 2023 01:26

bensonr reviewed Nov 16, 2023

View reviewed changes

nikizadehgfdl added 3 commits November 17, 2023 14:00

Merge branch 'main' into raphDussin_multifile_dataoverride

d91fb3a

Adding new files to fix githubCI

236f06c

- githubCI complains: 1143 | #include "time_interp_external2_bridge_r8.fh" Fatal Error: time_interp_external2_bridge_r4.fh: No such file or directory

Fix githubCI build

9c7107f

nikizadehgfdl changed the title ~~[Draft] Reimplementation of bridge/multifile feature for data_override in main~~ Reimplementation of bridge/multifile feature for data_override in main Nov 20, 2023

Raphael Dussin and others added 2 commits November 27, 2023 13:38

revised PR, adresses comments

3a55485

Merge pull request #2 from raphaeldussin/feature_data_override_multif…

0063c64

…ile_updated revised PR, adresses comments

implement the multi file data override in yaml and update the documen…

f916b31

…tation

uramirez8707 and others added 2 commits November 28, 2023 12:28

allow for two and/or three files

4c19eb3

Merge pull request #3 from uramirez8707/multi_file_yaml

e35629e

multi file data override yaml

nikizadehgfdl mentioned this pull request Nov 30, 2023

implementation of forcing multifile feature #1214

Closed

8 tasks

thomas-robinson requested changes Dec 4, 2023

View reviewed changes

thomas-robinson requested a review from mlee03 December 4, 2023 14:25

mlee03 suggested changes Dec 4, 2023

View reviewed changes

correction from PR review

1cecb3a

Merge pull request #4 from raphaeldussin/feature_data_override_multif…

9cbc43a

…ile_updated correction from PR review

thomas-robinson approved these changes Dec 12, 2023

View reviewed changes

rem1776 approved these changes Dec 14, 2023

View reviewed changes

rem1776 merged commit e3fb28e into NOAA-GFDL:main Dec 14, 2023
19 checks passed

nikizadehgfdl commented Dec 19, 2023

View reviewed changes

nikizadehgfdl mentioned this pull request Dec 19, 2023

2024.01-alpha2 stdout is swamped by a warning increasing its size 1000 fold #1430

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reimplementation of bridge/multifile feature for data_override in main #1408

Reimplementation of bridge/multifile feature for data_override in main #1408

nikizadehgfdl commented Nov 14, 2023 •

edited

Loading

nikizadehgfdl commented Nov 14, 2023

nikizadehgfdl commented Nov 14, 2023

raphaeldussin commented Nov 14, 2023 •

edited

Loading

uramirez8707 commented Nov 16, 2023

raphaeldussin commented Nov 16, 2023

bensonr left a comment

bensonr Nov 16, 2023

raphaeldussin Nov 16, 2023

bensonr Nov 16, 2023

raphaeldussin Nov 16, 2023

bensonr Nov 16, 2023

nikizadehgfdl commented Nov 17, 2023

nikizadehgfdl commented Nov 21, 2023

raphaeldussin commented Nov 21, 2023

nikizadehgfdl commented Nov 27, 2023

uramirez8707 commented Nov 28, 2023

thomas-robinson left a comment

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson Dec 4, 2023

thomas-robinson commented Dec 4, 2023

mlee03 Dec 4, 2023

mlee03 Dec 4, 2023

raphaeldussin commented Dec 12, 2023

thomas-robinson left a comment

nikizadehgfdl Dec 19, 2023

		character(len=512) :: prev_file_name !< name of netCDF data file for previous segment
		character(len=512) :: next_file_name !< name of netCDF data file for next segment

Reimplementation of bridge/multifile feature for data_override in main #1408

Reimplementation of bridge/multifile feature for data_override in main #1408

Conversation

nikizadehgfdl commented Nov 14, 2023 • edited Loading

nikizadehgfdl commented Nov 14, 2023

nikizadehgfdl commented Nov 14, 2023

raphaeldussin commented Nov 14, 2023 • edited Loading

uramirez8707 commented Nov 16, 2023

raphaeldussin commented Nov 16, 2023

bensonr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikizadehgfdl commented Nov 17, 2023

nikizadehgfdl commented Nov 21, 2023

raphaeldussin commented Nov 21, 2023

nikizadehgfdl commented Nov 27, 2023

uramirez8707 commented Nov 28, 2023

thomas-robinson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thomas-robinson commented Dec 4, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raphaeldussin commented Dec 12, 2023

thomas-robinson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikizadehgfdl commented Nov 14, 2023 •

edited

Loading

raphaeldussin commented Nov 14, 2023 •

edited

Loading