
Update Restart Test Validation #202

Merged: 9 commits, merged into CICE-Consortium:master on Oct 19, 2018

Conversation
Conversation

@mattdturner (Contributor)

This PR modifies the restart test validation code to compare log file output in addition to restart files. For binary I/O configurations, it also loops through all restart files for the final date-time group.

  • Developer(s): Matt Turner

  • Please suggest code Pull Request reviewers in the column at right.

  • Are the code changes bit for bit, different at roundoff level, or more substantial? bfb

  • Please include the link to test results or paste the summary block from the bottom of the testing output below.

  • Does this PR create or have dependencies on Icepack or any other models? No

  • Is the documentation being updated with this PR? (Y/N) N
    If not, does the documentation need to be updated separately at a later time? (Y/N) N

Note: "Documentation" includes information on the wiki and .rst files in doc/source/,
which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/.

The log comparison analyzes only the output written after the final restart file. For example, when writing NetCDF files the log includes:

  min, max, sum =  0.000000000000000E+000  0.995269294229302   272.084043409881      aicen
  min, max, sum =  0.000000000000000E+000  0.986686529020635   571.462133230432      aicen

When writing binary files, the log includes:

  write_global           14           0  0.000000000000000E+000   1.00000000000000        303.121360132779
  write_global           14           0  0.000000000000000E+000  0.303070626436538        219.239680892977

It then appends a PASS or FAIL to test_output with a tag of test_log. For example:

PASS conrad_intel_restart_gx3_4x1_iobinary test_log

The log comparison is currently hard-coded in the test_restart.script file. If we want to use it for comparing iobinary runs to netcdf output runs, we will likely need to modify it and/or move it to a new script.
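For reference, the bookkeeping might look roughly like this (a hypothetical csh sketch; $casename and $log_compare_status are illustrative names, not the script's actual variables):

  # Hypothetical sketch: record the log-comparison result in test_output
  if ($log_compare_status == 0) then
    echo "PASS ${casename} test_log" >> test_output
  else
    echo "FAIL ${casename} test_log" >> test_output
  endif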

turner added 2 commits October 9, 2018 20:50
…lso add a test to compare the diagnostic output in the log for the final restart file
@mattdturner (Contributor, Author)

> If we want to use it for comparing iobinary runs to netcdf output runs, we will likely need to modify it and/or move it to a new script.

I should note that the output in the log is different between configurations that write NetCDF and Binary files. So comparing the log between netcdf and iobinary isn't so simple. If preferred, I could modify the script to look at the Arctic and Antarctic diagnostic output, such as:

total ice area  (km^2) =    1.88125052312735319E+07   2.21361886663511954E+07
total ice extent(km^2) =    1.95439796699420959E+07   2.41801799138016999E+07
total ice volume (m^3) =    3.16163133083423555E+13   5.72669786631575625E+13
total snw volume (m^3) =    2.47732766104872559E+12   3.65388470653937305E+12
tot kinetic energy (J) =    2.16798398922547000E+14   3.87853457181418562E+14

@apcraig (Contributor) commented Oct 10, 2018

This is great. Can you summarize what has been tested so far? Have you verified that the test suites still work? Have you tested an iobinary case?

I don't think we need to worry about testing binary vs netcdf. That is likely to be problematic. The log file comparison should be adequate in that situation.

@mattdturner (Contributor, Author)

I've performed a restart test for both the default configuration (netcdf):

PASS conrad_intel_restart_gx3_4x1 build
PASS conrad_intel_restart_gx3_4x1 initialrun
PASS conrad_intel_restart_gx3_4x1 run -1 -1 -1
PASS conrad_intel_restart_gx3_4x1 test
PASS conrad_intel_restart_gx3_4x1 test_log

and the iobinary configuration.

PASS conrad_intel_restart_gx3_4x1_iobinary build
PASS conrad_intel_restart_gx3_4x1_iobinary initialrun
PASS conrad_intel_restart_gx3_4x1_iobinary run 15.39 1.44 8.14
PASS conrad_intel_restart_gx3_4x1_iobinary test
PASS conrad_intel_restart_gx3_4x1_iobinary test_log

I have also run the base_suite:

123 of 123 tests PASSED
0 of 123 tests FAILED
0 of 123 tests PENDING

@apcraig (Contributor) commented Oct 10, 2018

I think there might be some conflicts with the report_results.csh script. That script is grepping for "test" in the results.log file and now there is a "test" and a "test_log" pair of results. That is going to confuse the script.

The other thing is that the log comparison is probably most useful when we don't have restart files, so it should not assume restart data exists. I agree that we should instead look at the ice diagnostics in the log file to do the comparison. There are some separate challenges with doing that. First, the restart files can be bit-for-bit while the diagnostics differ for different pe counts or decompositions, due to round-off differences in the sums printed in the log diagnostics. But let's ignore that for now; I think we know how to deal with it.

So, what I think we should do is build a log comparison that is triggered if the restart files don't exist or even if we have a set of netcdf restart and a set of binary restarts. That log comparison should compare the ice diagnostics. And then either the netcdf restarts, binary restarts, or log comparison will be used to generate the "test" result in restart testing.

We also probably need the same approach implemented for the case comparison mechanism, when we compare one case to another. I recommend creating a "compare" script that can be reused for both the restart test validation and the case compare mode, if that makes sense. I guess that might look like "comparebfb dir1 dir2" with a return string of pass, fail, and maybe other things. comparebfb looks at dir1 and dir2 and decides whether to do a netcdf, binary, or log file comparison. I don't know if that will work in practice, but let's build in reuse if we can.
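A minimal sketch of what that interface could look like (hypothetical throughout: the file patterns and the PASS/FAIL/LOG_COMPARE return strings are illustrative, not the script that was eventually written):

  #!/bin/csh -f
  # comparebfb dir1 dir2 -- hypothetical sketch of the proposed interface
  set nonomatch
  set dir1 = $argv[1]
  set dir2 = $argv[2]

  # Assumed for illustration: netcdf restarts end in .nc
  set nc1 = `ls $dir1 | grep -c '\.nc$'`
  set nc2 = `ls $dir2 | grep -c '\.nc$'`

  if ($nc1 > 0 && $nc2 > 0) then
    # both sides have netcdf restarts: byte-compare matching filenames
    foreach file ($dir1/*.nc)
      cmp -s $file $dir2/$file:t
      if ($status != 0) then
        echo "FAIL"
        exit 1
      endif
    end
    echo "PASS"
  else
    # binary restarts or no restarts at all: fall back to comparing
    # the final-timestep diagnostics in the two log files
    echo "LOG_COMPARE"
  endif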

@eclare108213 (Contributor)

Just FYI:
Sometimes the log files are identical but comparison of restarts shows that the runs are not BFB. This can happen when the log file printing is minimal (mitigation: set print_global and print_points both to "true"), and sometimes when the code does calculations that aren't reflected in the diagnostic output. That happens less now that the diagnostics have grown fairly extensive, but we could still be missing some things. Therefore I recommend always comparing restart files when we have them in the same format, relying on diagnostics only when we must.
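For reference, one way to flip those settings in a case's ice_in namelist before running (illustrative commands; they assume the defaults appear as .false. in the file):

  # Illustrative only: enable the fuller diagnostic output in ice_in
  perl -pi -e 's/print_global\s*=\s*\.false\./print_global = .true./' ice_in
  perl -pi -e 's/print_points\s*=\s*\.false\./print_points = .true./' ice_in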

@apcraig (Contributor) commented Oct 10, 2018

@eclare108213, I agree. If restarts exist, compare them. If not, revert to the log files. Neither is completely adequate for checking full bit-for-bit reproduction of all model variables, especially if the log file does not contain much info. We can add a check that some minimum output data is present in the log file for it to pass. We'll deal with the global-sum bfb issue in the log files separately.

@mattdturner (Contributor, Author)

> I think there might be some conflicts with the report_results.csh script. That script is grepping for "test" in the results.log file and now there is a "test" and a "test_log" pair of results. That is going to confuse the script.

I'll replace test_log with log.

> I agree that we should instead look at the ice diagnostics in the log file to do the comparison.

I'll update the script to look at the diagnostic output. I've been using the date-time group from the final restart file to determine which portions of the log to look at, so I'll have to change my approach in order to handle cases where there are no restart files.

I'll try implementing the comparebfb script that you suggested to automatically determine what type of comparison to perform.

@mattdturner (Contributor, Author)

I've created a new comparebfb.csh script, which test_restart.script calls. I have tested an iobinary case, and the full base_suite. The iobinary tests passed, and the base_suite reported:

119 of 123 tests PASSED
4 of 123 tests FAILED
0 of 123 tests PENDING

The 4 failing tests are all log comparisons. For example,

PASS conrad_intel_restart_gx3_4x2_alt03 build
PASS conrad_intel_restart_gx3_4x2_alt03 initialrun
PASS conrad_intel_restart_gx3_4x2_alt03 run 38.97 22.85 7.92
PASS conrad_intel_restart_gx3_4x2_alt03 test 
FAIL conrad_intel_restart_gx3_4x2_alt03 log 

If I manually compare the 2 log files, there are differences in arwt incoming sw and arwt swdn error in the Arctic.

                                Initial Run                    Restart Run
arwt incoming sw (W) =    1.04180217683051750E+14        1.20933553675854437E+14
arwt swdn error      =   -2.24574380963046449E+00       -2.76769546032417635E+00

Should I change the test_restart.script file to not run the log comparison so that the report_results.csh script doesn't report failures?

I still need to update the baseline.script file to use the new comparebfb.csh script. I can add logic in baseline.script or comparebfb.csh to perform a log comparison ONLY if there are no restart files or we are comparing a netcdf run to a binary I/O run. Any preferences?

@apcraig (Contributor) commented Oct 11, 2018

Thanks @mattdturner. I think the implementation should be that the log files are compared only if restarts are not compared.

It looks like the comparebfb script is only used for the restart test now, not yet for the compare option between different cases. It would be good to support both, but I guess that could be deferred.

With regard to the different results between the initial and restart logs, that is something we need to look into more. For the time being, allow the log comparison to fail when this diff is encountered. We need to decide whether that variable should be excluded or whether that diagnostic needs to be updated.

Finally, it's worth adding some sort of check to the log compare so it can pass only if a reasonable set of diagnostics are being compared. Also, can you explain what is being compared in the log files? What is this doing?

  set base_out = `tac $base_log | awk 'BEGIN{found=1;} /istep1:/ {found=0} {if (found) print}' | tac | grep '= ' | grep -v 'min, max, sum'`

Is it comparing everything after the first istep1 string except min,max,sum? I don't see how that could work for the restart files as the amount of info is different in the two log files.

@mattdturner (Contributor, Author)

I am in the process of modifying the compare option between 2 cases to use the comparebfb.csh script. The script should already be configured to work for the compare option; I just need to test it.

I'll modify the scripting to only compare the log files if restart files are not compared.

  set base_out = `tac $base_log | awk 'BEGIN{found=1;} /istep1:/ {found=0} {if (found) print}' | tac | grep '= ' | grep -v 'min, max, sum'`

What this line does is print every diagnostic line (total ice area, total ice extent, etc.) that appears in the log file after the final istep1. I set it up to look only at the final diagnostic output, since the two logs will have different numbers of timesteps. The logic in the line is a bit complex, but it was the simplest method I could come up with for comparing the diagnostic output from only the final timestep.
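Annotated, the same pipeline reads:

  #   tac $base_log            reverse the log so the final output comes first
  #   awk '...'                print lines until the first "istep1:" in the
  #                            reversed stream, i.e. everything that follows
  #                            the last istep1 in the original log
  #   tac                      restore the original line order
  #   grep '= '                keep only the "name = value" diagnostic lines
  #   grep -v 'min, max, sum'  drop the per-field min/max/sum lines
  set base_out = `tac $base_log | awk 'BEGIN{found=1;} /istep1:/ {found=0} {if (found) print}' | tac | grep '= ' | grep -v 'min, max, sum'`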

@apcraig (Contributor) commented Oct 11, 2018

Sounds good Matt. Comparing the last timestep makes sense. And if there is no timestep output, the test should fail.
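A minimal guard along those lines might be (a sketch; $base_out as in the pipeline above, $casename illustrative):

  # Sketch: fail the log comparison outright if no diagnostics were
  # extracted after the final istep1
  if ("$base_out" == "") then
    echo "FAIL ${casename} log" >> test_output
    exit 1
  endif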

Making the compare script work for restarts and for compare cases could be a little tricky. In one case, you have two files (or more for binary) to compare in the same directory with different filenames. In the other, you are comparing files, possibly of the same name, in two different directories. The comparisons are the same; it's really a matter of keeping track of the dirs/files, especially when you are trying to decide whether there are netcdf or binary files there. If it becomes too complicated, we can defer, but I think it's worth trying to do. Thanks again!

@mattdturner (Contributor, Author)

Ok, I just updated the scripts. I modified the log comparison slightly to print differences to the log file. I also modified the BFBCOMP code in baseline.script to call the comparebfb.csh script. Right now, the BFBCOMP code just passes the 2 files to compare to the comparebfb.csh script. In the future, we might want additional logic to determine whether to perform a file comparison or a log comparison.

A base_suite run confirms that all tests are passing.

108 of 108 tests PASSED
0 of 108 tests FAILED
0 of 108 tests PENDING

@apcraig (Contributor) commented Oct 13, 2018

I think this looks pretty good. It looks like you can pass in 1 directory, 2 directories, or 2 files. With 1 directory, it assumes it's a restart comparison. With 2 directories, it checks for binary, log, or netcdf comparison. With 2 files, it looks like it's assuming they are both netcdf files. Is that correct? That seems reasonable at this point, and it looks like a clean implementation with lots of reuse, which is great.
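In other words, roughly this dispatch (a hypothetical restatement of the behavior described above, not the script itself):

  # Hypothetical restatement of the argument handling in comparebfb.csh
  if ($#argv == 1) then
    # one directory: restart test; compare the baseline and restarted
    # restart files that sit side by side in that directory
    set mode = restart
  else if (-d $argv[1] && -d $argv[2]) then
    # two directories: choose a binary, netcdf, or log comparison based
    # on which files each directory contains
    set mode = directory
  else
    # two files: assume a pair of netcdf restart files
    set mode = netcdf
  endif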

@mattdturner, do you think I should try this myself before approving or are you fairly confident we're OK?

@mattdturner (Contributor, Author)

> With 1 directory, it assumes it's a restart comparison. With 2 directories, it checks for binary, log, or netcdf comparison. With 2 files, it looks like it's assuming they are both netcdf files.

That is correct. After testing the base_suite and having no issues, I'm fairly confident that we are OK.

@apcraig (Contributor) left a review comment


I think we can merge this, but won't in case we have some other commits we're trying to stage first. Will try to get this merged by the end of the week.

@apcraig merged commit 2c1379e into CICE-Consortium:master Oct 19, 2018