
Update Restart Test Validation #202

Merged: 9 commits, merged into CICE-Consortium:master on Oct 19, 2018

Conversation
Conversation

@mattdturner (Contributor)

This PR modifies the restart test validation code to compare log file output in addition to restart files. For binary I/O configurations, it also loops through all restart files for the final date-time group.

  • Developer(s): Matt Turner

  • Please suggest code Pull Request reviewers in the column at right.

  • Are the code changes bit for bit, different at roundoff level, or more substantial? bfb

  • Please include the link to test results or paste the summary block from the bottom of the testing output below.

  • Does this PR create or have dependencies on Icepack or any other models? No

  • Is the documentation being updated with this PR? (Y/N) N
    If not, does the documentation need to be updated separately at a later time? (Y/N) N

Note: "Documentation" includes information on the wiki and .rst files in doc/source/,
which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/.

The log comparison analyzes only the output written after the final restart file. For example, when writing NetCDF files the log includes:

  min, max, sum =  0.000000000000000E+000  0.995269294229302   272.084043409881      aicen
  min, max, sum =  0.000000000000000E+000  0.986686529020635   571.462133230432      aicen

When writing binary files, the log includes:

  write_global           14           0  0.000000000000000E+000   1.00000000000000        303.121360132779
  write_global           14           0  0.000000000000000E+000  0.303070626436538        219.239680892977

It then appends a PASS or FAIL to test_output with a tag of test_log. For example:

PASS conrad_intel_restart_gx3_4x1_iobinary test_log

The log comparison is currently hard-coded in the test_restart.script file. If we want to use it for comparing iobinary runs to netcdf output runs, we will likely need to modify it and/or move it to a new script.
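For reference, the bookkeeping might look roughly like this (a hypothetical csh sketch; $casename and $log_compare_status are illustrative names, not the script's actual variables):

  # Hypothetical sketch: record the log-comparison result in test_output
  if ($log_compare_status == 0) then
    echo "PASS ${casename} test_log" >> test_output
  else
    echo "FAIL ${casename} test_log" >> test_output
  endif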

turner added 2 commits October 9, 2018 20:50
…lso add a test to compare the diagnostic output in the log for the final restart file
@mattdturner (Contributor, Author)

> If we want to use it for comparing iobinary runs to netcdf output runs, we will likely need to modify it and/or move it to a new script.

I should note that the output in the log is different between configurations that write NetCDF and Binary files. So comparing the log between netcdf and iobinary isn't so simple. If preferred, I could modify the script to look at the Arctic and Antarctic diagnostic output, such as:

total ice area  (km^2) =    1.88125052312735319E+07   2.21361886663511954E+07
total ice extent(km^2) =    1.95439796699420959E+07   2.41801799138016999E+07
total ice volume (m^3) =    3.16163133083423555E+13   5.72669786631575625E+13
total snw volume (m^3) =    2.47732766104872559E+12   3.65388470653937305E+12
tot kinetic energy (J) =    2.16798398922547000E+14   3.87853457181418562E+14

@apcraig (Contributor) commented Oct 10, 2018

This is great. Can you summarize what has been tested so far? Have you verified that the test suites still work? Have you tested an iobinary case?

I don't think we need to worry about testing binary vs netcdf. That is likely to be problematic. The log file comparison should be adequate in that situation.

@mattdturner (Contributor, Author)

I've performed a restart test for both the default configuration (netcdf):

PASS conrad_intel_restart_gx3_4x1 build
PASS conrad_intel_restart_gx3_4x1 initialrun
PASS conrad_intel_restart_gx3_4x1 run -1 -1 -1
PASS conrad_intel_restart_gx3_4x1 test
PASS conrad_intel_restart_gx3_4x1 test_log

and the iobinary configuration.

PASS conrad_intel_restart_gx3_4x1_iobinary build
PASS conrad_intel_restart_gx3_4x1_iobinary initialrun
PASS conrad_intel_restart_gx3_4x1_iobinary run 15.39 1.44 8.14
PASS conrad_intel_restart_gx3_4x1_iobinary test
PASS conrad_intel_restart_gx3_4x1_iobinary test_log

I have also run the base_suite:

123 of 123 tests PASSED
0 of 123 tests FAILED
0 of 123 tests PENDING

@apcraig (Contributor) commented Oct 10, 2018

I think there might be some conflicts with the report_results.csh script. That script is grepping for "test" in the results.log file and now there is a "test" and a "test_log" pair of results. That is going to confuse the script.

The other thing is that the log comparison is probably most useful when we don't have restart files, so it should not assume restart data exists. I agree that we should instead look at the ice diagnostics in the log file to do the comparison. There are some separate challenges with doing that. First, the restart files can be bit-for-bit while the diagnostics differ for different pe counts or decompositions, due to round-off differences in the sums printed in the log diagnostics. But let's ignore that for now; I think we know how to deal with it.

So, what I think we should do is build a log comparison that is triggered if the restart files don't exist or even if we have a set of netcdf restart and a set of binary restarts. That log comparison should compare the ice diagnostics. And then either the netcdf restarts, binary restarts, or log comparison will be used to generate the "test" result in restart testing.

We also probably need the same approach implemented for the case comparison mechanism, when we compare one case to another. I recommend creating a "compare" script that can be reused for both the restart test validation and the case compare mode, if that makes sense. I guess that might look like "comparebfb dir1 dir2" with a return string of pass, fail, and maybe other things. comparebfb looks at dir1 and dir2 and decides whether to do a netcdf, binary, or log file comparison. I don't know if that will work in practice, but let's build in reuse if we can.
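A minimal sketch of what that interface could look like (hypothetical throughout: the file patterns and the PASS/FAIL/LOG_COMPARE return strings are illustrative, not the script that was eventually written):

  #!/bin/csh -f
  # comparebfb dir1 dir2 -- hypothetical sketch of the proposed interface
  set nonomatch
  set dir1 = $argv[1]
  set dir2 = $argv[2]

  # Assumed for illustration: netcdf restarts end in .nc
  set nc1 = `ls $dir1 | grep -c '\.nc$'`
  set nc2 = `ls $dir2 | grep -c '\.nc$'`

  if ($nc1 > 0 && $nc2 > 0) then
    # both sides have netcdf restarts: byte-compare matching filenames
    foreach file ($dir1/*.nc)
      cmp -s $file $dir2/$file:t
      if ($status != 0) then
        echo "FAIL"
        exit 1
      endif
    end
    echo "PASS"
  else
    # binary restarts or no restarts at all: fall back to comparing
    # the final-timestep diagnostics in the two log files
    echo "LOG_COMPARE"
  endif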

@eclare108213 (Contributor)

Just FYI:
Sometimes the log files are identical but comparison of restarts shows that the runs are not BFB. This can happen when the log file printing is minimal (mitigation: set print_global and print_points both to "true"), and sometimes when the code does calculations that aren't reflected in the diagnostic output. That happens less now that the diagnostics have grown fairly extensive, but we could still be missing some things. Therefore I recommend always comparing restart files when we have them in the same format, relying on diagnostics only when we must.
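For reference, one way to flip those settings in a case's ice_in namelist before running (illustrative commands; they assume the defaults appear as .false. in the file):

  # Illustrative only: enable the fuller diagnostic output in ice_in
  perl -pi -e 's/print_global\s*=\s*\.false\./print_global = .true./' ice_in
  perl -pi -e 's/print_points\s*=\s*\.false\./print_points = .true./' ice_in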

@apcraig (Contributor) commented Oct 10, 2018

@eclare108213, I agree. If restarts exist, compare them. If not, revert to the log files. Neither is completely adequate for checking full bit-for-bit reproduction of all model variables, especially if the log file does not contain much info. We can add a check that some minimum output data is present in the log file for it to pass. We'll deal with the global-sum bfb issue in the log files separately.

@mattdturner (Contributor, Author)

> I think there might be some conflicts with the report_results.csh script. That script is grepping for "test" in the results.log file and now there is a "test" and a "test_log" pair of results. That is going to confuse the script.

I'll replace test_log with log.

> I agree that we should instead look at the ice diagnostics in the log file to do the comparison.

I'll update the script to look at the diagnostic output. I've been using the date-time group from the final restart file to determine which portions of the log to look at, so I'll have to change my approach in order to handle cases where there are no restart files.

I'll try implementing the comparebfb script that you suggested to automatically determine what type of comparison to perform.

@mattdturner (Contributor, Author)

I've created a new comparebfb.csh script, which test_restart.script calls. I have tested an iobinary case, and the full base_suite. The iobinary tests passed, and the base_suite reported:

119 of 123 tests PASSED
4 of 123 tests FAILED
0 of 123 tests PENDING

The 4 failing tests are all log comparisons. For example,

PASS conrad_intel_restart_gx3_4x2_alt03 build
PASS conrad_intel_restart_gx3_4x2_alt03 initialrun
PASS conrad_intel_restart_gx3_4x2_alt03 run 38.97 22.85 7.92
PASS conrad_intel_restart_gx3_4x2_alt03 test 
FAIL conrad_intel_restart_gx3_4x2_alt03 log 

If I manually compare the 2 log files, there are differences in arwt incoming sw and arwt swdn error in the Arctic.

                                Initial Run                    Restart Run
arwt incoming sw (W) =    1.04180217683051750E+14        1.20933553675854437E+14
arwt swdn error      =   -2.24574380963046449E+00       -2.76769546032417635E+00

Should I change the test_restart.script file to not run the log comparison so that the report_results.csh script doesn't report failures?

I still need to update the baseline.script file to use the new comparebfb.csh script. I can add logic in baseline.script or comparebfb.csh to perform a log comparison ONLY if there are no restart files or we are comparing a netcdf run to a binary I/O run. Any preferences?

@apcraig (Contributor) commented Oct 11, 2018

Thanks @mattdturner. I think the implementation should be that the log files are compared only if restarts are not compared.

It looks like the comparebfb script is only used for the restart test now, not yet for the compare option between different cases. It would be good to support both, but I guess that could be deferred.

With regard to the different results between the initial and restart logs, that is something we need to look into more. For the time being, allow the log comparison to fail when this diff is encountered. We need to decide whether that variable should be excluded or whether that diagnostic needs to be updated.

Finally, it's worth adding some sort of check to the log compare so it can pass only if a reasonable set of diagnostics are being compared. Also, can you explain what is being compared in the log files? What is this doing?

  set base_out = `tac $base_log | awk 'BEGIN{found=1;} /istep1:/ {found=0} {if (found) print}' | tac | grep '= ' | grep -v 'min, max, sum'`

Is it comparing everything after the first istep1 string except min,max,sum? I don't see how that could work for the restart files as the amount of info is different in the two log files.

@mattdturner (Contributor, Author)

I am in the process of modifying the compare option between 2 cases to use the comparebfb.csh script. The script should already be configured to work for the compare option; I just need to test it.

I'll modify the scripting to only compare the log files if restart files are not compared.

  set base_out = `tac $base_log | awk 'BEGIN{found=1;} /istep1:/ {found=0} {if (found) print}' | tac | grep '= ' | grep -v 'min, max, sum'`

What this line does is print every diagnostic line (total ice area, total ice extent, etc.) that appears in the log file after the final istep1. I set it up to look only at the final diagnostic output, since the two logs will have different numbers of timesteps. The logic in the line is a bit complex, but it was the simplest method I could come up with for comparing the diagnostic output from only the final timestep.
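Annotated, the same pipeline reads:

  #   tac $base_log            reverse the log so the final output comes first
  #   awk '...'                print lines until the first "istep1:" in the
  #                            reversed stream, i.e. everything that follows
  #                            the last istep1 in the original log
  #   tac                      restore the original line order
  #   grep '= '                keep only the "name = value" diagnostic lines
  #   grep -v 'min, max, sum'  drop the per-field min/max/sum lines
  set base_out = `tac $base_log | awk 'BEGIN{found=1;} /istep1:/ {found=0} {if (found) print}' | tac | grep '= ' | grep -v 'min, max, sum'`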

@apcraig (Contributor) commented Oct 11, 2018

Sounds good Matt. Comparing the last timestep makes sense. And if there is no timestep output, the test should fail.
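A minimal guard along those lines might be (a sketch; $base_out as in the pipeline above, $casename illustrative):

  # Sketch: fail the log comparison outright if no diagnostics were
  # extracted after the final istep1
  if ("$base_out" == "") then
    echo "FAIL ${casename} log" >> test_output
    exit 1
  endif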

Making the compare script work for restarts and for compare cases could be a little tricky. In one case, you have two files (or more for binary) to compare in the same directory with different filenames. In the other, you are comparing files, possibly of the same name, in two different directories. The comparisons are the same; it's really a matter of keeping track of the dirs/files, especially when you are trying to decide whether there are netcdf or binary files there. If it becomes too complicated, we can defer, but I think it's worth trying to do. Thanks again!

@mattdturner (Contributor, Author)

Ok, I just updated the scripts. I modified the log comparison slightly to print differences to the log file. I also modified the BFBCOMP code in baseline.script to call the comparebfb.csh script. Right now, the BFBCOMP code just passes the 2 files to compare to the comparebfb.csh script. In the future, we might want additional logic to determine whether to perform a file comparison or a log comparison.

A base_suite run confirms that all tests are passing.

108 of 108 tests PASSED
0 of 108 tests FAILED
0 of 108 tests PENDING

@apcraig (Contributor) commented Oct 13, 2018

I think this looks pretty good. It looks like you can pass in 1 directory, 2 directories, or 2 files. With 1 directory, it assumes it's a restart comparison. With 2 directories, it checks for binary, log, or netcdf comparison. With 2 files, it looks like it's assuming they are both netcdf files. Is that correct? That seems reasonable at this point, and it looks like a clean implementation with lots of reuse, which is great.
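In other words, roughly this dispatch (a hypothetical restatement of the behavior described above, not the script itself):

  # Hypothetical restatement of the argument handling in comparebfb.csh
  if ($#argv == 1) then
    # one directory: restart test; compare the baseline and restarted
    # restart files that sit side by side in that directory
    set mode = restart
  else if (-d $argv[1] && -d $argv[2]) then
    # two directories: choose a binary, netcdf, or log comparison based
    # on which files each directory contains
    set mode = directory
  else
    # two files: assume a pair of netcdf restart files
    set mode = netcdf
  endif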

@mattdturner, do you think I should try this myself before approving or are you fairly confident we're OK?

@mattdturner (Contributor, Author)

> With 1 directory, it assumes it's a restart comparison. With 2 directories, it checks for binary, log, or netcdf comparison. With 2 files, it looks like it's assuming they are both netcdf files.

That is correct. After testing the base_suite and having no issues, I'm fairly confident that we are OK.

@apcraig (Contributor) left a review comment


I think we can merge this, but won't in case we have some other commits we're trying to stage first. Will try to get this merged by the end of the week.

@apcraig merged commit 2c1379e into CICE-Consortium:master Oct 19, 2018