chgres_cube: Create new regression tests to further test regional functionality #181

Closed
LarissaReames-NOAA opened this issue Oct 22, 2020 · 12 comments

@LarissaReames-NOAA

Planned regression tests can be seen here

@GeorgeGayno-NOAA

Currently, the individual chgres_cube regression tests are run in sequence. This can take over 20 minutes. As the number of tests increases, I recommend that they be run in parallel, with the summary log being created after the last test completes.
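
A minimal sketch of the kind of parallel submission being suggested, assuming a Slurm-based system like Hera (the test names, the per-test wrapper script, and the summary step are illustrative assumptions, not taken from the actual driver scripts):

```bash
#!/bin/bash
# Sketch: submit each chgres_cube regression test as its own batch job so
# the tests run concurrently, then submit a summary job that only starts
# after every test has finished.
set -eu

LOG_DIR=./reg_test_logs            # hypothetical log directory
mkdir -p "$LOG_DIR"

# Hypothetical test names; the real suite has its own list.
TESTS="c96.fv3.restart c96.fv3.nemsio c96.gfs.grib2 25km.conus.gfs.grib2"

DEPS=""
for TEST in $TESTS; do
  # --parsable makes sbatch print just the job id.
  JOBID=$(sbatch --parsable -J "chgres.$TEST" -o "$LOG_DIR/$TEST.log" \
          ./run_one_test.sh "$TEST")   # hypothetical per-test wrapper
  DEPS="$DEPS:$JOBID"
done

# afterany = run once all listed jobs end, whether they passed or failed,
# so the summary log is always produced.
sbatch -J chgres.summary --dependency="afterany$DEPS" \
  --wrap "cat $LOG_DIR/*.log > summary.log"
```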

@LarissaReames-NOAA

> Currently, the individual chgres_cube regression tests are run in sequence. This can take over 20 minutes. As the number of tests increases, I recommend that they be run in parallel, with the summary log being created after the last test completes.

@GeorgeGayno-NOAA I currently have our new suite of regression tests running on Hera, with this parallel execution working well. The whole thing takes about 5 minutes now. @JeffBeck-NOAA has agreed to help me test additional regression test drivers on Jet and Orion. However, neither of us has access to WCOSS to test the cray and dell driver scripts. Would you be willing to take the files that we create for the systems we have access to, and engineer similar changes to the cray and dell drivers?

@GeorgeGayno-NOAA

> Currently, the individual chgres_cube regression tests are run in sequence. This can take over 20 minutes. As the number of tests increases, I recommend that they be run in parallel, with the summary log being created after the last test completes.
>
> @GeorgeGayno-NOAA I currently have our new suite of regression tests running on Hera, with this parallel execution working well. The whole thing takes about 5 minutes now. @JeffBeck-NOAA has agreed to help me test additional regression test drivers on Jet and Orion. However, neither of us has access to WCOSS to test the cray and dell driver scripts. Would you be willing to take the files that we create for the systems we have access to, and engineer similar changes to the cray and dell drivers?

Yes, I can test on Cray and Dell.

@LarissaReames-NOAA

@GeorgeGayno-NOAA in some testing I'm doing for the new regression tests, some of the previous regression tests are now failing, despite the executable being from the 19th and the regression tests all passing before my last commit with this same executable. The issue looks to be the snow depth and equivalent snow depth fields in the sfc file. The baseline files look to have changed on the 19th for some cases. Any idea why this might be happening?

@GeorgeGayno-NOAA

> @GeorgeGayno-NOAA in some testing I'm doing for the new regression tests, some of the previous regression tests are now failing, despite the executable being from the 19th and the regression tests all passing before my last commit with this same executable. The issue looks to be the snow depth and equivalent snow depth fields in the sfc file. The baseline files look to have changed on the 19th for some cases. Any idea why this might be happening?

I updated the baseline data after the last commit to develop. If your branch is up-to-date with develop, it should pass. How big are the differences? If they are small, then I consider that to be a 'pass'. Check the regression.log file and search for 'nccmp'. It will list the differences.
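
As a concrete illustration of that check, something like the following could be used (the file names and the exact nccmp flags here are illustrative assumptions, not lifted from the driver scripts):

```bash
# Find the nccmp comparison output in the regression log, with context.
grep -n -A 4 'nccmp' regression.log

# Re-run a comparison by hand for one tile of the surface file
# (file names are illustrative). -d compares data, -S prints difference
# statistics, -f keeps going after the first difference, -q quiets chatter.
nccmp -d -S -f -q baseline/out.sfc.tile6.nc work/out.sfc.tile6.nc
```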

@LarissaReames-NOAA

> @GeorgeGayno-NOAA in some testing I'm doing for the new regression tests, some of the previous regression tests are now failing, despite the executable being from the 19th and the regression tests all passing before my last commit with this same executable. The issue looks to be the snow depth and equivalent snow depth fields in the sfc file. The baseline files look to have changed on the 19th for some cases. Any idea why this might be happening?
>
> I updated the baseline data after the last commit to develop. If your branch is up-to-date with develop, it should pass. How big are the differences? If they are small, then I consider that to be a 'pass'. Check the regression.log file and search for 'nccmp'. It will list the differences.

They are very small: ~1E-14. I'm just wondering why these are showing up now but weren't before. These are on the order of the errors we see when different compute node types are used on Jet, or when files are copied between systems, but it's only these two fields, which seems really odd.

@LarissaReames-NOAA commented Nov 30, 2020

> @GeorgeGayno-NOAA in some testing I'm doing for the new regression tests, some of the previous regression tests are now failing, despite the executable being from the 19th and the regression tests all passing before my last commit with this same executable. The issue looks to be the snow depth and equivalent snow depth fields in the sfc file. The baseline files look to have changed on the 19th for some cases. Any idea why this might be happening?
>
> I updated the baseline data after the last commit to develop. If your branch is up-to-date with develop, it should pass. How big are the differences? If they are small, then I consider that to be a 'pass'. Check the regression.log file and search for 'nccmp'. It will list the differences.
>
> They are very small: ~1E-14. I'm just wondering why these are showing up now but weren't before. These are on the order of the errors we see when different compute node types are used on Jet, or when files are copied between systems, but it's only these two fields, which seems really odd.

Also, it's only tile 6, and only a few grid points, and the number of affected grid points usually differs between fields. In some cases t2m and q2m are affected as well. Note that the regional test passes, but none of the global tests pass. All of my new regression tests are unaffected.

UPDATE: I've tested on Hera and the tests fail with the exact same differences, down to the number of grid points and the values of the differences. I understand that they're very minor differences, but this is not field-wide noise. These are isolated differences that I'd like to understand. The only commits to the develop branch in the past few weeks have been from my PRs, so I'm not certain at all how these differences are occurring.

@LarissaReames-NOAA

@GeorgeGayno-NOAA The same deviations from the baseline tests are showing up when I run the regression tests with the new develop branch on both Jet and Hera. While I understand that these differences are insignificantly small, I feel like we should be aiming for all tests to receive a "PASSED" in the regression test format. I don't know what the original source of the differences is as it's only showing up in a few grid cells on tile 6 in global tests. Should the baselines be recreated with the newest develop branch so that the tests receive a "PASSED"?

@edwardhartnett

We need to get these regression tests as part of the build, so that we can all see this happening...

@GeorgeGayno-NOAA

> @GeorgeGayno-NOAA The same deviations from the baseline tests are showing up when I run the regression tests with the new develop branch on both Jet and Hera. While I understand that these differences are insignificantly small, I feel like we should be aiming for all tests to receive a "PASSED" in the regression test format. I don't know what the original source of the differences is as it's only showing up in a few grid cells on tile 6 in global tests. Should the baselines be recreated with the newest develop branch so that the tests receive a "PASSED"?

I just ran the regression tests on Hera using 005f9a0 of 'develop'. All tests passed for me. So I am not sure how I can explain your result.

For a future task, the regression tests should be modified so any insignificant differences are ignored. But I am not sure how to define 'insignificant'. That is a question for the software engineers. The 'nccmp' utility has an option to ignore differences under a user-specified threshold. But I am not sure of what that threshold should be.
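
As a rough illustration of that option, a tolerance-based comparison might look like the sketch below (file names and the 1e-7 threshold are purely illustrative assumptions; the exact flag semantics should be checked against the installed nccmp version):

```bash
# Treat absolute differences at or below the threshold as equal.
# The threshold here (1e-7) is purely illustrative; nccmp also offers an
# uppercase --Tolerance flag for a relative (percent) threshold.
nccmp -d -S -f --tolerance=1e-7 baseline/out.sfc.tile6.nc work/out.sfc.tile6.nc
```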

@LarissaReames-NOAA

> We need to get these regression tests as part of the build, so that we can all see this happening...

These differences are only happening for the global regression tests that are already in develop. All of the new regression tests I'm planning to put in a PR soon are passing, as is the regional regression test already in develop.

@LarissaReames-NOAA

I believe I've tracked this issue down to module differences. I've addressed this in Issue #234.
