Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get unit test build and run working with serial or parallel pFUnit #1396

Merged
merged 19 commits into from
Apr 21, 2017
Merged

Get unit test build and run working with serial or parallel pFUnit #1396

merged 19 commits into from
Apr 21, 2017

Conversation

billsacks
Copy link
Member

The primary aim of this PR is to enable serial unit test builds as well
as MPI-based builds. A given machine/compiler can now have multiple
pFUnit installations; the correct installation is chosen based on the
options given to run_tests.py (--use-mpi and --use-openmp).

The default is now to use the serial build and run (i.e., without
mpi). This is a change from the earlier behavior, where the default was
to use mpi. I made this change because I was finding it challenging to
get the mpi-based unit tests working on new supercomputers (due to
restrictions on where you can build and where you can launch mpi jobs -
often those two locations are different; there were also challenges with
setting up the mpirun command for unit testing). Currently we aren't
leveraging the parallel capabilities of pFUnit in any of our tests, so
it seemed reasonable to just use the serial version. Note that this
builds mpi-serial as part of the unit test build: this is needed by
tests that use MCT.

In addition, this PR adds unit testing support on hobart-intel (NCAR/CGD
machine), so that scripts_regression_tests on that system will now run
the Fortran unit tests. I have removed references to PFUNIT_PATH on some
other systems where Sean Santos had installed pFUnit in his home directory.

This changes the unit testing user interface somewhat:

  • config_compilers should now give MPILIB and compile_threaded
    attributes for PFUNIT

  • The default mode of operation for run_tests.py is now serial - with
    neither mpi nor openMP. (This means that, on yellowstone, you no
    longer need to run the unit tests on caldera.)

  • Other components that have unit tests (CLM and CAM) will need to
    rework their top-level CMakeLists.txt file slightly. I can help with
    this. @ekluzek and @cacraigucar - take note.

Test suite: scripts_regression_tests on yellowstone and cheyenne
All pass except K_TestCimeCase, which was also failing on the version
of master on which this is based (6dd85dd)

Also ran scripts_regression_tests N_TestUnitTest on hobart
Test baseline: N/A
Test namelist changes: none
Test status: bit for bit

Fixes #1141 , partially addresses #869

User interface changes?: YES - see list above (under "This changes the
unit testing user interface somewhat")

Code review:

I think this method will work better once we want to select between a
parallel and serial compiler
The default is now to do a serial build, with no mpi or openmp
support. This can be changed via the --use-mpi and --use-openmp flags.

I have tested this on my mac. This required making the following change
in my config_compilers.xml:

-    <PFUNIT_PATH>/usr/local/pfunit/pfunit3.2.8-serial</PFUNIT_PATH>
+    <PFUNIT_PATH MPILIB="mpi-serial" compile_threaded="false">/usr/local/pfunit/pfunit3.2.8-serial</PFUNIT_PATH>
+    <PFUNIT_PATH MPILIB="mpich" compile_threaded="true">/usr/local/pfunit/pfunit-mpi</PFUNIT_PATH>

I could then build & run in serial mode with

tools/unit_testing/run_tests.py --build-dir `mktemp -d ./unit_tests.XXXXXXXX`

or in mpi mode with

tools/unit_testing/run_tests.py --use-mpi --use-openmp --build-dir `mktemp -d ./unit_tests.XXXXXXXX`
Now need to specify MPILIB and compile_threaded attributes.

Also remove references to old pfunit installations in Sean Santos's home
directory on various machines, because I don't know the characteristics
of these builds (whether they use mpi and/or openmp).
This is needed for the serial compilers on yellowstone
This seems unnecessary, and had a few problems:

(1) It assumed the use of MPICC and MPIF90, as opposed to SCC and SFC

(2) The timing of the include of CIME_utils comes too late for this
    setting to have any effect. If we move the include earlier in the
    top-level CMakeLists.txt file (before the project line), this causes
    problems, I think with the 'include(Compilers)' line.
This works on roo2 with both serial and mpi. But note that on roo2, I
set the full path to SFC and MPIFC. It also works to rerun unit tests
out of an existing directory (using either serial or mpi) - only the
bare minimum is done.

(I was worried that it might not work to rebuild, based on this comment
in the Makefile:

)
I'm hopeful that this will solve the problems with getting the compiler
variables from the Macros file on yellowstone and cheyenne.
I particularly want the netcdf fix on hobart
Links are now dying with an error that it can't find -lifcore
This avoids adding a semicolon

return value_transformed

def _element_needs_whitespace_removal(self, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this list? Wouldn't it be enough to just always remove trailing (and leading) white space?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that. I thought there might be some variables where there was deliberately some leading or trailing white space so that when they are concatenated together you get something like -foo -bar rather than -foo-bar. But I didn't look closely or experiment with this. If you think it's safe to always remove whitespace, then I can give it a shot on the four machines that now have unit testing support.

This isn't necessarily needed for variables other than the compiler, but
Jim Edwards suggested doing this to keep things simple and because there
*may* be other variables that need it. He thinks it shouldn't be a
problem to do this for all variables.
@jedwards4b jedwards4b merged commit 962b20b into ESMCI:master Apr 21, 2017
@billsacks billsacks deleted the compiler_from_macros branch April 21, 2017 15:36
jgfouca added a commit that referenced this pull request Jun 2, 2017
Update CIME to cime5.3

Update ACME to use CIME 5.3.x up to ESMCI hash 6156e0a from Thursday 4/13.

scripts_regression_tests and acme_developer tests pass on penn workstation.

This CIME change involves a large cosmetic change in CIME. Several directories have been renamed.

    The main interface to CIME, the scripts dir remains unchanged.
    cime/cime_config is renamed to cime/config
    The Fortran source code has been consolidated in src directory: externals, components, share, and drivers
    The driver_cpl directory has been moved to src/drivers/mct to prepare for multiple coupler options
    The share/csm_share/share directory has been renamed src/share/util.
    The cime script infrastructure source, utils/python/CIME, has moved to scripts/lib/CIME.

@singhbalwinder : please make sure PNNL cluster merge look correct
@mfdeakin-sandia : please look at cprnc/CMakeLists.txt and also confirm no run_acme changes.

Fixes #1311
Fixes #1380
Fixes #1382
Fixes #1383
Fixes #1396
Fixes #1402
Fixes #1416

[BFB]

* agsalin/update-to-cime5.3: (5649 commits)
  Fix titan problem
  Reset invalid pio_numiotasks
  Fix merge mistake
  Re-add eca testmod, was somehow lost in big merge
  Restore ACME version of shr_orb_cosz
  Fixed whitespace errors in Makefile killing COSP
  HOMME test typo fix
  Fixes to get tests to pass.
  Update ACME for new CIME directory structure
  More renaming.
  Rename ACME/cime directories in prep for CIME merge
  HOMME Improvement
  More edits to README
  Suggest a more portable usage of mktemp
  Tweak a comment
  Add comment on acme side
  Fix case for new unit_testing attribute to get_mpirun
  Allow run_tests.py to auto-determine the machine name
  fix an error
  make user-compset a seperate test
  ...
jgfouca added a commit that referenced this pull request Feb 23, 2018
Update CIME to cime5.3

Update ACME to use CIME 5.3.x up to ESMCI hash 6156e0a from Thursday 4/13.

scripts_regression_tests and acme_developer tests pass on penn workstation.

This CIME change involves a large cosmetic change in CIME. Several directories have been renamed.

    The main interface to CIME, the scripts dir remains unchanged.
    cime/cime_config is renamed to cime/config
    The Fortran source code has been consolidated in src directory: externals, components, share, and drivers
    The driver_cpl directory has been moved to src/drivers/mct to prepare for multiple coupler options
    The share/csm_share/share directory has been renamed src/share/util.
    The cime script infrastructure source, utils/python/CIME, has moved to scripts/lib/CIME.

@singhbalwinder : please make sure PNNL cluster merge look correct
@mfdeakin-sandia : please look at cprnc/CMakeLists.txt and also confirm no run_acme changes.

Fixes #1311
Fixes #1380
Fixes #1382
Fixes #1383
Fixes #1396
Fixes #1402
Fixes #1416

[BFB]

* agsalin/update-to-cime5.3: (5649 commits)
  Fix titan problem
  Reset invalid pio_numiotasks
  Fix merge mistake
  Re-add eca testmod, was somehow lost in big merge
  Restore ACME version of shr_orb_cosz
  Fixed whitespace errors in Makefile killing COSP
  HOMME test typo fix
  Fixes to get tests to pass.
  Update ACME for new CIME directory structure
  More renaming.
  Rename ACME/cime directories in prep for CIME merge
  HOMME Improvement
  More edits to README
  Suggest a more portable usage of mktemp
  Tweak a comment
  Add comment on acme side
  Fix case for new unit_testing attribute to get_mpirun
  Allow run_tests.py to auto-determine the machine name
  fix an error
  make user-compset a seperate test
  ...
jgfouca added a commit that referenced this pull request Mar 13, 2018
Update CIME to cime5.3

Update ACME to use CIME 5.3.x up to ESMCI hash 6156e0a from Thursday 4/13.

scripts_regression_tests and acme_developer tests pass on penn workstation.

This CIME change involves a large cosmetic change in CIME. Several directories have been renamed.

    The main interface to CIME, the scripts dir remains unchanged.
    cime/cime_config is renamed to cime/config
    The Fortran source code has been consolidated in src directory: externals, components, share, and drivers
    The driver_cpl directory has been moved to src/drivers/mct to prepare for multiple coupler options
    The share/csm_share/share directory has been renamed src/share/util.
    The cime script infrastructure source, utils/python/CIME, has moved to scripts/lib/CIME.

@singhbalwinder : please make sure PNNL cluster merge look correct
@mfdeakin-sandia : please look at cprnc/CMakeLists.txt and also confirm no run_acme changes.

Fixes #1311
Fixes #1380
Fixes #1382
Fixes #1383
Fixes #1396
Fixes #1402
Fixes #1416

[BFB]

* agsalin/update-to-cime5.3: (5649 commits)
  Fix titan problem
  Reset invalid pio_numiotasks
  Fix merge mistake
  Re-add eca testmod, was somehow lost in big merge
  Restore ACME version of shr_orb_cosz
  Fixed whitespace errors in Makefile killing COSP
  HOMME test typo fix
  Fixes to get tests to pass.
  Update ACME for new CIME directory structure
  More renaming.
  Rename ACME/cime directories in prep for CIME merge
  HOMME Improvement
  More edits to README
  Suggest a more portable usage of mktemp
  Tweak a comment
  Add comment on acme side
  Fix case for new unit_testing attribute to get_mpirun
  Allow run_tests.py to auto-determine the machine name
  fix an error
  make user-compset a seperate test
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants