Fix NODEFAIL test on cheyenne. #1370

Merged: 3 commits, Apr 18, 2017
4 changes: 3 additions & 1 deletion config/cesm/machines/config_machines.xml
@@ -219,8 +219,10 @@
<mpirun mpilib="default">
<executable>mpiexec_mpt</executable>
<arguments>
<arg name="anum_tasks"> -np $TOTALPES</arg>
<arg name="labelstdout">-p "%g:"</arg>
<arg name="threadplacement"> omplace </arg>
<!-- the omplace argument needs to be last -->
<arg name="zthreadplacement"> omplace </arg>
</arguments>
</mpirun>
<mpirun mpilib="mpi-serial">
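The rename from `threadplacement` to `zthreadplacement`, together with the comment that `omplace` must come last, suggests that launcher arguments are emitted in alphabetical order of their `name` attribute. This hunk does not show the code that assembles the command line, so that ordering is an assumption. Under it, a minimal sketch of why the `z` prefix helps: `build_mpirun_command` is a hypothetical helper (not a CIME function), `144` stands in for `$TOTALPES`, and the `verbose` argument is an invented placeholder for any later-sorting name.

```python
def build_mpirun_command(executable, args):
    """Join launcher arguments in alphabetical order of their names.

    Hypothetical helper: assumes arguments are sorted by name before joining.
    """
    ordered = [args[name].strip() for name in sorted(args)]
    return " ".join([executable] + ordered)


# Arguments shared by the old and new configuration.
base = {"anum_tasks": " -np 144", "labelstdout": '-p "%g:"'}
# Invented placeholder argument whose name sorts after "threadplacement".
extra = {"verbose": "--some-flag"}

old_cmd = build_mpirun_command(
    "mpiexec_mpt", dict(base, threadplacement=" omplace ", **extra)
)
new_cmd = build_mpirun_command(
    "mpiexec_mpt", dict(base, zthreadplacement=" omplace ", **extra)
)

print(old_cmd)  # omplace ends up before --some-flag
print(new_cmd)  # omplace stays last
```

With only the three arguments in this hunk, `threadplacement` already sorts last; under this assumption the prefix matters once any argument with a later-sorting name is added.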
42 changes: 41 additions & 1 deletion config/config_tests.xml
@@ -166,6 +166,45 @@ PRE pause-resume test: by default a BFB test of pause-resume cycling

LII CLM initial condition interpolation test

======================================================================
Infrastructural tests for CIME. These are used by scripts_regression_tests.
Users won't generally run these.
======================================================================


TESTBUILDFAIL Insta-fail build step. Used to confirm that failed
builds are caught and reported correctly.

TESTBUILDFAILEXC Insta-fail build step by failing to init. Used to test
correct behavior when exceptions are generated.

TESTRUNFAIL Insta-fail run step. Used to confirm that model run
failures are caught and reported correctly.

TESTRUNFAILEXC Insta-fail run step via exception. Used to test
correct behavior when exceptions are generated.

TESTRUNPASS Insta-pass run step. Used to test that runs that work
are reported correctly.

TESTMEMLEAKFAIL Insta-fail memleak step. Used to test that memleaks are
detected and reported correctly.

TESTMEMLEAKPASS Insta-pass memleak step. Used to test that non-memleaks are
reported correctly.

TESTRUNDIFF Produces a canned hist file. Env var TESTRUNDIFF_ALTERNATE can
be used to cause a DIFF. Used to check that baseline diffs are
detected and reported correctly.

TESTTESTDIFF Simulates internal test diff (non baseline). Used to check that
internal comparison failures are detected and reported correctly.

TESTRUNSLOWPASS After 5 minutes of sleep, pass run step. Used to test timeouts
and kills.

NODEFAIL Tests restart upon detected node failure. Generates fake failures,
the number of which is controlled by NODEFAIL_NUM_FAILS.

-->

@@ -366,7 +405,8 @@ LII CLM initial condition interpolation test
<test NAME="NODEFAIL">
<DESC>For testing infra only. Tests restart upon detected node failure</DESC>
<INFO_DBUG>1</INFO_DBUG>
<STOP_OPTION>ndays</STOP_OPTION>
<STOP_OPTION>nsteps</STOP_OPTION>
<OCN_NCPL>$ATM_NCPL</OCN_NCPL>
<STOP_N>11</STOP_N>
<REST_N>$STOP_N / 2 + 1</REST_N>
<REST_OPTION>$STOP_OPTION</REST_OPTION>
8 changes: 8 additions & 0 deletions scripts/lib/CIME/SystemTests/nodefail.py
@@ -56,8 +56,16 @@ def _restart_fake_phase(self):
        env_mach_specific.set_value("run_exe", fake_exe_file)
        self._case.flush(flushall=True)

        # This flag is needed by mpt to run a script under mpiexec
        mpilib = self._case.get_value("MPILIB")
        if mpilib == "mpt":
            os.environ["MPI_SHEPHERD"] = "true"

        self.run_indv(suffix=None)

        if mpilib == "mpt":
            del os.environ["MPI_SHEPHERD"]

        env_mach_specific = self._case.get_env("mach_specific")
        env_mach_specific.set_value("run_exe", prev_run_exe)
        self._case.flush(flushall=True)
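The hunk above sets `MPI_SHEPHERD` before the run and deletes it afterwards. If the run raised an exception, the `del` would be skipped and the variable would stay set for the rest of the process. One possible variant, shown only as a sketch and not part of this PR, wraps the set/unset pair in a context manager. `temporary_env` and `run_fake_phase` are hypothetical names, and `run` stands in for `self.run_indv(suffix=None)`.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable for the duration of a with-block."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the prior state even if the body raised.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


def run_fake_phase(mpilib, run):
    """Hypothetical stand-in for the run step in _restart_fake_phase."""
    if mpilib == "mpt":
        # mpt needs this flag to run a script under mpiexec
        with temporary_env("MPI_SHEPHERD", "true"):
            run()
    else:
        run()
```

A call such as `run_fake_phase("mpt", some_callable)` would see `MPI_SHEPHERD=true` only while `some_callable` executes.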