Fix various cluster issues #1807

Merged
merged 8 commits into from
Apr 14, 2022
1 change: 1 addition & 0 deletions doc/user_manual/runInfo.tex
@@ -642,6 +642,7 @@ \subsection{RunInfo: Advanced Users}
 \item[\%BASE\_WORKING\_DIR\%] Expands to the base working directory given in RunInfo. This will likely be a parent of WORKING\_DIR
 \item[\%METHOD\%] Expands to the environmental variable \$METHOD
 \item[\%NUM\_CPUS\%] Expands to the number of cpus to use per single batch. This is NumThreads in the XML file.
+\item[\%PYTHON\%] Expands to the python that is used to run RAVEN.

 \end{description}

3 changes: 2 additions & 1 deletion ravenframework/CustomModes/MPILegacySimulationMode.py
@@ -59,7 +59,8 @@ def createAndRunQSUB(runInfoDict):
 "-l","walltime="+runInfoDict["expectedTime"],
 "-l","place=free","-v",
 'COMMAND="../raven_framework '+
-" ".join(runInfoDict["SimulationFiles"])+'"',
+" ".join(runInfoDict["SimulationFiles"])+'",'+
+'RAVEN_FRAMEWORK_DIR="{}"'.format(frameworkDir),
 runInfoDict['RemoteRunCommand']]
 #Change to frameworkDir so we find raven_qsub_command.sh
 remoteRunCommand = {}
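
Note on the hunk above: the value following `-v` is a single comma-separated variable list, which is why the changed line now ends in `",'+` before `RAVEN_FRAMEWORK_DIR` is appended. A minimal sketch of that string assembly follows. The helper name `buildVariableList` and the example paths are hypothetical and not part of the PR.

```python
def buildVariableList(simulationFiles, frameworkDir):
  """
    Hypothetical helper (not in RAVEN): assembles the value passed after "qsub -v".
    @ In, simulationFiles, list, input file names handed to raven_framework
    @ In, frameworkDir, str, directory exported as RAVEN_FRAMEWORK_DIR
    @ Out, variableList, str, comma-separated NAME="value" pairs
  """
  command = '../raven_framework ' + ' '.join(simulationFiles)
  # Both variables travel in one argument, separated by a comma.
  return 'COMMAND="{}",RAVEN_FRAMEWORK_DIR="{}"'.format(command, frameworkDir)

if __name__ == '__main__':
  # Example values are made up for illustration.
  print(buildVariableList(['test_mpi.xml'], '/opt/raven/ravenframework'))
```

With this in place the job script can read `$RAVEN_FRAMEWORK_DIR` directly instead of inferring it.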
1 change: 1 addition & 0 deletions ravenframework/Models/Code.py
@@ -564,6 +564,7 @@ def evaluateSample(self, myInput, samplerType, kwargs):
 command = command.replace("%BASE_WORKING_DIR%",kwargs['BASE_WORKING_DIR'])
 command = command.replace("%METHOD%",kwargs['METHOD'])
 command = command.replace("%NUM_CPUS%",kwargs['NUM_CPUS'])
+command = command.replace("%PYTHON%", sys.executable)

 self.raiseAMessage('Execution command submitted:',command)
 if platform.system() == 'Windows':
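
Note on the hunk above: `%PYTHON%` differs from the other placeholders in that its value comes from the running interpreter (`sys.executable`) rather than from `kwargs`. A minimal sketch of the substitution, assuming a hypothetical helper `expandPlaceholders` that is not part of RAVEN:

```python
import sys

def expandPlaceholders(command, values):
  """
    Hypothetical helper (not in RAVEN): expands %NAME% placeholders in a command string.
    @ In, command, str, command line containing placeholders
    @ In, values, dict, placeholder names mapped to replacement values
    @ Out, command, str, command line with placeholders expanded
  """
  for key, value in values.items():
    command = command.replace('%' + key + '%', str(value))
  # %PYTHON% is taken from the interpreter running this process, not from values.
  return command.replace('%PYTHON%', sys.executable)

if __name__ == '__main__':
  print(expandPlaceholders('%PYTHON% Simple.py -n %NUM_CPUS%', {'NUM_CPUS': 1}))
```

Using the running interpreter keeps the inner run on the same Python (and therefore the same installed libraries) as the outer one, whereas a bare `python` depends on whatever is first on `PATH` on the compute node.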
14 changes: 12 additions & 2 deletions ravenframework/Runners/DistributedMemoryRunner.py
@@ -14,7 +14,7 @@
 """
 Created on Mar 5, 2013

-@author: alfoa, cogljj, crisr
+@author: alfoa, cogljj, crisr, talbpw, maljdp
Review comment (Collaborator):
don't you blame me for this 😁

 """
 #External Modules------------------------------------------------------------------------------------
 import sys
@@ -69,7 +69,17 @@ def isDone(self):
     if self.thread is None:
       return True
     else:
-      return (self.thread in ray.wait([self.thread], timeout=waitTimeOut)[0]) if im.isLibAvail("ray") else self.thread.finished
+      if im.isLibAvail("ray"):
+        try:
+          ray.get(self.thread, timeout=waitTimeOut)
+          return True
+        except ray.exceptions.GetTimeoutError:
+          return False
+        #Alternative that was tried:
+        #return self.thread in ray.wait([self.thread], timeout=waitTimeOut)[0]
+        #which ran slower in ray 1.9
+      else:
+        return self.thread.finished

   def _collectRunnerResponse(self):
     """
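
Note on the hunk above: the new `isDone` asks for the result with a short timeout and treats a timeout as "not finished yet". The same pattern can be sketched without ray by swapping in the standard library's `concurrent.futures`, where `Future.result(timeout=...)` raises a timeout error in a comparable way. This is a stand-in for illustration only; it is not ray's API, and the default `waitTimeOut` value here is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError

def isDone(future, waitTimeOut=1e-3):
  """
    Stand-in sketch: reports whether a job has finished by requesting its result
    with a short timeout.
    @ In, future, concurrent.futures.Future, the submitted job
    @ In, waitTimeOut, float, seconds to wait before giving up (assumed value)
    @ Out, done, bool, True if the result was available within the timeout
  """
  try:
    # An exception raised inside the job would propagate from here;
    # this sketch does not handle that case.
    future.result(timeout=waitTimeOut)
    return True
  except FutureTimeoutError:
    return False

if __name__ == '__main__':
  with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(time.sleep, 0.2)
    print(isDone(job))  # polled while the job is still sleeping
    job.result()        # block until the job completes
    print(isDone(job))  # polled after completion
```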
9 changes: 8 additions & 1 deletion ravenframework/raven_ec_qsub_command.sh
@@ -8,7 +8,14 @@ fi

 source /etc/profile.d/modules.sh
 echo RAVEN_FRAMEWORK_DIR $RAVEN_FRAMEWORK_DIR
-source $RAVEN_FRAMEWORK_DIR/../scripts/establish_conda_env.sh --load
+
+if test -e $RAVEN_FRAMEWORK_DIR/../scripts/establish_conda_env.sh; then
+    source $RAVEN_FRAMEWORK_DIR/../scripts/establish_conda_env.sh --load
+else
+    echo RAVEN_FRAMEWORK_DIR ERROR
+    echo FILE $RAVEN_FRAMEWORK_DIR/../scripts/establish_conda_env.sh
+    echo NOT FOUND
+fi
 module load pbs openmpi

 which python
4 changes: 3 additions & 1 deletion tests/cluster_tests/RavenRunsRaven/Code/Inner/Simple.py
@@ -14,6 +14,7 @@
 import sys
 import argparse
 import configparser
+import time

 def checkAux():
   """
@@ -70,7 +71,7 @@ def write(a, b, c, x, y, out):
     @ In, y, float, float
     @ In, out, string, filename to write results to
   """
-  print('Writing to', out)
+  print('Writing to', out, time.ctime())
   with open(out, 'w') as f:
     f.writelines(','.join('abcxy') + '\n')
     f.writelines(','.join(str(i) for i in [a, b, c, x, y]) + '\n')
@@ -82,3 +83,4 @@ def write(a, b, c, x, y, out):
   a, b, x, y, out = readInput(infileName)
   c = run(a, b, x, y)
   write(a, b, c, x, y, out)
+  print("Goodbye", time.ctime())
4 changes: 2 additions & 2 deletions tests/cluster_tests/RavenRunsRaven/Code/inner.xml
@@ -1,5 +1,5 @@
 <?xml version="1.0" ?>
-<Simulation verbosity="debug">
+<Simulation verbosity="debug" profile="jobs">
Review comment (Collaborator):

I think TestInfo needs to be added, and the revisions node needs to be updated to reflect the python command change.

Reply (Collaborator):
FYI, this isn't a test file, it's a file run by a test file. When we first added this test we had a discussion about it, and decided that the "inner" of the RrR tests should not be considered the test file; rather the "outer" should. I think the philosophical idea was that the "outer" is actually the test, while the "inner" doesn't get seen by the testing harness, and it would be confusing if this data was loaded as if it were a separate test into the regression test documentation.

Reply (Collaborator):
sounds good to me.


   <RunInfo>
     <WorkingDir>Inner</WorkingDir>
@@ -47,7 +47,7 @@
   <Models>
     <Code name="simple" subType="GenericCode">
       <executable>Simple.py</executable>
-      <clargs arg="python" type="prepend"/>
+      <clargs arg="%PYTHON%" type="prepend"/>
       <clargs arg="-i" extension=".inp" type="input"/>
       <fileargs arg="output" type="output"/>
     </Code>
2 changes: 1 addition & 1 deletion tests/cluster_tests/RavenRunsRaven/code.xml
@@ -1,5 +1,5 @@
 <?xml version="1.0" ?>
-<Simulation verbosity="debug">
+<Simulation verbosity="debug" profile="jobs">
   <TestInfo>
     <name>cluster_tests/RavenRunsRaven.Code</name>
     <author>talbpaul</author>
2 changes: 1 addition & 1 deletion tests/cluster_tests/RavenRunsRaven/rom.xml
@@ -1,5 +1,5 @@
 <?xml version="1.0" ?>
-<Simulation verbosity="debug">
+<Simulation verbosity="debug" profile="jobs">
   <TestInfo>
     <name>framework/cluster_tests/RavenRunsRaven.ROM</name>
     <author>alfoa</author>