Raven runs Raven on a cluster #684

Merged 36 commits, Jul 20, 2018

Commits
c186c3b
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL Apr 6, 2017
d55e7ba
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL Apr 17, 2017
48b00ff
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL Apr 18, 2017
80392fa
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL Apr 20, 2017
cb7910c
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL Apr 24, 2017
580029f
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL Apr 24, 2017
44d397f
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL Apr 26, 2017
eee2060
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL Apr 27, 2017
c1e615d
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL May 3, 2017
e062ea2
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL May 8, 2017
a14e9ab
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL May 9, 2017
9784e2b
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL May 10, 2017
3c2e833
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL May 11, 2017
eb95597
user guide for single value ravenoutput plots generated
PaulTalbot-INL May 11, 2017
98c92b1
associated guide docs
PaulTalbot-INL May 11, 2017
4e8f2ad
Merge branch 'devel' of github.com:idaholab/raven into devel
PaulTalbot-INL May 15, 2017
5addb46
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
taoyiliang Jun 29, 2017
f4d6df9
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
taoyiliang Jul 6, 2017
f57beeb
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
taoyiliang Jul 31, 2017
2bff8f5
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
taoyiliang Aug 22, 2017
ec7bd0e
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
taoyiliang Aug 25, 2017
7efb28f
mergefix
taoyiliang Oct 2, 2017
9b3745d
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
taoyiliang Nov 13, 2017
e14b48b
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
taoyiliang Dec 20, 2017
21d0929
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
taoyiliang Feb 12, 2018
5df10c2
Merge remote-tracking branch 'mainraven/devel' into devel
PaulTalbot-INL May 18, 2018
fa9d2e3
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
PaulTalbot-INL May 22, 2018
050ba41
Merge branch 'devel' of github.com:PaulTalbot-INL/raven into devel
PaulTalbot-INL Jun 7, 2018
a2a1848
smarter raven locator for qsubbing
PaulTalbot-INL Jun 11, 2018
5534036
implementation done
PaulTalbot-INL Jun 12, 2018
0c89ccc
new test added
PaulTalbot-INL Jun 12, 2018
8887b8f
test included
PaulTalbot-INL Jun 12, 2018
9504f01
cleanup
PaulTalbot-INL Jun 12, 2018
6ce1346
another cleanup
PaulTalbot-INL Jun 12, 2018
1cd38d2
merged in devel
PaulTalbot-INL Jul 16, 2018
912a6d7
removed unused import "sys"
alfoa Jul 20, 2018
4 changes: 2 additions & 2 deletions framework/CodeInterfaces/RAVEN/RAVENInterface.py
@@ -95,12 +95,12 @@ def _readMoreXML(self,xmlNode):
     # check for existence
     if not os.path.exists(source):
       raise IOError(self.printTag+' ERROR: the conversionModule "{}" was not found!'
-                    .format(self.extModForVarsManipulationPath))
+                    .format(source))
     # check module is imported
     checkImport = utils.importFromPath(source)
     if checkImport is None:
       raise IOError(self.printTag+' ERROR: the conversionModule "{}" failed on import!'
-                    .format(self.extModForVarsManipulationPath))
+                    .format(source))
     # check methods are in place
     noScalar = 'convertNotScalarSampledVariables' in checkImport.__dict__
     scalar = 'manipulateScalarSampledVariables' in checkImport.__dict__
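The checks in this hunk (file exists, module imports, expected hooks are defined) can be sketched outside RAVEN as follows. This is only an illustration: `check_conversion_module` is a hypothetical name, and the standard-library `importlib` stands in for RAVEN's own `utils.importFromPath`, whose internals are not shown in this diff.

```python
import importlib.util
import os

# Hook names taken from the hunk above
HOOKS = ("convertNotScalarSampledVariables", "manipulateScalarSampledVariables")

def check_conversion_module(source):
    """Hypothetical helper: load the module at `source` and report which hooks it defines."""
    # check for existence, reporting the path that was actually tested
    if not os.path.exists(source):
        raise IOError('the conversionModule "{}" was not found!'.format(source))
    # try to import the file as a module
    spec = importlib.util.spec_from_file_location("conversion_module", source)
    if spec is None or spec.loader is None:
        raise IOError('the conversionModule "{}" failed on import!'.format(source))
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as err:
        raise IOError('the conversionModule "{}" failed on import! ({})'.format(source, err))
    # check which methods are in place
    return {hook: hook in module.__dict__ for hook in HOOKS}
```

Reporting `source` in the message, as the new lines do, means the error names the path that was actually checked.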
27 changes: 19 additions & 8 deletions framework/CustomModes/MPISimulationMode.py
@@ -66,6 +66,7 @@ def modifyInfo(self, runInfoDict):
         nodefile = os.environ["PBS_NODEFILE"]
       else:
         nodefile = self.__nodefile
+      self.raiseADebug('Setting up remote nodes based on "{}"'.format(nodefile))
       lines = open(nodefile,"r").readlines()
       #XXX This is an undocumented way to pass information back
       newRunInfo['Nodes'] = list(lines)
@@ -79,6 +80,7 @@ def modifyInfo(self, runInfoDict):
         newRunInfo['batchSize'] = maxBatchsize
         self.raiseAWarning("changing batchsize from "+str(oldBatchsize)+" to "+str(maxBatchsize)+" to fit on "+str(len(lines))+" processors")
       newBatchsize = newRunInfo['batchSize']
+      self.raiseADebug('Batch size is "{}"'.format(newBatchsize))
       if newBatchsize > 1:
         #need to split node lines so that numMPI nodes are available per run
         workingDir = runInfoDict['WorkingDir']
@@ -121,41 +123,50 @@ def __createAndRunQSUB(self, runInfoDict):
       @ Out, remoteRunCommand, dict, dictionary of command.
     """
     # Check if the simulation has been run in PBS mode and, in case, construct the proper command
-    #while true, this is not the number that we want to select
+    # determine the cores needed for the job
     if self.__coresNeeded is not None:
       coresNeeded = self.__coresNeeded
     else:
       coresNeeded = runInfoDict['batchSize']*runInfoDict['NumMPI']
+    # get the requested memory, if any
     if self.__memNeeded is not None:
       memString = ":mem="+self.__memNeeded
     else:
       memString = ""
-    #batchSize = runInfoDict['batchSize']
+    # raven/framework location
     frameworkDir = runInfoDict["FrameworkDir"]
+    # number of "threads"
     ncpus = runInfoDict['NumThreads']
+    # job title
     jobName = runInfoDict['JobName'] if 'JobName' in runInfoDict.keys() else 'raven_qsub'
-    #check invalid characters
+    ## fix up job title
     validChars = set(string.ascii_letters).union(set(string.digits)).union(set('-_'))
     if any(char not in validChars for char in jobName):
       raise IOError('JobName can only contain alphanumeric and "_", "-" characters! Received'+jobName)
     #check jobName for length
     if len(jobName) > 15:
       jobName = jobName[:10]+'-'+jobName[-4:]
       print('JobName is limited to 15 characters; truncating to '+jobName)
-    #Generate the qsub command needed to run input
+    # Generate the qsub command needed to run input
+    ## raven_framework location
+    raven = os.path.abspath(os.path.join(frameworkDir,'..','raven_framework'))
+    ## generate the command, which will be passed into "args" of subprocess.call
     command = ["qsub","-N",jobName]+\
               runInfoDict["clusterParameters"]+\
               ["-l",
-               "select="+str(coresNeeded)+":ncpus="+str(ncpus)+":mpiprocs=1"+memString,
+               "select={}:ncpus={}:mpiprocs=1{}".format(coresNeeded,ncpus,memString),
                "-l","walltime="+runInfoDict["expectedTime"],
                "-l","place="+self.__place,"-v",
-               'COMMAND="../raven_framework '+
+               'COMMAND="{} '.format(raven)+
                " ".join(runInfoDict["SimulationFiles"])+'"',
                runInfoDict['RemoteRunCommand']]
-    #Change to frameworkDir so we find raven_qsub_command.sh
+    # Set parameters for the run command
     remoteRunCommand = {}
-    remoteRunCommand["cwd"] = frameworkDir
+    ## directory to start in, where the input file is
+    remoteRunCommand["cwd"] = runInfoDict['InputDir']
+    ## command to run in that directory
     remoteRunCommand["args"] = command
+    ## print out for debugging
     print("remoteRunCommand",remoteRunCommand)
     return remoteRunCommand
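To see what the reworked method hands back, here is a standalone sketch of the same assembly. The function names `sanitize_job_name` and `build_qsub_args`, the keyword arguments, and the `place="free"` default are assumptions made for illustration; the dictionary keys and the `select=...` string follow the hunk above.

```python
import os
import string

VALID_CHARS = set(string.ascii_letters + string.digits + "-_")

def sanitize_job_name(name):
    """Reject invalid characters, then shorten names longer than 15 characters."""
    if any(char not in VALID_CHARS for char in name):
        raise IOError('JobName can only contain alphanumeric and "_", "-" characters! Received ' + name)
    if len(name) > 15:
        # keep the first 10 and last 4 characters, joined by "-" (15 total)
        name = name[:10] + "-" + name[-4:]
    return name

def build_qsub_args(run_info, cores_needed=None, mem_needed=None, place="free"):
    """Hypothetical standalone version of the command assembly shown in the diff."""
    if cores_needed is None:
        cores_needed = run_info["batchSize"] * run_info["NumMPI"]
    mem_string = ":mem=" + mem_needed if mem_needed is not None else ""
    # absolute path to the raven_framework script, one level above framework/
    raven = os.path.abspath(os.path.join(run_info["FrameworkDir"], "..", "raven_framework"))
    job_name = sanitize_job_name(run_info.get("JobName", "raven_qsub"))
    select = "select={}:ncpus={}:mpiprocs=1{}".format(cores_needed, run_info["NumThreads"], mem_string)
    command = 'COMMAND="{} {}"'.format(raven, " ".join(run_info["SimulationFiles"]))
    args = (["qsub", "-N", job_name]
            + run_info["clusterParameters"]
            + ["-l", select,
               "-l", "walltime=" + run_info["expectedTime"],
               "-l", "place=" + place,
               "-v", command,
               run_info["RemoteRunCommand"]])
    # start in the directory of the input file, as the new code does
    return {"cwd": run_info["InputDir"], "args": args}
```

Because the `raven_framework` path is now absolute, the job no longer has to start in the framework directory, which is what allows `cwd` to point at the input file's directory instead.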

7 changes: 7 additions & 0 deletions framework/Simulation.py
@@ -516,6 +516,7 @@ def initialize(self):
       @ Out, None
     """
     #move the full simulation environment in the working directory
+    self.raiseADebug('Moving to working directory:',self.runInfoDict['WorkingDir'])
     os.chdir(self.runInfoDict['WorkingDir'])
     #add also the new working dir to the path
     sys.path.append(os.getcwd())
@@ -608,6 +609,9 @@ def __readRunInfo(self,xmlNode,runInfoSkip,xmlFilename):
         else:
           self.runInfoDict['printInput'] = text+'.xml'
       elif element.tag == 'WorkingDir':
+        # first store the cwd, the "CallDir"
+        self.runInfoDict['CallDir'] = os.getcwd()
+        # then get the requested "WorkingDir"
         tempName = element.text
         if '~' in tempName:
           tempName = os.path.expanduser(tempName)
@@ -618,8 +622,11 @@ def __readRunInfo(self,xmlNode,runInfoSkip,xmlFilename):
         else:
           if xmlFilename == None:
             self.raiseAnError(IOError,'Relative working directory requested but xmlFilename is None.')
+          # store location of the input
           xmlDirectory = os.path.dirname(os.path.abspath(xmlFilename))
+          self.runInfoDict['InputDir'] = xmlDirectory
           rawRelativeWorkingDir = element.text.strip()
+          # working dir is file location + relative working dir
           self.runInfoDict['WorkingDir'] = os.path.join(xmlDirectory,rawRelativeWorkingDir)
         utils.makeDir(self.runInfoDict['WorkingDir'])
       elif element.tag == 'maxQueueSize':
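The three directories tracked after this change (`CallDir`, `InputDir`, `WorkingDir`) can be sketched in isolation. `resolve_run_dirs` is a hypothetical name, and the absolute-path branch is an assumption, since the lines between the second and third hunks are not shown in this diff.

```python
import os

def resolve_run_dirs(working_dir_text, xml_filename):
    """Hypothetical helper mirroring the WorkingDir handling in __readRunInfo."""
    # first store the cwd, the "CallDir"
    info = {"CallDir": os.getcwd()}
    name = working_dir_text.strip()
    if "~" in name:
        name = os.path.expanduser(name)
    if os.path.isabs(name):
        # ASSUMPTION: an absolute request is used as given (branch not visible in the diff)
        info["WorkingDir"] = name
    else:
        if xml_filename is None:
            raise IOError("Relative working directory requested but xmlFilename is None.")
        # store location of the input
        xml_directory = os.path.dirname(os.path.abspath(xml_filename))
        info["InputDir"] = xml_directory
        # working dir is file location + relative working dir
        info["WorkingDir"] = os.path.join(xml_directory, name)
    return info
```

As far as the visible hunks show, `InputDir` is stored only on the relative-path branch, while `__createAndRunQSUB` reads `runInfoDict['InputDir']` unconditionally; whether the key is also set elsewhere for absolute working directories cannot be determined from this diff.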
6 changes: 2 additions & 4 deletions framework/raven_qsub_command.sh
@@ -1,6 +1,7 @@
 #!/bin/bash

 if test -n "$PBS_O_WORKDIR"; then
+    echo Moving to working dir: ${PBS_O_WORKDIR}
     cd $PBS_O_WORKDIR
 fi

@@ -10,11 +11,8 @@ module load MVAPICH2/2.0.1-GCC-4.9.2
 ## also the name of the raven libraries conda environment
 source activate raven_libraries

-echo `conda env list`
-echo DEBUGG HERE IN RQC
-conda list
-
 which python
 which mpiexec
+echo ''
 echo $COMMAND
 $COMMAND
@@ -0,0 +1,34 @@
<DataObjects>
<PointSet name="inputHolder">
<Input>DeltaTimeScramToAux,DG1recoveryTime</Input>
<Output>OutputPlaceHolder</Output>
</PointSet>
<PointSet name="Pointset_from_database_for_rom_trainer">
<Input>DeltaTimeScramToAux,DG1recoveryTime</Input>
<Output>CladTempThreshold,time</Output>
</PointSet>
<HistorySet name="Historyset_from_database_for_rom_trainer">
<Input>DeltaTimeScramToAux,DG1recoveryTime</Input>
<Output>CladTempThreshold</Output>
</HistorySet>
<PointSet name="data_for_sampling_empty_at_begin">
<Input>DeltaTimeScramToAux,DG1recoveryTime</Input>
<Output>OutputPlaceHolder</Output>
</PointSet>
<PointSet name="data_for_sampling_empty_at_begin_nd">
<Input>DeltaTimeScramToAux,DG1recoveryTime</Input>
<Output>OutputPlaceHolder</Output>
</PointSet>
<PointSet name="outputMontecarloRom">
<Input>DeltaTimeScramToAux,DG1recoveryTime</Input>
<!--Output>CladTempThreshold</Output-->
</PointSet>
<HistorySet name="outputMontecarloRomHS">
<Input>DeltaTimeScramToAux,DG1recoveryTime</Input>
<Output>CladTempThreshold,time</Output>
</HistorySet>
<PointSet name="outputMontecarloRomND">
<Input>DeltaTimeScramToAux,DG1recoveryTime</Input>
<Output>CladTempThreshold</Output>
</PointSet>
</DataObjects>
@@ -0,0 +1,31 @@
# Copyright 2017 Battelle Energy Alliance, LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.



def manipulateScalarSampledVariables(sampledVars):
  """
    This method is aimed to manipulate scalar variables.
    The user can create new variables based on the
    variables sampled by RAVEN
    @ In, sampledVars, dict, dictionary of
          sampled variables ({"var1":value1,"var2":value2})
    @ Out, None, the new variables should be
           added in the "sampledVariables" dictionary
  """
  sampledVars['Models|ROM@subType:SciKitLearn@name:ROM1|coef0'] = sampledVars['Models|ROM@subType:SciKitLearn@name:ROM1|C']/10.0
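A quick way to see what this hook does is to call it on a hand-built dictionary. The sketch below repeats the function so that it runs on its own; the sampled value of 10.0 is made up for the example, and the constant names `C_KEY` and `COEF0_KEY` are introduced here only for readability.

```python
C_KEY = 'Models|ROM@subType:SciKitLearn@name:ROM1|C'
COEF0_KEY = 'Models|ROM@subType:SciKitLearn@name:ROM1|coef0'

def manipulateScalarSampledVariables(sampledVars):
    """Same logic as the test module: derive coef0 from the sampled C, in place."""
    sampledVars[COEF0_KEY] = sampledVars[C_KEY]/10.0

# hypothetical sampled value for C
sampled = {C_KEY: 10.0}
manipulateScalarSampledVariables(sampled)
print(sampled[COEF0_KEY])  # prints 1.0
```

The function returns nothing; the new entry appears in the dictionary that was passed in.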




@@ -0,0 +1,174 @@
<?xml version="1.0" ?>
<Simulation verbosity="debug">

<RunInfo>
<WorkingDir>test_rom_trainer</WorkingDir>
<Sequence>MC_for_rom_trainer, test_extract_for_rom_trainer,
test_rom_trainer, test_rom_trainerHS,
rom_MC, rom_MCHS,
test_rom_trainer_nd_interp,rom_MC_nd_interpolator</Sequence>
<batchSize>1</batchSize>
</RunInfo>

<Distributions>
<Uniform name="auxbackup">
<lowerBound>0</lowerBound>
<upperBound>2000</upperBound>
</Uniform>
<Uniform name="DG1backup">
<lowerBound>0</lowerBound>
<upperBound>1000</upperBound>
</Uniform>
</Distributions>

<Models>
<ExternalModel ModuleToLoad="TMI_fake" name="PythonModule" subType="">
<variables>DeltaTimeScramToAux,DG1recoveryTime,time,CladTempThreshold</variables>
</ExternalModel>
<ROM name="ROM1" subType="SciKitLearn">
<Features>DeltaTimeScramToAux,DG1recoveryTime</Features>
<Target>CladTempThreshold</Target>
<SKLtype>svm|SVR</SKLtype>
<kernel>linear</kernel>
<C>10.0</C>
<tol>0.0001</tol>
<coef0>0.0</coef0>
</ROM>
<ROM name="ROMHS" subType="SciKitLearn">
<Features>DeltaTimeScramToAux,DG1recoveryTime</Features>
<Target>CladTempThreshold,time</Target>
<SKLtype>svm|SVR</SKLtype>
<kernel>linear</kernel>
<C>10.0</C>
<tol>0.0001</tol>
<coef0>0.0</coef0>
</ROM>
<ROM name="ROM2" subType="NDinvDistWeight">
<Features>DeltaTimeScramToAux,DG1recoveryTime</Features>
<Target>CladTempThreshold</Target>
<p>3</p>
</ROM>
</Models>

<Samplers>
<MonteCarlo name="RAVENmc3">
<samplerInit>
<limit>3</limit>
</samplerInit>
<variable name="DeltaTimeScramToAux">
<distribution>auxbackup</distribution>
</variable>
<constant name="DG1recoveryTime">500</constant>
</MonteCarlo>
<Grid name="gridRom">
<variable name="DeltaTimeScramToAux">
<distribution>auxbackup</distribution>
<grid construction="custom" type="CDF">0.5 1.0 0.0</grid>
</variable>
<constant name="DG1recoveryTime">500</constant>
</Grid>
<MonteCarlo name="RAVENmcCode3">
<samplerInit>
<limit>3</limit>
</samplerInit>
<variable name="DeltaTimeScramToAux">
<distribution>auxbackup</distribution>
</variable>
<variable name="DG1recoveryTime">
<distribution>DG1backup</distribution>
</variable>
</MonteCarlo>
<MonteCarlo name="RAVENmcND">
<samplerInit>
<limit>3</limit>
<initialSeed>200286</initialSeed>
</samplerInit>
<variable name="DeltaTimeScramToAux">
<distribution>auxbackup</distribution>
</variable>
<variable name="DG1recoveryTime">
<distribution>DG1backup</distribution>
</variable>
</MonteCarlo>
</Samplers>

<Steps verbosity="debug">
<MultiRun name="MC_for_rom_trainer" verbosity="debug">
<Input class="DataObjects" type="PointSet">inputHolder</Input>
<Model class="Models" type="ExternalModel">PythonModule</Model>
<Sampler class="Samplers" type="MonteCarlo">RAVENmcCode3</Sampler>
<Output class="Databases" type="HDF5">MC_TEST_EXTRACT_STEP_FOR_ROM_TRAINER</Output>
</MultiRun>
<IOStep name="test_extract_for_rom_trainer" verbosity="debug">
<Input class="Databases" type="HDF5">MC_TEST_EXTRACT_STEP_FOR_ROM_TRAINER</Input>
<Input class="Databases" type="HDF5">MC_TEST_EXTRACT_STEP_FOR_ROM_TRAINER</Input>
<Output class="DataObjects" type="PointSet">Pointset_from_database_for_rom_trainer</Output>
<Output class="DataObjects" type="HistorySet">Historyset_from_database_for_rom_trainer</Output>
<Output class="OutStreams" type="Print">ciccio</Output>
<Output class="OutStreams" type="Print">ciccioHS</Output>
</IOStep>
<RomTrainer name="test_rom_trainer" verbosity="debug">
<Input class="DataObjects" type="PointSet">Pointset_from_database_for_rom_trainer</Input>
<Output class="Models" type="ROM">ROM1</Output>
</RomTrainer>
<RomTrainer name="test_rom_trainerHS" verbosity="debug">
<Input class="DataObjects" type="HistorySet">Historyset_from_database_for_rom_trainer</Input>
<Output class="Models" type="ROM">ROMHS</Output>
</RomTrainer>
<RomTrainer name="test_rom_trainer_nd_interp">
<Input class="DataObjects" type="PointSet">Pointset_from_database_for_rom_trainer</Input>
<Output class="Models" type="ROM">ROM2</Output>
</RomTrainer>
<MultiRun name="rom_MC" re-seeding="200286">
<Input class="DataObjects" type="PointSet">data_for_sampling_empty_at_begin</Input>
<Model class="Models" type="ROM">ROM1</Model>
<Sampler class="Samplers" type="Grid">gridRom</Sampler>
<Output class="DataObjects" type="PointSet">outputMontecarloRom</Output>
<Output class="OutStreams" type="Print">outputMontecarloRom_dump</Output>
</MultiRun>
<MultiRun name="rom_MCHS" re-seeding="200286">
<Input class="DataObjects" type="PointSet">data_for_sampling_empty_at_begin</Input>
<Model class="Models" type="ROM">ROMHS</Model>
<Sampler class="Samplers" type="Grid">gridRom</Sampler>
<Output class="DataObjects" type="HistorySet">outputMontecarloRomHS</Output>
<Output class="OutStreams" type="Print">outputMontecarloRomHS_dump</Output>
</MultiRun>
<MultiRun name="rom_MC_nd_interpolator">
<Input class="DataObjects" type="PointSet">data_for_sampling_empty_at_begin_nd</Input>
<Model class="Models" type="ROM">ROM2</Model>
<Sampler class="Samplers" type="MonteCarlo">RAVENmcND</Sampler>
<Output class="DataObjects" type="PointSet">outputMontecarloRomND</Output>
<Output class="OutStreams" type="Print">outputMontecarloRomND_dump</Output>
</MultiRun>
</Steps>

<OutStreams>
<Print name="outputMontecarloRom_dump">
<type>csv</type>
<source>outputMontecarloRom</source>
</Print>
<Print name="outputMontecarloRomHS_dump">
<type>csv</type>
<source>outputMontecarloRomHS</source>
</Print>
<Print name="outputMontecarloRomND_dump">
<type>csv</type>
<source>outputMontecarloRomND</source>
</Print>
<Print name="ciccio">
<type>csv</type>
<source>Pointset_from_database_for_rom_trainer</source>
</Print>
<Print name="ciccioHS">
<type>csv</type>
<source>Historyset_from_database_for_rom_trainer</source>
</Print>
</OutStreams>

<Databases>
<HDF5 name="MC_TEST_EXTRACT_STEP_FOR_ROM_TRAINER" readMode="overwrite"/>
</Databases>

<ExternalXML node="DataObjects" xmlToLoad="../ext_dataobjects.xml"/>

</Simulation>
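For readers unfamiliar with the `gridRom` sampler in this input, the sketch below shows one reading of `<grid construction="custom" type="CDF">0.5 1.0 0.0</grid>`: it assumes that `type="CDF"` means the listed nodes are cumulative probabilities that are mapped through the inverse CDF of the `auxbackup` distribution (Uniform between 0 and 2000). The helper `uniform_inverse_cdf` is hypothetical and is not part of RAVEN.

```python
def uniform_inverse_cdf(p, lower, upper):
    """Map a cumulative probability p in [0, 1] to a value of Uniform(lower, upper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1], got {}".format(p))
    return lower + p * (upper - lower)

# grid nodes as listed in the input, bounds from the "auxbackup" distribution
cdf_nodes = [0.5, 1.0, 0.0]
values = [uniform_inverse_cdf(p, 0.0, 2000.0) for p in cdf_nodes]
print(values)  # prints [1000.0, 2000.0, 0.0]
```

Under that assumption the ROM is evaluated at the midpoint and both bounds of `DeltaTimeScramToAux`, with `DG1recoveryTime` held constant at 500.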