Improve submit files #81
Conversation
Force-pushed from 83eb5fa to 5cc109a
Codecov Report
@@ Coverage Diff @@
## master #81 +/- ##
========================================
- Coverage 41.1% 6.2% -34.9%
========================================
Files 22 22
Lines 4885 4898 +13
Branches 1263 1266 +3
========================================
- Hits 2008 304 -1704
- Misses 2556 4590 +2034
+ Partials 321 4 -317
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #81 +/- ##
========================================
+ Coverage 40.99% 41% +0.01%
========================================
Files 22 22
Lines 4923 4941 +18
Branches 1274 1277 +3
========================================
+ Hits 2018 2026 +8
- Misses 2578 2586 +8
- Partials 327 329 +2
Continue to review full report at Codecov.
Force-pushed from f88fa04 to 4c756e9
@cgrambow and @dranasinghe, can you take a look at these updated submit scripts?
Here's a clean version:
I made some suggestions for the non-C3DDB servers.
More fundamentally though, what is your plan going forward? If ARC gets to the stage where people outside of the group start using it, then these scripts are not going to work for them, and it's not feasible to maintain scripts for all eventualities. One way to tackle that would be to have minimal script templates that users can modify themselves, and to require that the users themselves make sure that Gaussian, Molpro, etc. are available at the command line. Just wanted to know what your thoughts are, or if it's too early to start thinking about these questions.
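As a concrete illustration of that idea, a minimal user-editable template might look something like the sketch below. This assumes ARC keeps the brace-style formatting placeholders used in the current scripts; the placeholder names cpus, memory, command, and extension are hypothetical, not ARC's actual keys.

```python
# A minimal sketch of a user-editable SLURM submit template, assuming the
# brace-style placeholders used in the current scripts. The placeholder names
# (cpus, memory, command, extension) are illustrative, not ARC's actual keys.
minimal_slurm_template = """#!/bin/bash -l
#SBATCH -J {name}
#SBATCH -n {cpus}
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu={memory}

# The user is responsible for making Gaussian, Molpro, QChem, etc. available
# on the command line (e.g., via 'module load') before this point.
{command} input.{extension}
"""

# Example rendering of the template for a single-CPU Molpro job:
script = minimal_slurm_template.format(name='job_1', cpus=1, t_max='24:00:00',
                                        memory=2048, command='molpro -d $sdir',
                                        extension='in')
```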
arc/job/submit.py
Outdated
echo "Current directory : $(pwd)" | ||
echo "============================================================" | ||
|
||
# WorkDir=`pwd` |
You can probably remove these comments. Same goes for all the following lines.
arc/job/submit.py
Outdated
#SBATCH -p normal
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -c 8
You're hardcoding the number of CPUs per task requested from SLURM here, but then you don't actually end up running Molpro in parallel. Also, this option should maybe be set by the user instead of defaulting to 8; I think a default of 1 makes the most sense.
Also, I think Molpro might launch separate tasks when running in parallel, so you might need the -n option instead of -c.
So this should be #SBATCH -n 1?
Yeah, I think I'd go with -n 1 (or -n 8 if you want to parallelize eight ways, for example).
arc/job/submit.py
Outdated
#SBATCH -N 1
#SBATCH -c 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu=2048
This will depend a lot on the size of the molecule. Just curious if you're planning to make this adjustable later on.
I'm passing the overall memory (MW) to the Molpro input file. If we're using just one CPU, I could/should make this parameter the same. I guess it's in MB here, right?
Yeah, this should be the same amount in MB (maybe slightly larger to have a small safety factor) as you're allowing Molpro to use in MW.
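As a side note on units: Molpro normally interprets its memory setting in megawords (8-byte words), so one way to keep the two values in sync might be a small conversion like the sketch below; memory_mw and safety_factor are illustrative names, not ARC attributes.

```python
# Sketch: derive the SLURM per-CPU memory request from the Molpro memory
# setting. Assumes the Molpro value is given in megawords (MW) and that one
# word is 8 bytes, i.e., 1 MW = 8 MB. Variable names are illustrative.
memory_mw = 256                                        # memory given to Molpro, in MW
safety_factor = 1.1                                    # small cushion so SLURM doesn't kill the job
mem_per_cpu_mb = int(memory_mw * 8 * safety_factor)    # 2252 MB for 256 MW
sbatch_line = '#SBATCH --mem-per-cpu={0}'.format(mem_per_cpu_mb)
```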
arc/job/submit.py
Outdated
# export TMPDIR=$sdir
# cd $WorkDir

molpro -d $sdir input.in
Here, you're not running Molpro in parallel. If you want to do that, you have to add the -n option.
Let's run in parallel then. So is #SBATCH -n 8 enough, or should I give a directive in this line as well?
To run Molpro in parallel, you will need both #SBATCH -n 8 and molpro -n 8 -d ....
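Putting the pieces of this thread together, the Molpro template entry might end up looking roughly like the sketch below; the partition name, memory value, and scratch-directory handling are carried over from the discussion as placeholders, not verified settings.

```python
# Sketch of a Molpro SLURM template running 8-way parallel, combining the
# suggestions above: '-n' instead of '-c' in the SBATCH header, plus a matching
# 'molpro -n' flag. Partition, memory, and scratch handling are illustrative.
molpro_template = """#!/bin/bash -l
#SBATCH -p long
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu=2048

molpro -n 8 -d $sdir input.in
"""
```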
arc/job/submit.py
Outdated

# Molpro 2015 on RMG
'molpro': """#!/bin/bash -l
#SBATCH -p normal
Maybe consider defaulting to long?
#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
G16 also runs on the magnycours nodes, so this isn't needed anymore.
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
Why 6?
Pharos fails much more often when I request 8
Also very interesting :p
rm -r /scratch/{un}/{name}

""",
# Gaussian03 on Pharos
Same comments as above.

g16 input.gjf

rm -r /scratch/{un}/{name}

""",
'qchem': """#!/bin/bash

# QChem 4.4 on Pharos:
Same comments again.
echo "Running on node : $SLURMD_NODENAME" | ||
echo "Current directory : $(pwd)" | ||
echo "============================================================" | ||
# Molpro 2012 on Pharos |
And here as well.
Thanks @cgrambow! I added some questions. Particularly, could you help me correctly run Molpro in parallel?
Added a maximum job time argument to ARC and passed it to the submit scripts
Thanks @cgrambow! Could you take a final look at whether Molpro on the RMG server is now parallelized correctly on 8 CPUs?
I ran a quick test job on RMG, and everything seems to run fine with the correct number of processors.
Thanks!
@cgrambow, I addressed the technicalities, but not your broader question.
Use dict of dicts to store submit templates in server/software hierarchy
Added a maximal job time attribute to ARC
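A minimal sketch of the server/software hierarchy that commit describes could look like the following; the server names and the abbreviated template bodies are placeholders, not the actual ARC contents.

```python
# Sketch of storing submit templates in a dict of dicts keyed first by server,
# then by software. Server names and abbreviated template bodies are placeholders.
submit_scripts = {
    'server1': {
        'gaussian': "#!/bin/bash -l\n#SBATCH -J {name}\n...\ng16 input.gjf\n",
        'molpro': "#!/bin/bash -l\n#SBATCH -J {name}\n...\nmolpro -n 8 -d $sdir input.in\n",
    },
    'server2': {
        'qchem': "#!/bin/bash\n#$ -N {name}\n...\nqchem input.in output.out\n",
    },
}

# Looking up a script then becomes a two-level access:
script = submit_scripts['server1']['molpro'].format(name='job_1')
```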