Improve submit files #81

Merged: 2 commits merged into master from submit on Mar 7, 2019
Conversation

@alongd alongd (Member) commented Mar 6, 2019

Use a dict of dicts to store submit script templates in a server/software hierarchy
Add a maximum job time attribute to ARC
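
For illustration only (a sketch, not ARC's actual code), such a server/software hierarchy could be looked up and rendered as below; the helper name and the example values are made up:

```python
# Hypothetical helper illustrating the server/software template hierarchy.
submit_scripts = {
    'Slurm': {
        'gaussian': "#!/bin/bash -l\n#SBATCH -J {name}\n#SBATCH --time={t_max}\n",
    },
}

def render_submit_script(cluster_software, ess, name, un, t_max):
    """Pick a template by (scheduler, electronic structure software) and fill
    in the placeholders; unused keywords are simply ignored by str.format."""
    template = submit_scripts[cluster_software][ess]
    return template.format(name=name, un=un, t_max=t_max)

# t_max uses the scheduler's own time format, e.g. 'days-hh:mm:ss' for Slurm:
print(render_submit_script('Slurm', 'gaussian', name='spc1_opt', un='user', t_max='5-00:00:00'))
```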

@alongd alongd changed the title from "Imprve submit files" to "Improve submit files" on Mar 6, 2019
@alongd alongd force-pushed the submit branch 3 times, most recently from 83eb5fa to 5cc109a on March 6, 2019, 14:44
@codecov codecov bot commented Mar 6, 2019

Codecov Report

Merging #81 into master will decrease coverage by 34.89%.
The diff coverage is 6.66%.


@@           Coverage Diff            @@
##           master    #81      +/-   ##
========================================
- Coverage    41.1%   6.2%   -34.9%     
========================================
  Files          22     22              
  Lines        4885   4898      +13     
  Branches     1263   1266       +3     
========================================
- Hits         2008    304    -1704     
- Misses       2556   4590    +2034     
+ Partials      321      4     -317
Impacted Files             Coverage Δ
arc/job/submit.py          0% <ø> (-100%) ⬇️
arc/main.py                2.75% <0%> (-40.26%) ⬇️
arc/scheduler.py           2.05% <0%> (-16.04%) ⬇️
arc/job/job.py             1.19% <9.09%> (-20.76%) ⬇️
arc/job/inputs.py          0% <0%> (-100%) ⬇️
arc/__init__.py            20% <0%> (-80%) ⬇️
arc/job/__init__.py        25% <0%> (-75%) ⬇️
arc/species/converter.py   15.92% <0%> (-64.97%) ⬇️
arc/rmgdb.py               10.27% <0%> (-63.7%) ⬇️
arc/parser.py              13.58% <0%> (-62.97%) ⬇️
... and 11 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c00b1d...5cc109a.

@codecov codecov bot commented Mar 6, 2019

Codecov Report

Merging #81 into master will increase coverage by 0.01%.
The diff coverage is 50%.


@@           Coverage Diff            @@
##           master    #81      +/-   ##
========================================
+ Coverage   40.99%    41%   +0.01%     
========================================
  Files          22     22              
  Lines        4923   4941      +18     
  Branches     1274   1277       +3     
========================================
+ Hits         2018   2026       +8     
- Misses       2578   2586       +8     
- Partials      327    329       +2
Impacted Files       Coverage Δ
arc/job/submit.py    100% <ø> (ø) ⬆️
arc/settings.py      100% <100%> (ø) ⬆️
arc/main.py          43.27% <100%> (+0.26%) ⬆️
arc/job/job.py       21.49% <18.18%> (-0.19%) ⬇️
arc/scheduler.py     18.68% <71.42%> (+0.2%) ⬆️
arc/reaction.py      42.35% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51c1aca...052e3e2.

@alongd alongd force-pushed the submit branch 2 times, most recently from f88fa04 to 4c756e9 on March 6, 2019, 15:54
@alongd alongd (Member, Author) commented Mar 6, 2019

@cgrambow and @dranasinghe, can you take a look at these updated submit scripts?

@alongd alongd (Member, Author) commented Mar 6, 2019

Here's a clean version:

submit_scripts = {
    'Slurm': {
        # Gaussian09 on C3DDB
        'gaussian': """#!/bin/bash -l
#SBATCH -p defq
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu 4500

module add c3ddb/gaussian/09.d01
which g09

echo "============================================================"
echo "Job ID : $SLURM_JOB_ID"
echo "Job Name : $SLURM_JOB_NAME"
echo "Starting on : $(date)"
echo "Running on node : $SLURMD_NODENAME"
echo "Current directory : $(pwd)"
echo "============================================================"

WorkDir=/scratch/users/{un}/$SLURM_JOB_NAME-$SLURM_JOB_ID
SubmitDir=`pwd`

GAUSS_SCRDIR=/scratch/users/{un}/g09/$SLURM_JOB_NAME-$SLURM_JOB_ID
export  GAUSS_SCRDIR

mkdir -p $GAUSS_SCRDIR
mkdir -p $WorkDir

cd  $WorkDir
. $g09root/g09/bsd/g09.profile

cp $SubmitDir/input.gjf .

g09 < input.gjf > input.log
formchk  check.chk check.fchk
cp * $SubmitDir/

rm -rf $GAUSS_SCRDIR
rm -rf $WorkDir

""",

        # Orca on C3DDB:
        'orca': """#!/bin/bash -l
#SBATCH -p defq
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu 4500

module add c3ddb/orca/4.0.0
module add c3ddb/openmpi/2.0.2
which orca

export ORCA_DIR=/cm/shared/c3ddb/orca/4.0.0/
export PATH=$PATH:$ORCA_DIR

echo "============================================================"
echo "Job ID : $SLURM_JOB_ID"
echo "Job Name : $SLURM_JOB_NAME"
echo "Starting on : $(date)"
echo "Running on node : $SLURMD_NODENAME"
echo "Current directory : $(pwd)"
echo "============================================================"


WorkDir=/scratch/users/{un}/$SLURM_JOB_NAME-$SLURM_JOB_ID
SubmitDir=`pwd`

mkdir -p $WorkDir
cd  $WorkDir

cp $SubmitDir/input.inp .

${ORCA_DIR}/orca input.inp > input.log
cp * $SubmitDir/

rm -rf $WorkDir

""",

        # Molpro 2015 on RMG
        'molpro': """#!/bin/bash -l
#SBATCH -p normal
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -c 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu=2048

export PATH=/opt/molpro/molprop_2015_1_linux_x86_64_i8/bin:$PATH

echo "============================================================"
echo "Job ID : $SLURM_JOB_ID"
echo "Job Name : $SLURM_JOB_NAME"
echo "Starting on : $(date)"
echo "Running on node : $SLURMD_NODENAME"
echo "Current directory : $(pwd)"
echo "============================================================"

# WorkDir=`pwd`
# cd
# source .bashrc
sdir=/scratch/{un}/$SLURM_JOB_NAME-$SLURM_JOB_ID
mkdir -p $sdir
# export TMPDIR=$sdir
# cd $WorkDir

molpro -d $sdir input.in

rm -rf $sdir

""",
    },


    'OGE': {
        # Gaussian16 on Pharos
        'gaussian': """#!/bin/bash -l

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
#$ -l h=!node60.cluster
#$ -cwd
#$ -o out.txt
#$ -e err.txt

echo "Running on node:"
hostname

g16root=/opt
GAUSS_SCRDIR=/scratch/{un}/{name}
export g16root GAUSS_SCRDIR
. $g16root/g16/bsd/g16.profile
mkdir -p /scratch/{un}/{name}

g16 input.gjf

rm -r /scratch/{un}/{name}

""",
        # Gaussian03 on Pharos
        'gaussian03_pharos': """#!/bin/bash -l

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
#$ -l h=!node60.cluster
#$ -cwd
#$ -o out.txt
#$ -e err.txt

echo "Running on node:"
hostname

g03root=/opt
GAUSS_SCRDIR=/scratch/{un}/{name}
export g03root GAUSS_SCRDIR
. $g03root/g03/bsd/g03.profile
mkdir -p /scratch/{un}/{name}

g03 input.gjf

rm -r /scratch/{un}/{name}

""",
        # QChem 4.4 on Pharos:
        'qchem': """#!/bin/bash -l

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
#$ -l h=!node60.cluster
#$ -cwd
#$ -o out.txt
#$ -e err.txt

echo "Running on node:"
hostname

export QC=/opt/qchem
export QCSCRATCH=/scratch/{un}/{name}
export QCLOCALSCR=/scratch/{un}/{name}/qlscratch
. $QC/qcenv.sh

mkdir -p /scratch/{un}/{name}/qlscratch

qchem -nt 6 input.in output.out

rm -r /scratch/{un}/{name}

""",
        # Molpro 2012 on Pharos
        'molpro': """#! /bin/bash -l

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
#$ -l h=!node60.cluster
#$ -cwd
#$ -o out.txt
#$ -e err.txt

export PATH=/opt/molpro2012/molprop_2012_1_Linux_x86_64_i8/bin:$PATH

sdir=/scratch/{un}
mkdir -p /scratch/{un}/qlscratch

molpro -d $sdir -n 6 input.in
""",
    }
}

@cgrambow cgrambow left a comment

I made some suggestions for the non-C3DDB servers.

More fundamentally though, what is your plan going forward? If ARC gets to the stage where people outside of the group start using it, these scripts are not going to work for them, and it's not feasible to maintain scripts for all eventualities. One way to tackle that would be to have minimal script templates that users can modify themselves, and to require that the users make sure that Gaussian, Molpro, etc. are available at the command line. Just wanted to know what your thoughts are, or if it's too early to start thinking about these questions.
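
For illustration, a minimal user-editable template along those lines might look like the sketch below (hypothetical, not part of this PR; it assumes the user makes g16 available on their PATH themselves):

```python
# Hypothetical minimal Slurm/Gaussian template a user could adapt to their
# own cluster; ARC would only fill in {name} and {t_max}.
minimal_submit_scripts = {
    'Slurm': {
        'gaussian': """#!/bin/bash -l
#SBATCH -J {name}
#SBATCH -n 1
#SBATCH --time={t_max}

# The user is responsible for making g16 available,
# e.g. via 'module load gaussian' on their cluster.
g16 < input.gjf > input.log
""",
    },
}
```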

echo "Current directory : $(pwd)"
echo "============================================================"

# WorkDir=`pwd`
cgrambow:
You can probably remove these comments. Same goes for all the following lines.

#SBATCH -p normal
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -c 8
cgrambow:
You're hardcoding the number of CPUs per task requested from SLURM here, but then you don't actually end up running Molpro in parallel. Also, this option should maybe be set by the user instead of defaulting to 8; I think a default of 1 makes the most sense.

Also, I think Molpro might launch different tasks when running in parallel, so you might need the -n option instead of -c.
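
For reference, the two Slurm directives request different things; a small sketch of the distinction (illustrative values, not ARC defaults):

```python
# -n / --ntasks:        number of tasks (e.g. MPI processes) Slurm launches
# -c / --cpus-per-task: CPUs allocated to each task (e.g. OpenMP threads)
mpi_header    = "#SBATCH -n 8\n#SBATCH -c 1\n"  # eight single-threaded tasks
openmp_header = "#SBATCH -n 1\n#SBATCH -c 8\n"  # one task with eight threads
```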

alongd (Member Author):
So this should be #SBATCH -n 1?

cgrambow:
Yeah, I think I'd go with -n 1 (or -n 8 if you want to parallelize eight ways, for example).

#SBATCH -N 1
#SBATCH -c 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu=2048
cgrambow:
This will depend a lot on the size of the molecule. Just curious if you're planning to make this adjustable later on.

alongd (Member Author):
I'm passing the overall memory (in MW) to the Molpro input file. If we're using just one CPU, I could/should make this parameter the same. I guess it's in MB here, right?

cgrambow:
Yeah, this should be the same amount in MB (maybe slightly larger to have a small safety factor) as you're allowing Molpro to use in MW.
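
For reference, Molpro's memory card is given in megawords (8-byte words), so the matching Slurm value in MB is roughly eight times that, plus a small safety margin; a sketch of the conversion (the helper name is made up):

```python
def slurm_mem_per_cpu_mb(molpro_memory_mw, safety_factor=1.1):
    """Convert a Molpro per-process memory request in megawords (8-byte words)
    to an approximate Slurm --mem-per-cpu value in MB, with a safety margin."""
    return int(molpro_memory_mw * 8 * safety_factor)

# e.g. 'memory,256,m' in the Molpro input -> request about 2252 MB per CPU
print(slurm_mem_per_cpu_mb(256))  # 2252
```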

# export TMPDIR=$sdir
# cd $WorkDir

molpro -d $sdir input.in
cgrambow:
Here, you're not running Molpro in parallel. If you want to do that, you have to add the -n option.

alongd (Member Author):
Let's run in parallel then. So is #SBATCH -n 8 enough, or should I give a directive in this line as well?

cgrambow (Mar 7, 2019):
To run Molpro in parallel you will need both #SBATCH -n 8 and molpro -n 8 -d ....
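
Putting the two pieces together, the RMG Molpro entry would then read roughly as in the sketch below (illustrative, not the final merged text; only the -n directives differ from the template above):

```python
# Sketch of the parallel Molpro template: Slurm allocates 8 tasks and
# Molpro is launched with a matching 'molpro -n 8'.
molpro_parallel = """#!/bin/bash -l
#SBATCH -p normal
#SBATCH -J {name}
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --time={t_max}
#SBATCH --mem-per-cpu=2048

export PATH=/opt/molpro/molprop_2015_1_linux_x86_64_i8/bin:$PATH

sdir=/scratch/{un}/$SLURM_JOB_NAME-$SLURM_JOB_ID
mkdir -p $sdir

molpro -n 8 -d $sdir input.in

rm -rf $sdir
"""
```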


# Molpro 2015 on RMG
'molpro': """#!/bin/bash -l
#SBATCH -p normal
cgrambow:
Maybe consider defaulting to long?

#$ -N {name}
#$ -l long
#$ -l h_rt={t_max}
#$ -l harpertown
cgrambow:
G16 also runs on the magnycours nodes, so this isn't needed anymore.

#$ -l h_rt={t_max}
#$ -l harpertown
#$ -m ae
#$ -pe singlenode 6
cgrambow:
Why 6?

alongd (Member Author):
Pharos fails much more often when I request 8.

cgrambow:
Also very interesting :p

rm -r /scratch/{un}/{name}

""",
# Gaussian03 on Pharos
cgrambow:
Same comments as above.


g16 input.gjf

rm -r /scratch/{un}/{name}

""",
'qchem': """#!/bin/bash

# QChem 4.4 on Pharos:
cgrambow:
Same comments again.

echo "Running on node : $SLURMD_NODENAME"
echo "Current directory : $(pwd)"
echo "============================================================"
# Molpro 2012 on Pharos
cgrambow:
And here as well.

@alongd alongd (Member, Author) commented Mar 7, 2019

Thanks @cgrambow! I added some questions. In particular, could you help me run Molpro correctly in parallel?

alongd added 2 commits on March 7, 2019:
Added a maximum job time argument to ARC, and passing it to the submit scripts
@alongd alongd (Member, Author) commented Mar 7, 2019

Thanks @cgrambow! Could you take a final look at whether Molpro on the RMG server is now correctly parallelized over 8 CPUs?

@cgrambow cgrambow left a comment

I ran a quick test job on RMG, and everything seems to run fine with the correct number of processors.

@alongd alongd (Member, Author) commented Mar 7, 2019

Thanks!

@alongd alongd merged commit d6a8e12 into master Mar 7, 2019
@alongd alongd deleted the submit branch March 7, 2019 20:42
@alongd alongd (Member, Author) commented Mar 8, 2019

@cgrambow, I addressed the technicalities, but not your broader question.
My current view is that all users should manually make sure that submit.py fits their needs; I think we say so in the documentation. Merging this PR makes it extremely convenient for members of our group, and gives a nice example for other users. Perhaps we could think of a clever way to tailor these scripts to an arbitrary server (or at least try).
