Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace custom output file parsing with cclib #43

Merged
merged 6 commits into from
Jan 23, 2022

Conversation

berquist
Copy link
Contributor

This is a proposal to replace much (but not all) of the custom output parsing with cclib, so that GoodVibes can be used with any computational chemistry program that cclib supports, and not just Gaussian, ORCA, and NWChem.

It can't be a complete replacement because:

  • cclib doesn't yet parse any information about solvation methods (I think you just want the name?),
  • cclib doesn't yet parse timing information (though it is currently in progress),
  • cclib doesn't detect all job types (specifically TS), and
  • basis set and method information from cclib is spotty.

cclib also can't read COSMO-RS files, but since that's for a completely separate quantity, I won't consider in this PR.

cclib parses version information well, but I haven't investigated yet what GoodVibes is parsing as versions, hence the current draft status.

If you're ok with an extra dependency, let me know your thoughts. There may also be a version restriction, since cclib now has a minimum required Python version of 3.5 (2.7 may still work, but is no longer tested, and 2.6 definitely doesn't work).

@bobbypaton
Copy link
Member

Thanks for this - integration with cclib is definitely something we've been mulling over for a while!

Any ideas why there are several builds failing on travis while others are OK - is this related to the minimum Python version requirement of cclib?

@berquist
Copy link
Contributor Author

berquist commented May 25, 2021 via email

@berquist
Copy link
Contributor Author

I took a look at CI and it wasn't anything to do with 3.5 being unsupported specifically. It was dependencies not being installed in the right order, and too-new versions of SciPy and NumPy were trying to be installed.

@berquist
Copy link
Contributor Author

There is more code that could replaced in thermo.calc_bbe, but I think it's better to not do everything at once.

@berquist
Copy link
Contributor Author

Something disappointing I just discovered is that running time python -m pytest -v before and after this PR is not an improvement...

parsing time
existing 4.04s user 2.01s system 163% cpu 3.707 total
with cclib 25.18s user 3.98s system 118% cpu 24.505 total

It may take a while to find where the extra time is spent in cclib, but even then, by its very nature of parsing more things than GoodVibes needs, it will always take longer, and we can't predict how much this difference will shrink.

@jvalegre
Copy link
Collaborator

jvalegre commented Jan 21, 2022

Hi @berquist - I just noticed that you replaced the lines regarding SCF and MP energies but left the line for double hybrid functionals in your "Program-agnostic charge, multiplicity, and DFT/MP/CC energy" commit (see code below). This is a bug we just fixed (check the closed issue #51 ), is this also fixed in cclib? I saw the issue open from 2019 but I'm not sure what was your final verdict (i.e. do the energies from cclib correspond to the final energy that GoodVibes needs to use or the initial SCF only?).

if line.strip().startswith('E2('):
                spe_value = line.strip().split()[-1]

@jvalegre
Copy link
Collaborator

Hi @berquist - I talked to Rob about this merge, we're gonna merge and call this branch "GVcclib". We also want to create json files to store the information obtained from cclib, that way the program will be able to skip the parsing from cclib and read json files with the info instead when the same file is used in multiple runs. I'm doing something similar in the QCORR module from AQME (https://github.com/jvalegre/aqme/blob/master/aqme/qcorr.py, line 219), but I use ccwrite to write and read a json file instead of creating the internal python cclib object.

Could you give us some insight about what's the best way to do this?
Thanks for the help!!

@jvalegre jvalegre merged commit 3bc3b74 into patonlab:master Jan 23, 2022
@berquist
Copy link
Contributor Author

I just noticed that you replaced the lines regarding SCF and MP energies but left the line for double hybrid functionals in your "Program-agnostic charge, multiplicity, and DFT/MP/CC energy" commit (see code below). This is a bug we just fixed (check the closed issue #51 ), is this also fixed in cclib? I saw the issue open from 2019 but I'm not sure what was your final verdict (i.e. do the energies from cclib correspond to the final energy that GoodVibes needs to use or the initial SCF only?).

I left that line in because I'm not sure it's covered properly by cclib. As you're aware, it all stems from the fact that cclib doesn't have a "final energy" but instead opts to separate out the SCF from different post HF contributions. I (probably all of us) had forgotten about cclib/cclib#746, and we didn't come to a conclusion. The answer for now is "probably not", but we need to check the mpenergies attribute first.

we're gonna merge and call this branch "GVcclib"

Just to make sure, are you aware that this was merged to the master branch?

that way the program will be able to skip the parsing from cclib and read json files with the info instead when the same file is used in multiple runs.

Yes, this is a good idea. (You might want to add some functionality that checks if the output file has changed and reparsing if necessary, like storing MD5 hash, otherwise it's easy to get bitten by having the stale JSON stick around.)

I use ccwrite to write and read a json file instead of creating the internal python cclib object

This seems reasonable to me as long as working with the JSON representation is easy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants