-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load and dump TS YAML files in Arkane #1551
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1551 +/- ##
==========================================
- Coverage 41.86% 41.84% -0.02%
==========================================
Files 165 165
Lines 28008 28008
Branches 5713 5713
==========================================
- Hits 11726 11721 -5
- Misses 15494 15498 +4
- Partials 788 789 +1
Continue to review full report at Codecov.
|
4c76ce5
to
a815028
Compare
@mliu49, I think this is now ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial round of comments. I will test it out soon.
@@ -188,11 +188,8 @@ def save_yaml(self, path): | |||
filename = os.path.join('ArkaneSpecies', | |||
''.join(c for c in self.label if c in valid_chars) + '.yml') | |||
full_path = os.path.join(path, filename) | |||
content = yaml.dump(data=self.as_dict(), Dumper=Dumper) | |||
# remove empty lines from the file (multi-line strings have excess new line brakes for some reason): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I was worried about this. I probably should have followed up on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looks ugly, but works....
Another option for human input is to use a pipe symbol in yaml, e.g. (in ARC's format):
species:
- label: vinoxy
smiles: C=C[O]
multiplicity: 2
charge: 0
- label: OH
xyz: |
O 0.00000000 0.00000000 -0.12002167
H 0.00000000 0.00000000 0.85098324
- label: methylamine
adjlist: |
1 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
2 N u0 p1 c0 {1,S} {6,S} {7,S}
3 H u0 p0 c0 {1,S}
4 H u0 p0 c0 {1,S}
5 H u0 p0 c0 {1,S}
6 H u0 p0 c0 {2,S}
7 H u0 p0 c0 {2,S}
- label: propene
smiles: C=CC
- label: N2H4
smiles: NN
if 'Final energy is' in line: | ||
E0 = float(line.split()[3]) * constants.E_h * constants.Na | ||
logging.debug('energy is {}'.format(str(E0))) | ||
if E0 is None: | ||
for line in f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, the reason it fails is because you looping through a file moves a pointer from the beginning to the end. Once you're at the end, looping through again doesn't do anything. Thus, you need to reset the pointer, which can be done using f.seek(0)
.
However, I think you could probably accomplish this only reading through the file once:
with open (self.path, 'r') as f:
a = b = 0
for line in f:
if 'Final energy is' in line:
a = float(line.split()[3]) * constants.E_h * constants.Na
if 'Total energy in the final basis set' in line:
b = float(line.split()[8]) * constants.E_h * constants.Na
E0 = a or b
logging.debug('energy is {}'.format(str(E0)))
Of course the fix you have solves the issue, so it's up to you whether or not to try these other options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your implementation. I'll modify it a bit into
e0 = a if a else b
since I think that if the text for b is in the file, then the text for a is there as well. I'll check and will anyway add a test.
Edit: I just realized that e0 = a or b
does exactly that. I'll keep it
_, file_extension = os.path.splitext(path) | ||
if file_extension in ['.yml', '.yaml']: | ||
if TS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this line deleted intentionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR replaces it with is_ts
, but probably this commit only captured the deletion. I'll sort the commits out
logging.debug("made object {0}".format(class_name)) | ||
data[key].append(obj) | ||
elif isinstance(val, list) and val: | ||
if isinstance(val[0], dict) and 'class' in val[0]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section could be combined with the first if clause by adding an elif at the second level. Let me know if that made sense to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not so clear. Combine if isinstance(val[0], dict) and 'class' in val[0]
with if isinstance(val, dict) and 'class' in val
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you can ignore this. I missed the lines that were hidden by GitHub and thought that these changes were part of the same function.
chemkin_thermo_string: "N2H3 H 3N 2 G 10.000 3000.000 | ||
\ 478.04 1\n 2.42935350E+00 1.15088524E-02-6.01470204E-06 1.59974609E-09-1.70089224E-13 | ||
\ 2\n 2.62991058E+04 1.15861374E+01 4.04880122E+00-4.38083729E-03 5.11833994E-05 | ||
\ 3\n-8.84029667E-08 5.22512137E-11 2.61709988E+04 5.24801815E+00 4\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any idea why this string is printed differently from the adjlist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because the lines are too long, so YAML breaks them, then makes it a multiline string with "..."
. Perhaps \n
s aren't executed then?
@mliu49 , I addressed the initial comments, squashed and force-pushed (I re-organized the commits) |
I tested TS dumping and loading, and it seems to work fine. I added a new commit with a couple changes to yaml dumping. Most notably, I added a custom string representer to make it dump multiline strings as block literals. Please try it out and let me know what you think. I also removed the |
Doing so interfers with reading the line brakes, so whe reading we get: 1 O u0 p2 c0 {2,S} {9,S} 2 C u0 p0 c0 {1,S} {3,S} {4,S} {5,S} 3 C u0 p0 c0 {2,S} {6,S} {7,S} {8,S} 4 H u0 p0 c0 {2,S} 5 H u0 p0 c0 {2,S} 6 H u0 p0 c0 {3,S} 7 H u0 p0 c0 {3,S} 8 H u0 p0 c0 {3,S} 9 H u0 p0 c0 {1,S} instead of: 1 O u0 p2 c0 {2,S} {9,S} 2 C u0 p0 c0 {1,S} {3,S} {4,S} {5,S} 3 C u0 p0 c0 {2,S} {6,S} {7,S} {8,S} 4 H u0 p0 c0 {2,S} 5 H u0 p0 c0 {2,S} 6 H u0 p0 c0 {3,S} 7 H u0 p0 c0 {3,S} 8 H u0 p0 c0 {3,S} 9 H u0 p0 c0 {1,S}
For some reason ARC fails without this modification when sp calculations are done in QChem. It is indeed less efficient, since the entire file is loaded to memory.
Using the 'Total energy in the final basis set' phrase
Some properties are important to parse, but are not objects and cannot be updated automatically. `imaginary_frequency` contains a ScalarQuantity instance, but it is not an object in itself in RMG/Arkane, just an attribute of TransitionState (.frequency). It's important to parse it for tunneling. Structure information is important to parse since YAML file generation depends on it.
Iff structures of all reactant/s and product/s are given
Also, renames `TS` as `is_ts`
Thanks for this awesome addition, @mliu49! Now it also looks nice :) |
Would you like to update the example yaml files following the change to string representation? Otherwise I think this is ready to merge. |
Showing how to use species (and TS) YAML files to run Arkane
Provide file stream directly to yaml.dump Remove optional Dumper argument and associated imports Add str representer to dump multiline strings as block literals
Done! |
Motivation
For a more convenient workflow of archiving QM data and easily parsing it again (and soon automating this process and forming a database) it is important to also save and load YAML files of TSs in addition to stable species.
Description of Changes
ArkaneSpecies can now process TSs as well.
So far, a YAML file was generated for stable species iff their structure was given and Thermo() was called (both important data to save)
Now, a TS YAML file is saved iff structures of all relevant reactant/s and product/s were given and Kinetics() was called. We're not saving the kinetics, though.
The RMGObject was changes slightly to save and load lists.
This PR has one small commit not directly related to the topic, changing how Arkane loads energy from QChem files. I could open a different PR if the reviewers prefer.
Testing
An example for a reaction where all species and TS are loaded from YAML files was added (Arkane examples are also used as functional tests).
Reviewer Tips
Run any reaction in Arkane, look at the YAML files generated, then run the reaction again, pointing the TS to the generated YAML file.