-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize toolkit I/O #1006
Standardize toolkit I/O #1006
Conversation
…ise ValueError if not found.
…e SmilesMolSupplier instead of writing a new SMILES parser
…name) doesn't exist, raise ValueError
…se the same code as from_file()
…se the same SMILES as to_smiles(); don't include a header line in RDKit
…instead of text mode
…stry testing works in from_smiles()
Still contains extra data records which aren't used, doesn't check if RDKit/OEChem aren't installed, and need tests for loading two SDF/SMILES molecules, and for loading a file with an error in it,. |
…tead of "fileobj_manager" for consistency
…odd exception with the None molecule
…ause the format is specified; no need to guess from the filename.
…Raise error if it fails.
# Switch to a ValueError and use a more informative exception | ||
# message to match RDKit's toolkit writer. | ||
raise ValueError( | ||
"Need a text mode file object like StringIO or a file opened with mode 't'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be mode 'r'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"r" is for "read". "rt" is to read in text mode. "rb" is to read in binary mode. If unspecified, the default is "t", so you rarely see it written explicitly. https://docs.python.org/3/library/functions.html?highlight=text+mode#open
# Support the older API, whihc required Unicode strings | ||
tmpfile.write(content.encode("utf8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nifty! It's cool that you know this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, that's OpenFF's "older" API. I know this because when I passed in a ByteIO() it failed. There's also a test for this backwards compatibility support:
def test_from_file_obj_smi_supports_stringio(self):
# Backwards compability. Older versions of OpenFF supported
# string-based file objects, but not file-based ones.
with open(file_manager.caffeine_smi) as file_obj:
mol = self.toolkit_wrapper.from_file_obj(file_obj, "SMI")[0]
assert mol.name == "CHEMBL113"
Grr. I just realized that "file-based" should be "binary-based". Based on your comment, I see that code comment for the test should instead be in the rdkit_wrapper
code you highlighted. Just committed and pushed.
if not oechem.OESmilesToMol(oemol, smiles): | ||
raise ParseError("Unable to parse the SMILES string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I've seen that OE uses this pattern a lot in their example code, so I'm glad that you know where to add it in here.
docs/releasehistory.md
Outdated
- Writing to a SMILES file now uses the same SMILES as | ||
`to_smiles()`, and (for RDKit) does not include a header line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) This header line thing has bugged me for a long time! Thanks for catching it. I know of a few people who have worked around this, and so I think they could get burned if they have an application that's been discarding the first line by default. So let's list this one as a behavior change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added two bullets to the "New features and behaviors changed", one for the change from implicit to explicit [H] in both toolkits and the other about the removal of the title line in the RDKit wrapper.
@pytest.mark.parametrize( | ||
"name,file_format", [("caffeine_2d_sdf", "SDF"), ("caffeine_smi", "SMI")] | ||
) | ||
def test_from_file_handles_cls(self, name, file_format): | ||
filename = getattr(file_manager, name) | ||
mol = self.toolkit_wrapper.from_file( | ||
filename, file_format, _cls=SingingMolecule | ||
)[0] | ||
mol.sing() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant! These molecule subclass tests are wonderful.
…ability support for 't'ext mode files
…s: explicit [H]:s for both toolkits, no header line for RDKit
I, umm, caused the documentation build to fail with "Command killed due to excessive memory consumption"??!? It appears to be in the 'conda env create --quiet --name 1006 --file docs/environment.yml' step, so I don't think it's due to any change I made. |
It doesn't look like OpenFF runs its own doc check? Maybe give it a kick by pushing another commit or closing/reopening this PR. Otherwise, MDAnalysis ran into a similar issue. When the kind people at RTD troubleshooted for us, they found that enabling Edit: the "own doc check" was a bit of a non-sequitur, I just mention it because having both helps rule out provider issues |
Seemingly a transient RTD hiccup. Though I did update releasehistory, there's no way that could have affected things. ... right? ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together, @adalke! I think it's in good shape -- There are just two blocking comments. Please feel free to merge once they're resolved!
def test_parse_methane_with_explicit_Hs(self): | ||
mol = self.toolkit_wrapper.from_smiles("[C]([H])([H])([H])([H])") | ||
# add hydrogens | ||
assert mol.n_atoms == 5 | ||
assert mol.n_bonds == 4 | ||
assert molecule.partial_charges is None | ||
|
||
def test_parse_methane_with_explicit_Hs(self): | ||
mol = self.toolkit_wrapper.from_smiles( | ||
"[C]([H])([H])([H])([H])", hydrogens_are_explicit=True | ||
) | ||
# add hydrogens | ||
assert mol.n_atoms == 5 | ||
assert mol.n_bonds == 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) I think only one of these methods will end up being defined since they have the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted. I used code coverage and found two other similarly shadowed methods, now fixed.
Fixed the test case that wasn't actually being tested.
… shadowing other tests
There are a number of inconsistencies in the RDKit and OpenEye toolkit wrapper interfaces. In particular, the cross-comparison code I developed expects from_file() and from_file_obj() to have the same behavior, which wasn't true for reading SD files with RDKit. See #1005 for the full list.
This adds a new test module,
test_toolkit_api.py
with top-level base classes to handle the test implementation. Each base class contains a derived class for OpenEyeToolkitWrapper and one for RDKitToolkitWrapper.This ensures that both wrappers are tested with exactly the same parameters and checked for the same behaviors.
There's also a new ParseError exception, which inherits from MessageException and ValueError. This is thrown when a SMILES string cannot be parsed. (The previous implementation did not check for parse failures.)
I pulled in the (small number of) SMILES test cases in
test_toolkits.py
but did not change them as I didn't want to risk merge errrors with PR #1004 .I think InChI parsing should also be moved to this new
test_toolkits_io.py
.Lint codebase
Update changelog
Note: I only summarized these as bug fixes. Some of them might be considered as "New features and behaviors changed".