-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] BSS.Convert.toRDKit() appears to lose stereochemical information #330
Comments
Hi there. What's
|
Oops sorry was working with two molecules then cut one out when I wrote this post. Have edited the original post |
I don't think your assertion that the parametrised molecule retains the stereochemistry is correct. If I do the following then everything is fine: import BioSimSpace as BSS
# Create a molecule directly from the SMILES string.
bss_mol = BSS.Convert.smiles("CC[C@@H](O)C")
# Convert to RDKit.
rdkit_mol = BSS.Convert.toRDKit(bss_mol)
# Check the SMILES representation.
print(Chem.MolToSmiles(rdkit_mol))
'[H]O[C@@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])[H]' Converting to RDKit uses the Sire-to-RDKit converter behind the scenes. If required attributes are missing, then stereochemistry is preserved by using an algorithm from MDAnalysis, which we have ported and tested reasonably well. However, in your case you could just simply use a BioSimSpace starting molecule that has been created from SMILES, then the required attributes should be preserved throughout, for example: from rdkit import Chem
import BioSimSpace as BSS
# Define enantiomer
a = 'CC[C@@H](O)C'
# Check stereochemistry of the molecules
a
# Parameterise
bs_a = BSS.Parameters.gaff(BSS.Convert.smiles(a)).getMolecule()
# Check the stereochemistry is retained in the parameterised molecules
bs_a._sire_object.view()
# Convert back to rdkit
rd_a = BSS.Convert.toRDKit(bs_a)
rd_a
# Print the SMILES representation.
print(Chem.MolToSmiles(rd_a))
'[H]O[C@@]([H])(C([H])([H])[H])C([H])([H])C([H])([H])[H]' Note the only difference here is:
This means that the molecule passed in has the required property keys, which are preserved on return, i.e.: smiles_mol = BSS.Convert.smiles(a)
print(smiles_mol._sire_object.propertyKeys())
['chirality',
'hybridization',
'isotope',
'is_aromatic',
'coordinates',
'mass',
'formal_charge',
'connectivity',
'element'] If you just pass in a SMILES string, then these are only attached to an intermediate molecule used in the calculation. I think I might be able to update this so that they are also added. The original SMILES parsing for the parameterisation functions didn't use the internal |
This is now fixed in #331. |
Thanks for looking into this. Regarding parameterised mols not containing stereo-information, how are you able to tell this? When I look at the molecule in 2D (._sire_object.view2d()) it looks like there is a wedge bond. I'm working with SDFs and only used the butanol smiles as an example. Forgive me if I'm misinterpreting your response, or if your fix tackles SDF mols too, but it seems to me that this issue extends beyond smiles parsing. For example: from rdkit import Chem
import BioSimSpace as BSS
smi = 'CC[C@@H](O)C'
# Process smiles in rdkit
mol = Chem.MolFromSmiles(smi)
mol = Chem.AddHs(mol)
Chem.AllChem.EmbedMolecule(mol)
# Write to SDF
Chem.SDWriter('mol.sdf').write(mol)
# Load SDF
bs_mol = BSS.IO.readMolecules('mol.sdf')
# Parameterise
bs_mol_param = BSS.Parameters.gaff(bs_mol.getMolecule(0)).getMolecule()
# Convert to rdkit
rd_conv = BSS.Convert.to(bs_mol_param, "rdkit")
# Check smiles for stereochem
Chem.MolToSmiles(rd_conv)
'[H]OC([H])(C([H])([H])[H])C([H])([H])C([H])([H])[H]' Every chance I'm missing something here but it feels to me like this could be a slightly different issue to what your fix concerns. Cheers |
Hmm, I see. It's strange that things look okay using the # assign stereochemistry to the rest, and also remove
# 3D conformers as they mess up the 2D view
try:
for r in rdkit_mols:
# assign the stereochemistry from the structure
try:
from rdkit.Chem.rdmolops import AssignStereochemistryFrom3D
AssignStereochemistryFrom3D(r)
except Exception:
# does not matter if this fails
pass
# we also don't want any conformers,
# as these mess up the 2D view
r.RemoveAllConformers()
except Exception:
try:
from rdkit.Chem.rdmolops import AssignStereochemistryFrom3D
AssignStereochemistryFrom3D(rdkit_mols)
except Exception:
# does not matter if this fails
pass So, it looks like it's using RDkit to apply stereochemistry to the structure using the Python API. Presumably this isn't done in the converter, which uses the C++ API. (I see no equivalent function call.) No, the fix won't deal with SDF, although SDF attributes should be preserved in the parametrerised molecule and used in the conversion to RDKit. |
In this sense, the view is a bit misleading since the stereochemistry is not part of the BioSimSpace molecule and wouldn't (necessarily) be present when using the Sire-to-RDKit converter. |
Interesting, I wasn't aware of that RDKit function - that's actually very useful. Managed to get around my issues by using it post-conversion, since the 3D information is retained (even if chiral centres weren't being explicitly defined). |
Great, glad it was helpful. |
Is it okay to close this issue? I've implemented the code to preserve the SMILES information in the molecule when that was the source of input to the parametrisation. One thing that we've noticed regarding stereochemistry and the |
Okay that's useful to know, thanks for pointing that out. Original issue is all sorted on my end :) |
Hey again,
I think I've encountered a bug with the Convert module (or I might be overlooking a crucial bit of code). Converting from a BioSimSpace / Sire mol to an rdmol via BSS.Convert.toRDKit() appears to lose stereochemical info.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
It seems to me like there is no reason for the stereochemistry to be removed. The reason this has become an issue for me is because I'm using an rdkit-based flexible alignment method rather than rmsd/rmsdAlign, and so working with rdkit molecules that exhibit correct stereochemical centres is important. Perhaps I've missed something in the docs that'll help me achieve this.
(please complete the following information):
OS: [e.g. Linux (Ubuntu 20.04.6 LTS (Focal Fossa)
Version of Python: Python 3.12.4
Version of BioSimSpace: 2024.2.0
I confirm that I have checked this bug still exists in the latest released version of BioSimSpace: yes
Cheers,
Noah
The text was updated successfully, but these errors were encountered: