Additional adduct forms for crest #2

Closed
tobigithub opened this issue Feb 18, 2020 · 5 comments

Comments

@tobigithub

Is your feature request related to a problem? Please describe.
The current crest version (Version 2.8.1, Fri 20. Dec 13:44:46 CET 2019) only allows for the protonated [M+H]+ and deprotonated [M-H]- adduct forms. While these are the most important adducts in electrospray mass spectrometry (ESI-MS), they are only two out of 300 different possible ions that are routinely observed. In the past, many of these other adducts were simply ignored or considered less important. With the advent of accurate mass spectrometry, many experiments and software solutions now report all additional species.

Describe the solution you'd like
A crest version that allows additional adduct ions to be selected on the command line, in the parameter file, or in the input file.

Describe alternatives you've considered
None.

If possible, state how you can assist in providing data or code to implement the feature
I can provide a list of the most common adducts for ESI-MS, based on thousands of spectra.

Additional context
Maybe it would be good to start with the most common ones in the list that do not require additional complex coding, basically replacing {H} with {Na} or similar. That means complicated species with multiple ions or water losses could be ignored in the beginning, but those with [M+Na]+, [M+Li]+, [M.Cl]- or [M+NH4]+ could be easily added. The full list contains 300 adducts; I only added the top 30.

| #  | Adduct name  | Count | Percent  |
|----|--------------|-------|----------|
| 1  | [M+H]+       | 98521 | 62.55381 |
| 2  | [M+2H]2+     | 18025 | 11.44459 |
| 3  | [M+H-H2O]+   | 13822 | 8.77598  |
| 4  | [M-H]-       | 9847  | 6.25214  |
| 5  | [M+Na]+      | 8679  | 5.51055  |
| 6  | [M+H-NH3]+   | 1882  | 1.19494  |
| 7  | [M+NH4]+     | 1161  | 0.73715  |
| 8  | [M-H-H2O]-   | 545   | 0.34604  |
| 9  | [M-H+2Na]+   | 519   | 0.32953  |
| 10 | [M-H+H2O]-   | 386   | 0.24508  |
| 11 | [M+NH4-H2O]+ | 362   | 0.22984  |
| 12 | [M+H+H2O]+   | 306   | 0.19429  |
| 13 | [M+H+Na]2+   | 288   | 0.18286  |
| 14 | [M+H+K]2+    | 276   | 0.17524  |
| 15 | [M-2H]2-     | 220   | 0.13968  |
| 16 | [M+2Na]2+    | 217   | 0.13778  |
| 17 | [M+2H-NH3]2+ | 216   | 0.13714  |
| 18 | [M+K]+       | 215   | 0.13651  |
| 19 | [M+H-2H2O]+  | 186   | 0.11810  |
| 20 | [M+3H]3+     | 105   | 0.06667  |
| 21 | [M+2H-H2O]2+ | 102   | 0.06476  |
| 22 | [M]+.        | 93    | 0.05905  |
| 23 | [M+2Na-H]+   | 81    | 0.05143  |
| 24 | [M-H+2K]+    | 80    | 0.05079  |
| 25 | [M+H-CO]+    | 73    | 0.04635  |
| 26 | [M+H-CO2]+   | 68    | 0.04318  |
| 27 | [M+H-CH2O2]+ | 60    | 0.03810  |
| 28 | [M-H-NH3]-   | 59    | 0.03746  |
| 29 | [M.Cl]-      | 56    | 0.03556  |
| 30 | [M+Li]+      | 49    | 0.03111  |
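
As a rough illustration of the "replace {H} with {Na}" idea, the Python sketch below sorts the 30 adduct names from the table into three groups: plain protonation/deprotonation, a single attached ion, and everything else. The `classify` function, the `TERM` pattern and the `SINGLE_IONS` set are hypothetical helpers written only for this example and are not part of crest. The grouping is an assumption that mirrors the "start simple" suggestion, not a statement about what crest can do.

```python
import re
from collections import Counter

# Adduct names copied from the top-30 table above.
ADDUCTS = [
    "[M+H]+", "[M+2H]2+", "[M+H-H2O]+", "[M-H]-", "[M+Na]+", "[M+H-NH3]+",
    "[M+NH4]+", "[M-H-H2O]-", "[M-H+2Na]+", "[M-H+H2O]-", "[M+NH4-H2O]+",
    "[M+H+H2O]+", "[M+H+Na]2+", "[M+H+K]2+", "[M-2H]2-", "[M+2Na]2+",
    "[M+2H-NH3]2+", "[M+K]+", "[M+H-2H2O]+", "[M+3H]3+", "[M+2H-H2O]2+",
    "[M]+.", "[M+2Na-H]+", "[M-H+2K]+", "[M+H-CO]+", "[M+H-CO2]+",
    "[M+H-CH2O2]+", "[M-H-NH3]-", "[M.Cl]-", "[M+Li]+",
]

# Assumption: species that could stand in for {H} as one attached ion.
SINGLE_IONS = {"Na", "K", "Li", "NH4", "Cl"}

# One term inside the brackets: sign, optional count, formula (e.g. "+2Na").
TERM = re.compile(r"([+\-.])(\d*)([A-Z][A-Za-z0-9]*)")


def classify(adduct):
    """Return a coarse group label for an adduct name such as '[M+Na]+'."""
    match = re.fullmatch(r"\[M(.*)\](.*)", adduct)
    if match is None:
        raise ValueError(f"not an adduct name: {adduct!r}")
    inner = match.group(1)
    terms = TERM.findall(inner)
    # Make sure the pattern consumed the whole bracket content.
    if "".join(s + c + f for s, c, f in terms) != inner:
        raise ValueError(f"cannot parse adduct name: {adduct!r}")
    if len(terms) == 1:
        sign, count, formula = terms[0]
        if count in ("", "1"):
            if formula == "H":
                return "protonation/deprotonation"
            if sign in ("+", ".") and formula in SINGLE_IONS:
                return "single-ion"
    # Multiple terms, multiple ions, neutral losses, radical cations, ...
    return "complex"


groups = Counter(classify(name) for name in ADDUCTS)
for label, n in groups.items():
    print(f"{label}: {n}")
```

With this deliberately strict definition, multiply charged species such as [M+2H]2+ land in the "complex" group even though they only involve H or alkali ions; loosening the `count` check would move them.
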
@awvwgk
Member

awvwgk commented Feb 18, 2020

Thanks, this looks interesting. Everything based on cationic species (H, alkali metal ions) should be easily doable with crest or is already possible (15 of the 30 species in the list above). This might require performing multiple protonation and/or deprotonation steps and maybe removing an electron by hand (entry 22), but it is doable.

Most of the remaining species result from fragmentations or dissociations, which are, to my knowledge, explicitly removed in the protonation/deprotonation procedure right now. So they might already be generated in the MTD step but are then sorted out because of the fragmentation.

Generating a single ensemble of ESI-MS protomers is somewhat difficult, as we would be dealing with a grand-canonical ensemble, and I fail to see an easy way to allow particle exchange in such a search procedure. I guess generating it in pieces and putting everything together at the end would be the way to go.

@pprcht
Contributor

pprcht commented Feb 19, 2020

Currently, adding other ions is possible if they consist of only a single atom, e.g., alkali or alkaline-earth metal ions. To do that, there is an additional command-line flag that has to be used together with the '-protonate' command. This flag is called '-swel' (short for 'switch element') and requires the corresponding ion, including its charge, as a second argument. For example, something like '-swel na+' or '-swel mg2+' will be read and parsed.
There is one example of this in the main publication (https://pubs.rsc.org/en/content/articlehtml/2020/cp/c9cp06869d).

Adding more complex ions consisting of several atoms, e.g., NH4+, is currently not possible in an automated way.
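
For scripting over several adducts, the flag combination described above could be assembled roughly as in the following Python sketch. It only builds the argument list and does not run crest. The flags '-protonate', '-swel' and '-T' and the ion spelling 'na+' are taken from this thread; the spellings 'k+' and 'li+' are assumed by analogy, and the function `crest_protonate_args` and the `SWEL_IONS` table are hypothetical names for this example.

```python
import shlex

# Adduct name -> argument for '-swel'. 'na+' appears in this thread;
# 'k+' and 'li+' are assumptions made by analogy and are unverified.
SWEL_IONS = {
    "[M+Na]+": "na+",
    "[M+K]+": "k+",
    "[M+Li]+": "li+",
}


def crest_protonate_args(xyz_file, adduct="[M+H]+", threads=None):
    """Build a crest argument list for a single-atom cation adduct."""
    args = ["crest", xyz_file, "-protonate"]
    if adduct != "[M+H]+":
        if adduct not in SWEL_IONS:
            # Multi-atom ions such as NH4+ are not covered by '-swel'.
            raise ValueError(f"no single-atom ion known for {adduct!r}")
        args += ["-swel", SWEL_IONS[adduct]]
    if threads is not None:
        args += ["-T", str(threads)]
    return args


if __name__ == "__main__":
    for name in ("[M+H]+", "[M+Na]+"):
        print(shlex.join(crest_protonate_args("input.xyz", name, threads=4)))
```

The resulting list can be passed to `subprocess.run` if crest is installed; 'input.xyz' is a placeholder file name.
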

@tobigithub
Author

tobigithub commented Feb 20, 2020

@pprcht @awvwgk
Great, thank you, that works fine.

When I use it on the alanylglycine example from crest/xtb, I nicely get the sodiated adduct.


>cat crest_best.xyz
  20
        -33.86165696
 C         -2.1627314617       -0.1093305049       -0.3179184038
 C         -3.0386798794        0.1551215621        0.9094452568
 H         -2.5609217993       -0.2489766103        1.8005935960
 H         -3.1615103731        1.2257149903        1.0437787263
 H         -4.0170366784       -0.3067699909        0.7916623873
 N         -1.8297221481       -1.5125490956       -0.5338860442
 H         -1.4019637414       -1.8949461549        0.3054021711
 H         -2.6723252934       -2.0471788886       -0.7152017160
 H         -2.6862009673        0.2430998080       -1.2167434450
 C         -0.8715535551        0.7118857591       -0.2176704482
 O         -0.8068736164        1.7904607650        0.3323591016
 N          0.1704837182        0.1380148853       -0.8597669072
 H          0.0313935031       -0.8169399754       -1.1652573537
 C          1.5097913329        0.6300911148       -0.7040998402
 C          2.3317937909       -0.2766511735        0.1917997439
 O          1.9237529964       -1.2763293930        0.7161968811
 O          3.5851947016        0.1635610032        0.3244267958
 H          4.0943032987       -0.4202945400        0.9094608424
 H          1.4441575213        1.6202042580       -0.2389209338
 H          2.0163379624        0.7297480352       -1.6694825308



>crest crest_best.xyz -protonate -T 16  -ewin 10000 -iter 1000 -swel na+

===================================================
============= ordered structure list [Na]+ ========
===================================================
 written to file <protonated.xyz>

 structure    ΔE(kcal/mol)   Etot(Eh)
    1            0.00        -33.773504
    2            0.63        -33.772494
    3            1.72        -33.770764
    4           10.16        -33.757305


>crest crest_best.xyz -protonate -T 16  -ewin 10000 -iter 1000

===================================================
============= ordered structure list [H]+ =========
===================================================
 written to file <protonated.xyz>

 structure    ΔE(kcal/mol)   Etot(Eh)
    1            0.00        -33.960179
    2            0.83        -33.958853
    3            3.06        -33.955296
    4           22.96        -33.923597
    5           27.49        -33.916373
    6           55.14        -33.872305
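
As a quick sanity check on the two lists above, the ΔE column can be recomputed from the Etot column. The Python sketch below is only an illustration: the function name `relative_energies` is hypothetical, and the conversion factor of 627.5095 kcal/mol per Hartree is the commonly used value, assumed here rather than taken from the crest output. Because Etot is printed with six decimals, the recomputed values can differ from the printed ones by about 0.01 kcal/mol.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # commonly used conversion factor


def relative_energies(etot_hartree):
    """Return energies relative to the lowest one, in kcal/mol."""
    e_min = min(etot_hartree)
    return [(e - e_min) * HARTREE_TO_KCAL_MOL for e in etot_hartree]


# Etot(Eh) values copied from the two ordered structure lists above.
sodiated = [-33.773504, -33.772494, -33.770764, -33.757305]
protonated = [-33.960179, -33.958853, -33.955296,
              -33.923597, -33.916373, -33.872305]

for label, energies in (("[Na]+", sodiated), ("[H]+", protonated)):
    print(label)
    for i, d_e in enumerate(relative_energies(energies), start=1):
        print(f"  {i:2d}  {d_e:8.2f} kcal/mol")
```
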

Classic tools also generate additional protomers or adducts in this case, which could have even lower energy. I have to check which molecules are thrown out during the three cycles. Maybe that is where rule-based systems and AIMD (similar to this paper) have to be married, or at least considered for some cases: basically feeding the rule-based output into crest.

image

Source: https://cactus.nci.nih.gov/tautomerizer/
Input: "C[C@H](N)C(\NCC(O)=O)=[O+]/[Na]"

In addition, spectroscopic methods or mass spectrometry can be used to confirm or investigate some of these compounds (similar to this paper), where tautomers of different dipeptides were investigated with tandem mass spectrometry. The referenced paper also shows possible ring formations.

Overall, I think crest is a "killer tool" that will really improve productivity, because who has time to test hundreds of different possibilities, especially for larger molecules? Plus, CREGEN gives us ensemble outputs, really love it!

@awvwgk awvwgk transferred this issue from grimme-lab/xtb Apr 23, 2020