Matching compound names from GNPS to MiBIG #3

Closed
sdrogers opened this issue Oct 25, 2018 · 9 comments
Labels
data-format: issues related to format of data

Comments

@sdrogers
Contributor

This is a bit of a mess (and I think it always will be).
We could store a dictionary of the names that appear in the GNPS output (LibraryID), which seem to differ slightly from the ones in the GNPS library MGF files.
We also need to handle both the dereplicator results that appear in the LibraryID column and the results that come directly from dereplicator (which should be cleaner).
The same goes for varquest, etc.
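
A minimal sketch of the dictionary idea, assuming a hand-curated list of (variant, canonical) name pairs. The function names `build_alias_table` and `lookup` are hypothetical and not part of any existing code; the two example entries are only illustrative.

```python
def build_alias_table(pairs):
    """Build a lookup from name variants to one canonical compound name.

    pairs: iterable of (variant_name, canonical_name) tuples.
    Keys are lowercased and stripped so trivial differences do not matter.
    """
    table = {}
    for variant, canonical in pairs:
        table[variant.strip().lower()] = canonical
    return table


def lookup(table, name):
    """Return the canonical name for `name`, or None if it is unknown."""
    return table.get(name.strip().lower())


# Illustrative entries only: one name as it might appear in the LibraryID
# column and one as it might appear in a library MGF file.
aliases = build_alias_table([
    ("Staurosporine", "staurosporine"),
    ("Staurosporine M+H", "staurosporine"),
])
```

A table like this only covers variants someone has already seen, so it would probably need a fallback for unseen names.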

@justinjjvanderhooft

Indeed, over the coming months I will try to validate links within the iOMEGA project, but it will remain tricky. Also, I can add SMILES to GNPS library IDs, but I will need to find some time to do it, as we will need to double-check the identifications.

@sdrogers
Contributor Author

It seems, though, that GNPS doesn't necessarily provide the "IDs": the Crusemann file I'm working from has compound names. IDs would be more helpful.

@sdrogers
Contributor Author

Also, InChIKey may be better than SMILES (or both).

@justinjjvanderhooft

What do you mean by IDs? I meant the compound names, but they are not always completely unambiguous.

@sdrogers
Contributor Author

The GNPS library spectra have official IDs (CCMSXXXXXXXX), but these aren't in the output. The output has the name "Staurosporine", which is also not identical to the name in the GNPS library MGF file, where the adduct is also present ("Staurosporine M+H" or something).
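
To illustrate the adduct problem, here is a rough sketch that strips a trailing adduct token before comparing names. It assumes the adduct is always the last whitespace-separated token and looks like "M+H", "[M+H]+" or "M+Na"; the regex and the helper name `strip_adduct` are my own guesses, not something taken from GNPS.

```python
import re

# Assumed adduct shape: optional "[", optional multiplier, "M", a sign,
# then letters/digits/signs, optional "]", optional trailing charge sign.
# Examples this is meant to catch: "M+H", "[M+H]+", "M+Na", "[M-H]-", "2M+H".
_ADDUCT_RE = re.compile(r"\s+\[?\d*M[+-][A-Za-z0-9+-]*\]?[+-]?\s*$")


def strip_adduct(name):
    """Remove a trailing adduct token from a compound name, if present."""
    return _ADDUCT_RE.sub("", name).strip()
```

With this, "Staurosporine M+H" and "Staurosporine" reduce to the same string. A compound whose real name happens to end in something adduct-like would be truncated by mistake, so the pattern would need checking against the actual library files.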

@sdrogers
Contributor Author

I suspect we'll end up with a method that just tries lots of ways of comparing the two names.
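
Something along these lines, as a sketch only: a chain of comparisons ordered from strict to loose, returning the name of the first strategy that succeeds. The strategy list, the `match_names` name and the 0.9 fuzzy cutoff are all assumptions for illustration; `difflib.SequenceMatcher` is from the Python standard library.

```python
import difflib
import re


def _normalise(name):
    # Lowercase and drop everything except letters and digits.
    return re.sub(r"[^a-z0-9]+", "", name.lower())


def match_names(a, b, fuzzy_cutoff=0.9):
    """Return the label of the first strategy that matches, else None."""
    if a == b:
        return "exact"
    if a.strip().lower() == b.strip().lower():
        return "case-insensitive"
    na, nb = _normalise(a), _normalise(b)
    if na and na == nb:
        return "normalised"
    # One name is the other plus a suffix (e.g. an appended adduct).
    if na and nb and (na.startswith(nb) or nb.startswith(na)):
        return "prefix"
    if difflib.SequenceMatcher(None, na, nb).ratio() >= fuzzy_cutoff:
        return "fuzzy"
    return None
```

Returning the strategy label rather than a bare boolean makes it possible to report how confident each link is. The prefix and fuzzy steps can produce false positives for closely related compounds, so they are the ones most in need of manual validation.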

@justinjjvanderhooft

Got it - you are right. Better to communicate with InChIKeys and SMILES/SMARTS.
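
For the InChIKey route, a small sketch of how two keys could be compared. It relies on the standard InChIKey layout (a 14-letter first block for the connectivity, a 10-letter second block covering stereochemistry and other layers, and a final protonation letter); the function name `same_structure` and the `ignore_stereo` option are hypothetical.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
_INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def same_structure(key_a, key_b, ignore_stereo=True):
    """Compare two InChIKeys.

    With ignore_stereo=True only the first 14-letter block is compared,
    so stereoisomers of the same skeleton count as a match.
    """
    a, b = key_a.strip().upper(), key_b.strip().upper()
    if not (_INCHIKEY_RE.match(a) and _INCHIKEY_RE.match(b)):
        raise ValueError("not a well-formed InChIKey")
    if ignore_stereo:
        return a[:14] == b[:14]
    return a == b
```

Comparing only the first block is a deliberate loosening: whether stereoisomers should count as the same compound is a judgment call for whoever validates the links.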

@CunliangGeng
Member

I assume this issue has been solved. Please reopen it if not.

@CunliangGeng CunliangGeng added the data-format issues related to format of data label Jun 29, 2022