Chemical pages #2200

Open
ValWood opened this issue Jul 31, 2024 · 3 comments

@ValWood
Member

ValWood commented Jul 31, 2024

We would like chemical pages for reaction participants.

For example,
Screenshot 2024-07-31 at 19 25 56

Currently, chemicals link to ChEBI. Instead we would like to link to a chemical page which lists all of the genes + reactions that the chemical participates in (I haven't thought this through fully).

We also need to exclude what we refer to as "currency chemicals" (where the pages would be ridiculously long), for example:

  • phosphate
  • diphosphate
  • H+
  • H2O
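A minimal sketch of how the exclusion could work, assuming chemicals are keyed by ChEBI ID. The `CURRENCY_CHEMICALS` table and both function names are hypothetical, and the ChEBI IDs shown are assumptions that should be checked against the IDs Rhea actually uses for these participants.

```python
# Hypothetical exclusion list of "currency chemicals".
# ASSUMPTION: these ChEBI IDs must be verified against Rhea before use.
CURRENCY_CHEMICALS = {
    "CHEBI:43474": "phosphate",
    "CHEBI:33019": "diphosphate",
    "CHEBI:15378": "H+",
    "CHEBI:15377": "H2O",
}


def needs_chemical_page(chebi_id: str) -> bool:
    """Return True unless the chemical is on the currency list."""
    return chebi_id not in CURRENCY_CHEMICALS


def page_candidates(participants):
    """Filter reaction participants down to those that get their own page."""
    return [chebi_id for chebi_id in participants if needs_chemical_page(chebi_id)]
```

Keeping the list as data rather than hard-coding it in the page logic would let curators extend it without a code change.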

The page would display something like this:

Screenshot 2024-07-31 at 19 30 31

but with a gene instead of a Rhea ID.

This would be incredibly useful to:

  • quickly identify the next steps in pathways
  • chase up random chemicals that are not dealt with as part of pathway curation, increasingly so as we begin that work (especially to identify specific detoxification candidates among the "unknown" proteins)
  • identify when genes are branch points in pathways. We should be able to see quickly whether a pathway is linear or at a branch point with multiple options, which will help us with decisions about 'starts' and 'ends' of pathways and where to break up GO-CAM models.
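As a rough illustration of the branch-point idea, here is a sketch that assumes we already have (gene, Rhea ID, ChEBI IDs) triples available. The input shape, the function names, and the "more than two reactions" threshold are all assumptions for illustration, not a description of existing PomBase code.

```python
from collections import defaultdict


def build_chemical_index(annotations):
    """Map each chemical to the reactions it takes part in, and each
    reaction to the genes annotated to it.

    annotations: iterable of (gene, rhea_id, chebi_ids) triples
    (a hypothetical input format).
    """
    index = defaultdict(lambda: defaultdict(set))
    for gene, rhea_id, chebi_ids in annotations:
        for chebi_id in chebi_ids:
            index[chebi_id][rhea_id].add(gene)
    return index


def possible_branch_points(index, max_linear=2):
    """Chemicals seen in more reactions than a linear pathway step needs.

    ASSUMPTION: in a linear pathway a chemical appears in at most two
    reactions (one producing it, one consuming it), so anything above
    max_linear is flagged as a candidate branch point.
    """
    return sorted(
        chebi_id
        for chebi_id, reactions in index.items()
        if len(reactions) > max_linear
    )
```

Each entry of the index is also what one chemical page would display: a row per reaction, with one or more genes attached to it.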

I guess the first step is to store the chemicals in Chado if we don't do that yet?

@kimrutherford
Member

kimrutherford commented Aug 1, 2024

I guess the first step is to store the chemicals in Chado if we don't do that yet?

We currently only store the chemicals from ChEBI that are in: pombe-embl/mini-ontologies/chebi.obo
The full ChEBI is 200,000 terms and we only use a handful.

I've had a first look at the Rhea downloads page. I can't see a file that contains a simple mapping between the Rhea IDs and the ChEBI IDs. There are RDF files, but using the SPARQL endpoint would probably work: https://sparql.rhea-db.org/sparql/#

Something like this:

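# List the ChEBI IDs of the participants of one approved Rhea reaction
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ch: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT ?chebi ?reaction
WHERE {
  ?reaction rdfs:subClassOf rh:Reaction .
  ?reaction rh:status rh:Approved .
  ?reaction rh:side ?reactionSide .
  ?reactionSide rh:contains ?participant .
  ?participant rh:compound ?compound .
  ?compound rh:chebi ?chebi .
  ?reaction rh:id "39051"^^xsd:long .
}
ORDER BY ?chebi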

It will be slow to do many queries but it won't need updating every night.
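A sketch of how the per-reaction lookup might be scripted, using only the Python standard library. It assumes the endpoint accepts a form-encoded POST with `query` and `format=csv` fields (as in the curl note in this comment) and returns a CSV whose header row is the variable names. The function names are hypothetical and none of this has been run against the live service.

```python
import csv
import io
import urllib.parse
import urllib.request

ENDPOINT = "https://sparql.rhea-db.org/sparql/"

# Same query as above with the prefixes spelled out; braces are doubled
# because the template is filled in with str.format().
QUERY_TEMPLATE = """\
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?chebi ?reaction
WHERE {{
  ?reaction rdfs:subClassOf rh:Reaction .
  ?reaction rh:status rh:Approved .
  ?reaction rh:side ?reactionSide .
  ?reactionSide rh:contains ?participant .
  ?participant rh:compound ?compound .
  ?compound rh:chebi ?chebi .
  ?reaction rh:id "{rhea_id}"^^xsd:long .
}}
ORDER BY ?chebi
"""


def build_query(rhea_id):
    """Fill the numeric Rhea ID into the query template."""
    return QUERY_TEMPLATE.format(rhea_id=int(rhea_id))


def parse_result(csv_text):
    """Turn the CSV result into a list of ChEBI IDs such as 'CHEBI:15377'."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Each ?chebi value is a URI ending in e.g. 'CHEBI_15377'.
    return [row["chebi"].rsplit("/", 1)[-1].replace("_", ":", 1) for row in rows]


def fetch_chebi_ids(rhea_id):
    """POST the query to the endpoint (network call, untested assumption)."""
    data = urllib.parse.urlencode(
        {"query": build_query(rhea_id), "format": "csv"}
    ).encode("ascii")
    request = urllib.request.Request(ENDPOINT, data=data)
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_result(response.read().decode("utf-8"))
```

Since the results won't need updating every night, the output of `fetch_chebi_ids()` could be written to a file keyed by Rhea ID and only refreshed for IDs that are not yet cached.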


Note to self:

curl 'https://sparql.rhea-db.org/sparql/' -X POST -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8' -H 'Accept-Language: en-NZ,en;q=0.7,en-US;q=0.3' -H 'Accept-Encoding: gzip, deflate, br, zstd' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Origin: null' -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Sec-Fetch-Dest: document' -H 'Sec-Fetch-Mode: navigate' -H 'Sec-Fetch-Site: same-origin' -H 'Sec-Fetch-User: ?1' -H 'Sec-GPC: 1' -H 'Priority: u=0, i' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' --data-raw 'query=PREFIX+rh%3A+%3Chttp%3A%2F%2Frdf.rhea-db.org%2F%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+CHEBI%3A+%3Chttp%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCHEBI_%3E%0D%0A%0D%0A%23endpoint%3Ahttps%3A%2F%2Fsparql.rhea-db.org%2Fsparql%0D%0A%23query+Q3%3A+Select+all+approved+reactions+with+CHEBI%3A17815+%28a+1%2C2-diacyl-sn-glycerol%29+or+one+of+its+descendant%0D%0A%0D%0A%0D%0APREFIX+ch%3A%3Chttp%3A%2F%2Fpurl.obolibrary.org%2Fobo%2F%3E%0D%0A%0D%0A%0D%0ASELECT+distinct+%3Fchebi+%3Freaction%0D%0AWHERE+%7B%0D%0A++%3Freaction+rdfs%3AsubClassOf+rh%3AReaction+.%0D%0A++%3Freaction+rh%3Astatus+rh%3AApproved+.%0D%0A++%3Freaction+rh%3Aside+%3FreactionSide+.%0D%0A++%3FreactionSide+rh%3Acontains+%3Fparticipant+.%0D%0A++%3Fparticipant+rh%3Acompound+%3Fcompound+.%0D%0A++%3Fcompound+rh%3Achebi+%3Fchebi+.%0D%0A++%3Freaction+rh%3Aid+%2239051%22%5E%5Exsd%3Along+.%0D%0A%0D%0A%7D%0D%0AORDER+BY+%3Fchebi&format=csv' > result.csv

@kimrutherford
Member

chemicals link to CHEBI. Instead we would like to link to a chemical page

We'll need help from the widget developers for that because the widget code inserts the links to ChEBI automatically.

Would it be useful/necessary to search for chemicals with the quick search or in the query builder?

Currently the only reaction details we store are the Rhea IDs, which are dbxrefs for GO terms in Chado. We can get more reaction details using SPARQL queries but I'm not sure of the best way to store the details in Chado (or if we should store the details in Chado at all).

Chemical pages could be a bit of work. We won't be able to re-use much of the code from other pages because the table on the chemical pages will be unique to the new pages.

but with a gene instead of a Rhea ID.

It could be multiple genes, I think, like in this case: https://www.pombase.org/term/GO:0004845

Would the Rhea ID and GO term ID(s) be a good idea in the table too?

Should we show the reaction diagrams on the chemical pages?

@ValWood
Member Author

ValWood commented Aug 1, 2024

Let's discuss on the next group call and see if there are any ways to make this easier.
I'll think more about exactly what we need.

@kimrutherford kimrutherford self-assigned this Aug 29, 2024