Add version pins to various pyobo calls #169

Open
kkaris opened this issue Jun 3, 2024 · 4 comments

Comments

@kkaris
Collaborator

kkaris commented Jun 3, 2024

I recently ran into a timeout when testing one of the frontend apps at discovery.indra.bio. The backend logs showed that pyobo was downloading new resource files while the request was being handled. To resolve this, we can add version pins to the pyobo calls wherever they appear, so that no downloads are triggered at runtime when requests to the apps come in.

See also: biopragmatics/pyobo#181 and biopragmatics/pyobo#184.
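A minimal sketch of keeping the pins in one place. It assumes the pinned versions live in a single module-level dict and that the pyobo lookup functions being called accept a `version` keyword argument (verify this against the installed pyobo release; the linked pyobo issues discuss version pinning). `PYOBO_VERSIONS` and `pinned_version` are hypothetical names, and the `eccode` version string is the one that appears in the download log later in this thread.

```python
from typing import Dict, Optional

# Hypothetical central registry of pinned pyobo resource versions.
# The eccode value matches the directory pyobo downloaded into in the
# log later in this thread; other prefixes would be added as they are found.
PYOBO_VERSIONS: Dict[str, str] = {
    "eccode": "2024-05-29",
}


def pinned_version(prefix: str, strict: bool = True) -> Optional[str]:
    """Return the pinned version for a pyobo prefix.

    With strict=True an unpinned prefix raises a KeyError instead of
    silently falling back to the latest version, which is what would
    trigger a download at request time.
    """
    try:
        return PYOBO_VERSIONS[prefix]
    except KeyError:
        if strict:
            raise KeyError(
                f"No pinned pyobo version for prefix {prefix!r}; "
                "add one to PYOBO_VERSIONS"
            ) from None
        return None


# Assumed usage (requires pyobo, and assumes the function takes `version`):
# import pyobo
# names = pyobo.get_id_name_mapping("eccode", version=pinned_version("eccode"))
```

Raising on an unpinned prefix is a deliberate choice here: a missing pin then fails loudly during development instead of quietly reintroducing runtime downloads.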

@bgyori
Member

bgyori commented Jun 3, 2024

Could you check which resources specifically are implicated?

@kkaris
Collaborator Author

kkaris commented Jun 3, 2024

The one I saw in my timeout involved EC codes:

INFO: [2024-06-03 18:15:51] pystow.utils - downloading with urllib from ftp://ftp.expasy.org/databases/enzyme/enzclass.txt to /root/.data/pyobo/raw/eccode/2024-05-29/enzclass.txt
INFO: [2024-06-03 18:15:53] pystow.utils - downloading with urllib from ftp://ftp.expasy.org/databases/enzyme/enzyme.dat to /root/.data/pyobo/raw/eccode/2024-05-29/enzyme.dat
INFO: [2024-06-03 18:15:55] pystow.utils - downloading with urllib from http://current.geneontology.org/ontology/external2go/ec2go to /root/.data/pyobo/raw/eccode/2024-05-29/ec2go.tsv
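The log lines above suggest that pyobo, via pystow, stores raw files under `~/.data/pyobo/raw/<prefix>/<version>/` (here the home directory is /root; pystow's default root can be changed with the `PYSTOW_HOME` environment variable). Assuming that layout holds, a startup check like the hypothetical `missing_raw_files` below could confirm that a pinned version is already on disk before the app starts serving, so a missing file shows up at deploy time instead of as a request timeout. The directory layout is inferred from these three log lines, not from pyobo's documentation.

```python
from pathlib import Path
from typing import Iterable, List, Optional

# Default pystow root; layout inferred from the log:
# <base>/pyobo/raw/<prefix>/<version>/<file>
DEFAULT_BASE = Path.home() / ".data"

# Files pyobo fetched for eccode in the log above.
ECCODE_FILES = ["enzclass.txt", "enzyme.dat", "ec2go.tsv"]


def missing_raw_files(
    prefix: str,
    version: str,
    filenames: Iterable[str],
    base: Optional[Path] = None,
) -> List[str]:
    """Return the expected raw files that are not cached for prefix/version."""
    root = (base or DEFAULT_BASE) / "pyobo" / "raw" / prefix / version
    return [name for name in filenames if not (root / name).is_file()]
```

For example, `missing_raw_files("eccode", "2024-05-29", ECCODE_FILES)` returning a non-empty list at startup would mean the next eccode lookup is likely to download.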

I'll get a list of all resources that are implicated.
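A quick way to assemble such a list is to scan the package source for pyobo calls, skipping processor code. This is only a sketch: the rule that a file counts as processor code if any path component contains "processor" is an assumption about the repository layout, and the regex only catches calls written as `pyobo.<function>(`, not names imported with `from pyobo import ...`. `find_pyobo_calls` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches calls written as pyobo.<name>( ; `from pyobo import x` style is not caught.
PYOBO_CALL = re.compile(r"\bpyobo\.([A-Za-z_]\w*)\s*\(")


def find_pyobo_calls(root: Path) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, function name) for each pyobo
    call found in .py files under `root`, excluding processor modules."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        # Assumption: processor code lives in files/dirs named like "*processor*".
        if any("processor" in part for part in rel.parts):
            continue
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), 1):
            for match in PYOBO_CALL.finditer(line):
                hits.append((rel.as_posix(), lineno, match.group(1)))
    return hits
```

Each hit then needs a pin, and the function names it reports also identify which pyobo resources (prefixes) have to be checked.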

@kkaris
Collaborator Author

kkaris commented Jun 3, 2024

I'm excluding pyobo calls in processors, since those are not used when serving the REST API for the discovery apps. I found two instances:

@kkaris
Collaborator Author

kkaris commented Jun 3, 2024

Re the EC codes: the HGNCEnzymeProcessor actually uses bioontology, so we could do one of the following:

  1. Switch the pyobo name lookups in client/enrichment/mla.py to bioontology calls instead
  2. Switch the bioontology call in the HGNCEnzymeProcessor to pyobo calls
  3. Avoid the name lookup in client/enrichment/mla.py altogether by modifying the query to also return the name from the node

I think option 3 makes the most sense, since then we always stay consistent with the data in the database.
There are two use cases there: a) getting names for EC codes that exist in CoGEx, and b) getting HGNC IDs, translating them to EC codes, and then getting the names. In a) we can replace the lookup by simply querying for the name as well, but in b) we still need a separate name lookup. For that case I think option 2 above is the way to go, since that's what we use to create the DB.
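A rough sketch of how the two cases could be split. It assumes CoGEx stores EC codes as `BioEntity` nodes with `id` and `name` properties; the label, property names, and all function names here are assumptions, not the actual mla.py code. Case a) reads the name from the node in the same query; case b) falls back to an injected lookup, which in practice would be whichever source was used to build the DB.

```python
from typing import Callable, Dict, Iterable, List, Mapping, Optional

# Case a): fetch the name together with the node instead of a separate lookup.
# Assumed schema: nodes labeled BioEntity with `id` and `name` properties.
EC_NAME_QUERY = """
MATCH (n:BioEntity)
WHERE n.id IN $ids
RETURN n.id AS id, n.name AS name
"""


def names_from_rows(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, str]:
    """Turn query result rows into an id -> name mapping, skipping null names."""
    return {row["id"]: row["name"] for row in rows if row.get("name")}


def resolve_ec_names(
    ec_ids: List[str],
    db_names: Mapping[str, str],
    fallback: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Case b): prefer names stored in the DB and only call the fallback
    lookup for EC codes (e.g. translated from HGNC IDs) the query missed."""
    return {
        ec: (db_names[ec] if ec in db_names else fallback(ec))
        for ec in ec_ids
    }
```

Passing the fallback in as a function keeps the choice between pyobo and bioontology (options 1 and 2) out of this code, so only the lookup that actually matches the DB build needs a version pin.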

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants