How is typing and sub-partitions handled #65

Open
cmungall opened this issue May 28, 2021 · 0 comments

What is the policy on having different types within a namespace, and having these be partitioned?

There seem to be some decisions inherited from identifiers.org, for example using "." to partition a namespace, as in DATABASE.TYPE.

For better or worse, these seem locked in so I would elevate this from convention to recommendation.

E.g. a namespace (NS) MAY be partitioned; if it is partitioned, "." MUST be used as the separator, and "." MUST NOT be used in a NS except to partition. The authority for the partitioned NS must be the same as for the base NS.
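
To make the proposed rule concrete, here is a minimal sketch of a prefix check in Python. The function names, the allowed character set, the single-level partition limit, and the shape of the `authorities` mapping are all assumptions made for illustration; none of them come from identifiers.org or the Bioregistry.

```python
import re

# Hypothetical grammar: a base namespace, optionally followed by exactly one
# "." and a partition name. Allowing only one partition level is an assumption.
PREFIX_RE = re.compile(
    r"(?P<base>[A-Za-z][A-Za-z0-9_]*)(?:\.(?P<part>[A-Za-z][A-Za-z0-9_]*))?"
)


def split_prefix(prefix):
    """Return (base, partition); partition is None for an unpartitioned NS."""
    match = PREFIX_RE.fullmatch(prefix)
    if match is None:
        # Covers a stray ".", an empty partition, or more than one "."
        raise ValueError(f"not a valid (possibly partitioned) namespace: {prefix!r}")
    return match.group("base"), match.group("part")


def same_authority(prefix, authorities):
    """Check that a partitioned NS has the same authority as its base NS.

    `authorities` is a hypothetical mapping from prefix to authority name.
    """
    base, part = split_prefix(prefix)
    if part is None:
        return True  # nothing to compare for an unpartitioned namespace
    return authorities.get(prefix) == authorities.get(base)


print(split_prefix("dictybase.gene"))
print(split_prefix("dictybase"))
```

Under these assumptions, `dictybase.gene` splits into the base `dictybase` and the partition `gene`, while a prefix such as `kegg..enzyme` is rejected.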

I think some of these should be actively reviewed. The subnamespacing convention can lead to different ways to write the same ID. To be honest, many of these were created simply as workarounds for legacy databases that had issues with providing a single URL to resolve all types within that database.

For example, there are two ways to write the same dictyBase gene.

GO uses dictyBase:DDB_G0282931. E.g. http://amigo.geneontology.org/amigo/gene_product/dictyBase:DDB_G0282931

This was agreed on with the maintainers of that database (@cybersiddhu).

However, identifiers.org registered "dictybase" as well as "dictybase.gene" and "dictybase.est". The base "dictybase" doesn't seem to be useful, and if I click on the example on the bioregistry page I get:

http://bioregistry.io/reference/dictybase:DDB0191090

[image: screenshot of the resulting page]

Note that the GO registry may be responsible for some confusion here. We used to have a lot of entries where we dealt with this problem by having sub-patterns within a single namespace:

- database: IntAct
  name: IntAct protein interaction database
  rdf_uri_prefix: http://identifiers.org/intact/
  generic_urls:
    - https://www.ebi.ac.uk/intact/
  entity_types:
    - type_name: molecular interaction
      type_id: BET:0000013
      id_syntax: EBI-[0-9]+
      url_syntax: https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=[example_id]
      example_id: IntAct:EBI-17086
      example_url: https://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-17086      
    - type_name: protein-containing complex
      type_id: GO:0032991
      id_syntax: EBI-[0-9]+
      url_syntax: http://www.ebi.ac.uk/complexportal/complex/[example_id]
      example_id: IntAct:EBI-10205244
      example_url: http://www.ebi.ac.uk/complexportal/complex/EBI-10205244
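
Because both IntAct entity types share the same `id_syntax`, a resolver cannot tell them apart from the identifier alone; the type has to be supplied separately. The sketch below illustrates this with the two `url_syntax` values from the entry above. The function name and the dictionary layout are hypothetical and are not part of the GO registry tooling.

```python
import re

# url_syntax values copied from the IntAct entry above, keyed by type_name.
INTACT_URL_SYNTAX = {
    "molecular interaction": (
        "https://www.ebi.ac.uk/intact/pages/interactions/"
        "interactions.xhtml?query=[example_id]"
    ),
    "protein-containing complex": (
        "http://www.ebi.ac.uk/complexportal/complex/[example_id]"
    ),
}

# Both entity types declare the same id_syntax, so it cannot select the type.
INTACT_ID_SYNTAX = re.compile(r"EBI-[0-9]+")


def expand_intact(curie, type_name):
    """Expand an IntAct CURIE to a URL for an explicitly given entity type."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix != "IntAct":
        raise ValueError(f"not an IntAct CURIE: {curie!r}")
    if INTACT_ID_SYNTAX.fullmatch(local_id) is None:
        raise ValueError(f"identifier does not match id_syntax: {local_id!r}")
    return INTACT_URL_SYNTAX[type_name].replace("[example_id]", local_id)


print(expand_intact("IntAct:EBI-17086", "molecular interaction"))
```

The same CURIE expands to a different URL depending on the type passed in, which is the ambiguity that splitting into sub-namespaces was meant to remove.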

However, we gradually started splitting some of these out to be consistent with identifiers.org, although we had previously used an underscore as the separator, e.g.

- database: KEGG
  name: Kyoto Encyclopedia of Genes and Genomes
  generic_urls:
    - http://www.genome.ad.jp/kegg/
  entity_types:
    - type_name: entity
      type_id: BET:0000000
- database: KEGG_ENZYME
  name: KEGG Enzyme Database
  rdf_uri_prefix: http://identifiers.org/kegg.enzyme/
  generic_urls:
    - http://www.genome.jp/dbget-bin/www_bfind?enzyme
  entity_types:
    - type_name: entity
      type_id: BET:0000000
      id_syntax: \d(\.\d{1,2}){2}\.\d{1,3}
      url_syntax: http://www.genome.jp/dbget-bin/www_bget?ec:[example_id]
      example_id: KEGG_ENZYME:2.1.1.4
      example_url: http://www.genome.jp/dbget-bin/www_bget?ec:2.1.1.4
- database: KEGG_LIGAND
  name: KEGG LIGAND Database
  generic_urls:
    - http://www.genome.ad.jp/kegg/docs/upd_ligand.html
  entity_types:
    - type_name: chemical entity
      type_id: CHEBI:24431
      id_syntax: C\d{5}
      url_syntax: http://www.genome.jp/dbget-bin/www_bget?cpd:[example_id]
      example_id: KEGG_LIGAND:C00577
      example_url: http://www.genome.jp/dbget-bin/www_bget?cpd:C00577
- database: KEGG_PATHWAY
....
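
As a rough illustration of reconciling the two separator conventions, the sketch below rewrites a GO-style underscore prefix into dotted form. It assumes a caller-supplied set of known base namespaces, since not every underscore marks a partition, and it assumes lowercasing is wanted. A mechanical rewrite is not guaranteed to match the prefix that identifiers.org actually registered; `KEGG_ENZYME` is used here only because the entry above pairs it with `kegg.enzyme`. The function names are hypothetical.

```python
def to_dotted_prefix(go_prefix, known_bases):
    """Rewrite BASE_PART as base.part when BASE is a known base namespace."""
    lowered = go_prefix.lower()  # assumption: dotted prefixes are lowercase
    base, sep, part = lowered.partition("_")
    if sep and part and base in known_bases:
        return f"{base}.{part}"
    # No underscore, or the underscore is not a partition separator
    return lowered


def convert_curie(curie, known_bases):
    """Apply to_dotted_prefix to the prefix of a CURIE, keeping the local ID."""
    # Split on the first ":" only; the local ID may itself contain "." or ":"
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{to_dotted_prefix(prefix, known_bases)}:{local_id}"


print(convert_curie("KEGG_ENZYME:2.1.1.4", {"kegg"}))
```

Note that the dots inside the local identifier `2.1.1.4` are untouched, because only the part before the first colon is treated as the namespace.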

Another category is where the namespace and subnamespace are run together with no separator, e.g.

http://bioregistry.io/registry/ncbigene
http://bioregistry.io/registry/ncbitaxon

I have a few recommendations on how to proceed, but I wanted to start by laying out some of the heterogeneity. I propose that we be proactive and design a robust system, rather than staying tied to arbitrary decisions based on what a CGI script in 2001 could or could not do.
