Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obsolete/Deprecated property is not populated when converting from obojson #470

Closed
kevinschaper opened this issue Oct 24, 2023 · 15 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@kevinschaper
Copy link
Collaborator

kevinschaper commented Oct 24, 2023

Looking at kg-phenio and kg-obo, (and digging into the kg-phenio build), it looks like kgx doesn't produce a deprecated column when converting from obojson.

Related to / inspired by: monarch-initiative/monarch-app#421

@kevinschaper kevinschaper added the bug Something isn't working label Oct 24, 2023
@caufieldjh
Copy link
Collaborator

Looks like it's parsed from obograph:

if "deprecated" in meta:
# parse 'deprecated' flag
properties["deprecated"] = meta["deprecated"]

but the equivalent JSON and OWL parsers don't explicitly parse the field.

@kevinschaper
Copy link
Collaborator Author

@caufieldjh It seems like that means that there isn't exactly a bug in kgx, but we do need to figure out how to pass the node properties into

            transform(
                inputs=[data_file_json],
                input_format="obojson",
                output=os.path.join(self.output_dir, name),
                output_format="tsv",
                stream=False,
            )

Looking at the method declaration, it seems like transform_config is the only way, but since we don't have a config yaml currently, maybe we'd want to add the option to pass in node_properties?

def transform(
    inputs: Optional[List[str]],
    input_format: Optional[str] = None,
    input_compression: Optional[str] = None,
    output: Optional[str] = None,
    output_format: Optional[str] = None,
    output_compression: Optional[str] = None,
    stream: bool = False,
    node_filters: Optional[List[Tuple[str, str]]] = None,
    edge_filters: Optional[List[Tuple[str, str]]] = None,
    transform_config: str = None,
    source: Optional[List] = None,
    knowledge_sources: Optional[List[Tuple[str, str]]] = None,
    # this parameter doesn't get used, but I leave it in
    # for now, in case it signifies an unimplemented concept
    # destination: Optional[List] = None,
    processes: int = 1,
    infores_catalog: Optional[str] = None,
)

@caufieldjh
Copy link
Collaborator

I agree - just need the extra param for the transform function, though this does suggest that using a config yaml instead may be a good idea. Is that feasible for the Monarch builds?

@kevinschaper
Copy link
Collaborator Author

The transform call I pasted in above is from kg-phenio, I think that's where we'd need to add config yaml

@caufieldjh
Copy link
Collaborator

Ah, well in that case it should be easy to test. I'll open an issue over there and start a PR.

@caufieldjh
Copy link
Collaborator

It also looks like the way the deprecated flag is parsed in obograph_source misses many potential instances, since it only looks at meta. If I go from owl -> json -> tsv (as in kg-phenio or other kgs) then this appears to miss the majority of the deprecated flags. They just don't end up in the output tsv nodes.

@RichardBruskiewich
Copy link
Collaborator

RichardBruskiewich commented Nov 7, 2023

I think this issue may now be partially resolved by my PR #475 - which is still a draft PR for the moment.

However, it may be helpful @caufieldjh and @kevinschaper to have you give me a small sample file of the source data from which you feel KGX is not propagating the 'deprecated' flag.

Better yet, if you can, please send me a small snippet of Python KGX transform code which resembles the code what you are currently doing in your (phenio?) code base.

In that way, I can use both the sample data and your snippet to add another unit test for the owl -> json -> tsv transitive transformation, then use it to check if there are any other loose ends in the KGX code that need fixing.

@caufieldjh
Copy link
Collaborator

Thanks @RichardBruskiewich !
The source data (from phenio.json) looks like this:

 {
      "id" : "http://purl.obolibrary.org/obo/GO_0051370",
      "lbl" : "obsolete ZASP binding",
      "type" : "INDIVIDUAL",
      "meta" : {
        "definition" : {
          "val" : "OBSOLETE. Binding to Z-band alternatively spliced PDZ motif protein (ZASP). ZASP is a Z-band protein specifically expressed in heart and skeletal muscle. This protein contains N-terminal PDZ domain and C-terminal LIM domain.",
          "xrefs" : [ "PMID:10427098", "PMID:11699871" ]
        },
        "comments" : [ "This term was made obsolete because it represents binding to an individual protein." ],
        "synonyms" : [ {
          "pred" : "hasExactSynonym",
          "val" : "Z-band alternatively spliced PDZ-motif protein binding"
        }, {
          "pred" : "hasExactSynonym",
          "val" : "ZASP binding"
        } ],
        "basicPropertyValues" : [ {
          "pred" : "http://purl.obolibrary.org/obo/IAO_0100001",
          "val" : "GO:0008092"
        }, {
          "pred" : "http://www.geneontology.org/formats/oboInOwl#hasOBONamespace",
          "val" : "molecular_function"
        } ],
        "deprecated" : true
      }

@RichardBruskiewich
Copy link
Collaborator

RichardBruskiewich commented Nov 7, 2023

I assume that the phenio.json data is normally properly wrapped within a suitable KGX-like objson-like wrapper, something like:

{
  "graphs": [
    {
      "nodes": [
        {
          "id": "http://purl.obolibrary.org/obo/GO_0051370",
          "lbl": "obsolete ZASP binding",
          ...etc,
            "deprecated": true
          }
        }
      ],
      "edges": [],
      "id": "http://purl.obolibrary.org/obo/phenio.owl",
      "meta": {
        "subsets": [],
        "xrefs": [],
        "basicPropertyValues": []
      }
    }
  ]
}

Using this basic structure, I confirm that the obograph source parser detects the deprecated property in the data.

I will now iterate towards a unit test to detect transitive conservation of the deprecated status across various transformations.

@caufieldjh
Copy link
Collaborator

I assume that the phenio.json data is normally properly wrapped within a suitable KGX-like objson-like wrapper, something like:

Yes, it is - apologies for the rough snippet

@RichardBruskiewich
Copy link
Collaborator

Hi @caufieldjh and @kevinschaper , I think the PR #475 is close to completed. I added a unit test along the lines of the 'transform' snippet above, with the phenio.json, and it seems to pass.

I suppose I also now need to check the owl <-> (json or tsv) path too?

@RichardBruskiewich
Copy link
Collaborator

Fully resolved by #475 which is now released in KGX release v2.2.5

@caufieldjh
Copy link
Collaborator

Thanks again @RichardBruskiewich

@RichardBruskiewich
Copy link
Collaborator

RichardBruskiewich commented Nov 14, 2023

Mainstream use of 'deprecate' flag on instances (not classes) of biolink:Entity now supported by PR biolink/biolink-model#1421; now supported in Biolink Model release 3.6.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants