Use "xrefs" field with curies rather than "val" field for names when getting exactSynonyms from json files #530

@bsantansb

Description

I hope I'm not missing any capabilities here, but I don't believe the node properties extracted from OBO json files are configurable in this way. Please let me know if I'm incorrect on this!

It would be useful to be able to extract exact synonym information from OBO JSON files by CURIE rather than by label. Currently, this information is extracted only as the associated string label, but the associated CURIEs would be helpful too! The exact_synonym field could either be replaced with the CURIEs, or another field could be added that holds the CURIEs when available (exact_synonym_curies or something).

Where this code is:
In the parse_meta function of the ObographSource class, the "hasExactSynonym" node properties are parsed here:

if "synonyms" in meta:
    # parse 'synonyms' as 'synonym'
    properties["synonym"] = [s["val"] for s in meta["synonyms"] if "val" in s]
    properties["exact_synonym"] = [
        x["val"]
        for x in meta["synonyms"]
        if "pred" in x and x["pred"] == "hasExactSynonym"
    ]
....

However, if the "xrefs" field were used instead of the "val" field, CURIEs could be output in the exact_synonym column of the transform. Synonyms with no "xrefs" field could be ignored.

Proposed change:

if "synonyms" in meta:
    # parse 'synonyms' as 'synonym'
    properties["synonym"] = [s["val"] for s in meta["synonyms"] if "val" in s]
    properties["exact_synonym"] = "|".join(
        xref
        for x in meta["synonyms"]
        if x.get("pred") == "hasExactSynonym" and "xrefs" in x
        for xref in x["xrefs"]
    )
....
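
For anyone who wants to try the idea outside KGX, here is a minimal, self-contained sketch of the proposed extraction. The function name `exact_synonym_curies` and the inline `meta` dictionary are assumptions made for illustration (the dictionary copies the three synonyms from the mondo.json example in this issue); this is not the actual parse_meta implementation.

```python
from typing import Any, Dict, List


def exact_synonym_curies(meta: Dict[str, Any]) -> List[str]:
    """Collect xref CURIEs from hasExactSynonym entries (hypothetical helper)."""
    return [
        xref
        for syn in meta.get("synonyms", [])
        if syn.get("pred") == "hasExactSynonym"
        # synonyms without an "xrefs" field contribute nothing
        for xref in syn.get("xrefs", [])
    ]


# Three synonyms copied from the mondo.json fragment in this issue.
meta = {
    "synonyms": [
        {"pred": "hasExactSynonym", "val": "condition", "xrefs": ["NCIT:C2991"]},
        {
            "pred": "hasExactSynonym",
            "val": "disease",
            "xrefs": ["DOID:4", "NCIT:C2991", "Orphanet:377788"],
        },
        {"pred": "hasExactSynonym", "val": "medical condition"},
    ]
}

curies = exact_synonym_curies(meta)
print("|".join(curies))  # NCIT:C2991|DOID:4|NCIT:C2991|Orphanet:377788
```

Note that the result keeps one entry per xref occurrence, so a CURIE cited by several synonyms appears several times.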

Below is an example of this:

The source json might look like this (mondo.json example):

{
      "id" : "http://purl.obolibrary.org/obo/MONDO_0000001",
      "lbl" : "disease",
      "type" : "CLASS",
      "meta" : {
        "definition" : {
          "val" : "A disease is a disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.",
          "xrefs" : [ "OGMS:0000031" ]
        },
        "subsets" : [ "http://purl.obolibrary.org/obo/mondo#ordo_disorder" ],
        "synonyms" : [ {
          "pred" : "hasExactSynonym",
          "val" : "condition",
          "xrefs" : [ "NCIT:C2991" ]
        }, {
          "pred" : "hasExactSynonym",
          "val" : "disease",
          "xrefs" : [ "DOID:4", "NCIT:C2991", "Orphanet:377788" ]
        }, 
        {
          "pred" : "hasExactSynonym",
          "val" : "medical condition"
        },

The resulting transform (current):

id	category	name	description	xref	provided_by	synonym	exact_synonym	broad_synonym	narrow_synonym	related_synonym	deprecated	iri	same_as	subsets
MONDO:0000001	biolink:Disease	disease	A disease is a disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.	DOID:4|ICD9:799.9|MEDGEN:4347|MESH:D004194|NCIT:C2991|OGMS:0000031|Orphanet:377788|SCTID:64572001|UMLS:C0012634	mondo.json	condition|disease|disease or disorder|disease or disorder, non-neoplastic|diseases|diseases and disorders|disorder|disorders|medical condition|other disease	condition|disease|disease or disorder|disease or disorder, non-neoplastic|diseases|diseases and disorders|disorder|disorders|medical condition|other disease					http://purl.obolibrary.org/obo/MONDO_0000001	DOID:4|NCIT:C2991|Orphanet:377788|UMLS:C0012634|http://identifiers.org/medgen/4347|http://identifiers.org/mesh/D004194|http://identifiers.org/snomedct/64572001	ordo_disorder

The proposed transform (see the exact_synonym column):

id	category	name	description	xref	provided_by	synonym	exact_synonym	broad_synonym	narrow_synonym	related_synonym	deprecated	iri	same_as	subsets
MONDO:0000001	biolink:Disease	disease	A disease is a disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.	DOID:4|ICD9:799.9|MEDGEN:4347|MESH:D004194|NCIT:C2991|OGMS:0000031|Orphanet:377788|SCTID:64572001|UMLS:C0012634	mondo.json	condition|disease|disease or disorder|disease or disorder, non-neoplastic|diseases|diseases and disorders|disorder|disorders|medical condition|other disease	NCIT:C2991|DOID:4|NCIT:C2991|Orphanet:377788|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991					http://purl.obolibrary.org/obo/MONDO_0000001	DOID:4|NCIT:C2991|Orphanet:377788|UMLS:C0012634|http://identifiers.org/medgen/4347|http://identifiers.org/mesh/D004194|http://identifiers.org/snomedct/64572001	ordo_disorder
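
The proposed exact_synonym column above repeats NCIT:C2991 once for every synonym that cites it. If repeats are not wanted, and the labels should be kept as well, the separate-field option mentioned earlier in this issue could look roughly like the sketch below. This is an assumption, not KGX code: the helper name `split_exact_synonyms` and the `exact_synonym_curies` key are hypothetical.

```python
from typing import Any, Dict, List


def split_exact_synonyms(meta: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return exact-synonym labels and de-duplicated xref CURIEs (hypothetical helper)."""
    exact = [s for s in meta.get("synonyms", []) if s.get("pred") == "hasExactSynonym"]
    labels = [s["val"] for s in exact if "val" in s]
    # dict.fromkeys keeps the first occurrence of each CURIE and preserves order
    curies = list(dict.fromkeys(x for s in exact for x in s.get("xrefs", [])))
    return {"exact_synonym": labels, "exact_synonym_curies": curies}


props = split_exact_synonyms(
    {
        "synonyms": [
            {"pred": "hasExactSynonym", "val": "condition", "xrefs": ["NCIT:C2991"]},
            {"pred": "hasExactSynonym", "val": "disease",
             "xrefs": ["DOID:4", "NCIT:C2991", "Orphanet:377788"]},
            {"pred": "hasExactSynonym", "val": "medical condition"},
        ]
    }
)
print("|".join(props["exact_synonym_curies"]))  # NCIT:C2991|DOID:4|Orphanet:377788
```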

Metadata

Labels: enhancement (New feature or request)
