Description
I hope I'm not missing an existing capability here, but as far as I can tell the node properties extracted from OBO JSON files can't be configured this way. Please let me know if I'm wrong about this!
It would be useful to be able to extract exact synonyms from OBO JSON files as CURIEs rather than as labels. Currently, only the string label of each synonym is extracted; having the associated CURIEs would also be helpful. Either the `exact_synonym` field could hold the CURIEs instead, or a separate field (`exact_synonym_curies` or similar) could be added that contains the CURIEs when they are available.
Where this code is:
In the `parse_meta` method of the `ObographSource` class, the `hasExactSynonym` synonyms are parsed here:
kgx/kgx/source/obograph_source.py, line 346 at commit c6db642
```python
if "synonyms" in meta:
    # parse 'synonyms' as 'synonym'
    properties["synonym"] = [s["val"] for s in meta["synonyms"] if "val" in s]
    properties["exact_synonym"] = [x['val'] for x in meta["synonyms"] if "pred" in x and x["pred"] == "hasExactSynonym"]
    ...
```
However, if the `xrefs` field were used instead of the `val` field, CURIEs could be output in the `exact_synonym` column of the transform. Exact synonyms without an `xrefs` field would simply be ignored.
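Proposed change:
```python
if "synonyms" in meta:
    # parse 'synonyms' as 'synonym'
    properties["synonym"] = [s["val"] for s in meta["synonyms"] if "val" in s]
    # collect the xref CURIEs of every exact synonym; synonyms without xrefs are skipped.
    # Kept as a list, like the other synonym properties, rather than a pre-joined string.
    properties["exact_synonym"] = [
        xref
        for x in meta["synonyms"]
        if x.get("pred") == "hasExactSynonym" and "xrefs" in x
        for xref in x["xrefs"]
    ]
    ...
```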
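The other option from the description, keeping `exact_synonym` as labels and adding a separate CURIE field, could look roughly like the sketch below. It is a standalone function, not a patch to `ObographSource.parse_meta`, and `exact_synonym_curies` is only the placeholder name suggested in this issue, not an existing KGX property.

```python
from typing import Any, Dict, List


def parse_synonyms(meta: Dict[str, Any]) -> Dict[str, List[str]]:
    """Sketch: build synonym properties from an OBO Graphs 'meta' block.

    'exact_synonym_curies' is a hypothetical field name, not part of KGX.
    """
    properties: Dict[str, List[str]] = {}
    if "synonyms" not in meta:
        return properties
    synonyms = meta["synonyms"]
    # Labels of every synonym, regardless of predicate (current behaviour).
    properties["synonym"] = [s["val"] for s in synonyms if "val" in s]
    exact = [s for s in synonyms if s.get("pred") == "hasExactSynonym"]
    # Keep the labels in exact_synonym, as today.
    properties["exact_synonym"] = [s["val"] for s in exact if "val" in s]
    # Add the xref CURIEs in a separate field; synonyms without xrefs add nothing.
    properties["exact_synonym_curies"] = [
        xref for s in exact for xref in s.get("xrefs", [])
    ]
    return properties


meta = {
    "synonyms": [
        {"pred": "hasExactSynonym", "val": "condition", "xrefs": ["NCIT:C2991"]},
        {"pred": "hasExactSynonym", "val": "medical condition"},
    ]
}
print(parse_synonyms(meta))
```

This keeps existing consumers of `exact_synonym` unchanged, at the cost of one more column in the output.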
For example, the source JSON might look like this (excerpt from mondo.json):
```json
{
  "id" : "http://purl.obolibrary.org/obo/MONDO_0000001",
  "lbl" : "disease",
  "type" : "CLASS",
  "meta" : {
    "definition" : {
      "val" : "A disease is a disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.",
      "xrefs" : [ "OGMS:0000031" ]
    },
    "subsets" : [ "http://purl.obolibrary.org/obo/mondo#ordo_disorder" ],
    "synonyms" : [ {
      "pred" : "hasExactSynonym",
      "val" : "condition",
      "xrefs" : [ "NCIT:C2991" ]
    }, {
      "pred" : "hasExactSynonym",
      "val" : "disease",
      "xrefs" : [ "DOID:4", "NCIT:C2991", "Orphanet:377788" ]
    }, {
      "pred" : "hasExactSynonym",
      "val" : "medical condition"
    },
    ...
```
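To make the difference concrete, the sketch below applies the current expression and the proposed one to the three synonyms in the excerpt above. The function names are illustrative only; the comprehensions inside them are the ones from `parse_meta` and from the proposal, with the proposal's result returned as a list rather than joined.

```python
from typing import Any, Dict, List

# The three synonyms from the mondo.json excerpt for MONDO:0000001.
synonyms: List[Dict[str, Any]] = [
    {"pred": "hasExactSynonym", "val": "condition", "xrefs": ["NCIT:C2991"]},
    {"pred": "hasExactSynonym", "val": "disease",
     "xrefs": ["DOID:4", "NCIT:C2991", "Orphanet:377788"]},
    {"pred": "hasExactSynonym", "val": "medical condition"},
]


def current_exact_synonyms(syns: List[Dict[str, Any]]) -> List[str]:
    # Current behaviour: the label ("val") of every exact synonym.
    return [x["val"] for x in syns if "pred" in x and x["pred"] == "hasExactSynonym"]


def proposed_exact_synonyms(syns: List[Dict[str, Any]]) -> List[str]:
    # Proposed behaviour: the xref CURIEs of every exact synonym;
    # synonyms without "xrefs" (e.g. "medical condition") are skipped.
    return [
        xref
        for x in syns
        if x.get("pred") == "hasExactSynonym" and "xrefs" in x
        for xref in x["xrefs"]
    ]


print("current: ", "|".join(current_exact_synonyms(synonyms)))
print("proposed:", "|".join(proposed_exact_synonyms(synonyms)))
```

This prints `condition|disease|medical condition` for the current expression and `NCIT:C2991|DOID:4|NCIT:C2991|Orphanet:377788` for the proposed one, which matches the start of the proposed `exact_synonym` column.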
The resulting transform (current), with each TSV column on its own line:
```
id               MONDO:0000001
category         biolink:Disease
name             disease
description      A disease is a disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.
xref             DOID:4|ICD9:799.9|MEDGEN:4347|MESH:D004194|NCIT:C2991|OGMS:0000031|Orphanet:377788|SCTID:64572001|UMLS:C0012634
provided_by      mondo.json
synonym          condition|disease|disease or disorder|disease or disorder, non-neoplastic|diseases|diseases and disorders|disorder|disorders|medical condition|other disease
exact_synonym    condition|disease|disease or disorder|disease or disorder, non-neoplastic|diseases|diseases and disorders|disorder|disorders|medical condition|other disease
broad_synonym
narrow_synonym
related_synonym
deprecated
iri              http://purl.obolibrary.org/obo/MONDO_0000001
same_as          DOID:4|NCIT:C2991|Orphanet:377788|UMLS:C0012634|http://identifiers.org/medgen/4347|http://identifiers.org/mesh/D004194|http://identifiers.org/snomedct/64572001
subsets          ordo_disorder
```
The proposed transform (only the `exact_synonym` column differs), in the same layout:
```
id               MONDO:0000001
category         biolink:Disease
name             disease
description      A disease is a disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.
xref             DOID:4|ICD9:799.9|MEDGEN:4347|MESH:D004194|NCIT:C2991|OGMS:0000031|Orphanet:377788|SCTID:64572001|UMLS:C0012634
provided_by      mondo.json
synonym          condition|disease|disease or disorder|disease or disorder, non-neoplastic|diseases|diseases and disorders|disorder|disorders|medical condition|other disease
exact_synonym    NCIT:C2991|DOID:4|NCIT:C2991|Orphanet:377788|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991
broad_synonym
narrow_synonym
related_synonym
deprecated
iri              http://purl.obolibrary.org/obo/MONDO_0000001
same_as          DOID:4|NCIT:C2991|Orphanet:377788|UMLS:C0012634|http://identifiers.org/medgen/4347|http://identifiers.org/mesh/D004194|http://identifiers.org/snomedct/64572001
subsets          ordo_disorder
```
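In the proposed output `NCIT:C2991` appears many times, once for each exact synonym that cites it. If a de-duplicated column were preferred, one option (not part of the proposal above) is an order-preserving de-duplication using `dict.fromkeys`; `unique_in_order` is a hypothetical helper name used only for this sketch.

```python
from typing import Iterable, List


def unique_in_order(curies: Iterable[str]) -> List[str]:
    # dict keys keep insertion order (Python 3.7+), so this drops repeats
    # while keeping each CURIE at the position of its first occurrence.
    return list(dict.fromkeys(curies))


# The proposed exact_synonym value for MONDO:0000001, split on "|".
proposed = (
    "NCIT:C2991|DOID:4|NCIT:C2991|Orphanet:377788|NCIT:C2991|NCIT:C2991|"
    "NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991|NCIT:C2991"
).split("|")
print("|".join(unique_in_order(proposed)))
```

For this row the result is `NCIT:C2991|DOID:4|Orphanet:377788`. Whether repeats carry useful information (for example, how many synonyms share a CURIE) is a design choice for the maintainers.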