Merged
83 commits
1b5a3c5
Changing DugElement, DugConcept, adding DugVariable
Apr 15, 2025
4feeb97
Changing the HEAL parser
Apr 23, 2025
d8559cd
Changes to crawler
May 12, 2025
7650e93
Adding study data type
May 12, 2025
5d679cf
Testing parser HEAL parser
May 12, 2025
d7270c1
ENH: Changing indexing, annotator
Jun 3, 2025
66de6be
TEST: Fixing loader test
Jun 3, 2025
988aa28
Making indexing work on one file
Jun 23, 2025
d8b34b7
Correcting a few errors and cleanup
Jun 25, 2025
2319c7c
Make indices configurable through .env
Jun 25, 2025
1f1ae76
Merge branch 'DugModel2.0' into data-model-update
Jun 25, 2025
6b7fd8a
Merge DugDataModel2.0
Jun 25, 2025
72415de
Updating indices to a dict
Jun 25, 2025
3f76749
Cleaning print statements
Jun 25, 2025
4de2efd
Adjusting tests to code changes
Jun 26, 2025
dd08019
FEAT: Adding a parser for HEAL studies to get data from MDS
Jun 27, 2025
687bf0b
Changes for studies annotation and index
Jun 29, 2025
00b2a50
Added /variables and /concepts endpoints that use new indexes
vladimir2217 Jul 9, 2025
99c5f30
Adding DugSection element
Jul 10, 2025
2058131
Merge pull request #400 from helxplatform/data-model-update-api
hina-shah Jul 10, 2025
565eec8
Updating HEAL Parser to get DugSections
Jul 14, 2025
6218a68
Merge branch 'data-model-update' of github.com:helxplatform/dug into …
Jul 14, 2025
6dbf8fb
Adding Sections Index to the pipeline
Jul 14, 2025
8260af8
CLEANUP
Jul 14, 2025
5056c82
Fixed missing new model fields
vladimir2217 Jul 15, 2025
715bde9
Changing search to use index dictionary
Jul 16, 2025
811a8a8
Added new Studues API
vladimir2217 Jul 24, 2025
ca1a6e7
added import
vladimir2217 Jul 24, 2025
9510344
Added comments to new APIs
vladimir2217 Jul 25, 2025
3440ef0
Added cde endpoint. Added study_sources endpoint
vladimir2217 Aug 6, 2025
f6de20c
Added SearchQuery parameter. Changed API to handle post. Added get_va…
vladimir2217 Aug 7, 2025
8e39beb
Merge pull request #401 from helxplatform/data-model-update-api
vladimir2217 Aug 7, 2025
2a4f93c
Merge pull request #402 from helxplatform/data-model-update-studies-api
YaphetKG Aug 7, 2025
ffab250
Standardiszing response types, some minor edits to search
YaphetKG Aug 7, 2025
713bd80
Standardiszing response types, some minor edits to search
YaphetKG Aug 7, 2025
752ea78
Merge pull request #404 from helxplatform/openapi-docs
hina-shah Aug 11, 2025
6371653
ENH: Updating the schema for keyword data types
Aug 11, 2025
b511400
ENH: Changing request/response types for Sections/CDEs
Aug 12, 2025
8956abe
Merge pull request #405 from helxplatform/add-section-response
hina-shah Aug 12, 2025
5cef684
Studies API reuse variable ES query
vladimir2217 Aug 14, 2025
d4f7bf6
updated studies API search
vladimir2217 Aug 14, 2025
3ad2467
fix for cde endpoint
YaphetKG Aug 14, 2025
d8ccae1
Merge pull request #407 from helxplatform/patch-cde-api
vladimir2217 Aug 14, 2025
04ef64d
Merge pull request #406 from helxplatform/data-model-update-es-query
YaphetKG Aug 14, 2025
62e9359
Adding HEAL DDM2 parser, and its test
Aug 26, 2025
50a97ef
Merge branch 'data-model-update' of github.com:helxplatform/dug into …
Aug 26, 2025
f0d3ee1
Adding HEAL DDM2 parser, and its test
Aug 26, 2025
e7c7090
First stab at a JSON Schema export for the Dug Data Model.
gaurav Aug 28, 2025
8754448
Fixed overall JSON schema.
gaurav Aug 28, 2025
a402da2
Documented that you need a PYTHONPATH=src to run this.
gaurav Aug 28, 2025
23bde0b
response request model revamps
YaphetKG Sep 2, 2025
97aaf78
Updating Dug Data Model for efficient import
Sep 2, 2025
330b2f8
Merge pull request #409 from helxplatform/data-model-update-9-2
hina-shah Sep 2, 2025
d6efdac
pushing search edits
YaphetKG Sep 2, 2025
fcf7a1d
add identfiers
YaphetKG Sep 2, 2025
b8f200d
Merge pull request #410 from helxplatform/data-model-update-search-re…
hina-shah Sep 3, 2025
6b6073a
Merge pull request #408 from helxplatform/dug-data-model-json-schema
hina-shah Sep 3, 2025
5ccaa75
ENH: Change input element type to program name
Sep 4, 2025
e557769
Merge pull request #411 from helxplatform/add-program-name
YaphetKG Sep 4, 2025
d631dc6
revert identifier changes
Sep 4, 2025
ae7e840
Merging with remote
Sep 4, 2025
cefa880
add minimum should to make sure search returns relevant results
YaphetKG Sep 9, 2025
8ca118f
fixing api endpoints and some idnexing bug
YaphetKG Sep 9, 2025
53b172a
Ignoring elements that don't have an id
Sep 11, 2025
6eaafc1
Adding parents to concepts
Sep 15, 2025
3e7242d
Removing is_standardized to match search API
Sep 15, 2025
13ccba6
Merge branch 'data-model-update' into fix-search-filter
hina-shah Sep 16, 2025
4617328
add conditions for query being present or not. make sure that empty s…
YaphetKG Sep 16, 2025
0e34ea0
Documentation and cleanup
Sep 17, 2025
c716ed7
Merge pull request #412 from helxplatform/fix-search-filter
hina-shah Sep 17, 2025
9468c36
remove filter
YaphetKG Sep 24, 2025
012c839
Merge pull request #414 from helxplatform/remove-concept-filter
hina-shah Sep 24, 2025
61df653
Update Dockerfile and Makefile
Oct 27, 2025
123cf45
Put back requirements for descriptions for concepts
Oct 27, 2025
f88f17a
Updating programs to enable endpoints
Oct 27, 2025
cd5ae78
Update pydantic, and fix documentation page
Nov 6, 2025
5a60861
Adding concept URLs to the new data model
Nov 20, 2025
6c7aded
CLEANUP andn FIX description population
Dec 8, 2025
de2aab9
CLEANUP, TESTS and BUG: Correcting index name retrieval
Dec 11, 2025
dbfd8f7
Merge branch 'data-model-update' of github.com:helxplatform/dug into …
Dec 11, 2025
e4a7b24
FIX: return a string purl and not none
Dec 11, 2025
8ab011a
FIX index passing
Dec 11, 2025
60d0709
Correcting tests
Dec 11, 2025
9 changes: 4 additions & 5 deletions Dockerfile
@@ -3,21 +3,20 @@
# A container for the core semantic-search capability.
#
######################################################
FROM python:3.12-alpine3.21
FROM python:3.13-alpine3.22


# Install required packages
RUN apk update && \
apk add g++ make cargo rust

RUN apk upgrade -Ua
RUN apk add "libxml2=2.13.4-r6"

RUN pip install --upgrade pip
# Create a non-root user.
ENV USER dug
ENV HOME /home/$USER
ENV UID 1000
ENV USER=dug
ENV HOME=/home/$USER
ENV UID=1000

RUN adduser -D --home $HOME --uid $UID $USER

8 changes: 4 additions & 4 deletions Makefile
@@ -1,10 +1,10 @@
PYTHON = $(shell which python3)
VERSION_FILE = ./src/dug/_version.py
VERSION = $(shell cut -d " " -f 3 ${VERSION_FILE})
DOCKER_REPO = docker.io
DOCKER_OWNER = rti
DOCKER_REPO = containers.renci.org
DOCKER_OWNER = helxplatform
DOCKER_APP = dug
DOCKER_TAG = ${VERSION}
DOCKER_TAG = data-model-2.0-9-9
DOCKER_IMAGE = ${DOCKER_OWNER}/${DOCKER_APP}:$(DOCKER_TAG)
export PYTHONPATH = $(shell echo ${PWD})/src

@@ -48,7 +48,7 @@ coverage:
#build: Build Docker image
build:
echo "Building docker image: ${DOCKER_IMAGE}"
docker build -t ${DOCKER_IMAGE} -f Dockerfile .
docker build --platform=linux/amd64 -t ${DOCKER_IMAGE} -f Dockerfile .
echo "Successfully built: ${DOCKER_IMAGE}"

#publish: Build and push docker image
48 changes: 48 additions & 0 deletions bin/export_ddm_as_json_schema.py
@@ -0,0 +1,48 @@
#!/usr/bin/env python
#
# export_ddm_as_json_schema.py - Export Dug Data Model as JSON Schema
#
# SYNOPSIS
# PYTHONPATH=src python bin/export_ddm_as_json_schema.py
#

import click
import json
import logging

from dug.core.parsers._base import DugStudy, DugSection, DugVariable

logging.basicConfig(level=logging.INFO)

@click.command()
def export_ddm_as_json_schema():
"""

:return:
"""
logging.info("Exporting Dug Data Model as JSON Schema")

json_schema = {
'$schema': 'https://json-schema.org/draft/2020-12/schema',
# This is what Pydantic supports: https://docs.pydantic.dev/latest/api/json_schema/#pydantic.json_schema.GenerateJsonSchema
'definitions': {
'DugSection': DugSection.model_json_schema(),
'DugVariable': DugVariable.model_json_schema(),
'DugStudy': DugStudy.model_json_schema()
},
# We want to validate a list of heterogenous objects: each item in the list may be any of the Dug objects above.
'type': 'array',
'items': {
'oneOf': [
{'$ref': '#/definitions/DugSection'},
{'$ref': '#/definitions/DugVariable'},
{'$ref': '#/definitions/DugStudy'}
]
}
}

print(json.dumps(json_schema, indent=2))


if __name__ == '__main__':
export_ddm_as_json_schema()
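The `oneOf`-over-an-array shape above means every exported record must match exactly one of the three model schemas. A stdlib-only sketch of that rule follows; the definition stubs here are hypothetical stand-ins for the Pydantic-generated `model_json_schema()` output, not the real Dug models.

```python
# Hypothetical stand-ins for the Pydantic-generated definitions; the real
# script embeds DugSection/DugVariable/DugStudy.model_json_schema().
definitions = {
    "DugVariable": {"required": ["id"]},
    "DugStudy": {"required": ["study_id"]},
}

def matches(item: dict, defn: dict) -> bool:
    # Naive required-keys check standing in for full schema validation.
    return all(key in item for key in defn.get("required", []))

def one_of(item: dict) -> bool:
    # JSON Schema's oneOf: exactly one definition may match each item.
    return sum(matches(item, d) for d in definitions.values()) == 1

records = [{"id": "v1"}, {"study_id": "s1"}]
print(all(one_of(r) for r in records))  # True
```

In practice the exported schema would be fed to a real validator (e.g. the `jsonschema` package); the sketch only illustrates why an item carrying keys from two models would be rejected.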
3 changes: 2 additions & 1 deletion pytest.ini
@@ -6,4 +6,5 @@ markers =
api: mark a test as an api test
cli: mark a test as a cli test
testpaths =
tests
tests
pythonpath = src
2 changes: 1 addition & 1 deletion requirements.txt
@@ -12,7 +12,7 @@ MarkupSafe
ormar
mistune
pluggy
pydantic==2.9.2
pydantic==2.12.3
pyrsistent
pytest
pytest-asyncio
Empty file added src/dug/api_models/__init__.py
Empty file.
50 changes: 50 additions & 0 deletions src/dug/api_models/request_models.py
@@ -0,0 +1,50 @@
from pydantic import BaseModel, field_validator
from typing import List, Optional, Any

class GetFromIndex(BaseModel):
size: int = 0

class SearchConceptQuery(BaseModel):
query: str
offset: int = 0
size: int = 20
concept_types: list = None

class SearchVariablesQuery(BaseModel):
query: str
concept: str = ""
offset: int = 0
size: int = 1000

class FilterGrouped(BaseModel):
key: str
value: List[Any]
class SearchVariablesQueryFiltered(SearchVariablesQuery):
filter: List[FilterGrouped] = []

class SearchKgQuery(BaseModel):
query: str
unique_id: str
index: str = "kg_index"
size:int = 100

class SearchElementQuery(BaseModel):
query: str = None
parent_ids: Optional[List] = None
element_ids: Optional[List] = None
concept: Optional[str] = None
size: Optional[int] = 100
offset: Optional[int] = 0

@field_validator("parent_ids", "element_ids", mode="before")
@classmethod
def drop_empty_strings(cls, v):
if v is None:
return v
return [item for item in v if item not in ("", None)]

class VariableIds(BaseModel):
"""
List of variable IDs
"""
ids: Optional[List[str]] = []
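The `mode="before"` validator on `parent_ids`/`element_ids` runs before type coercion, so blank selections coming in from a client never reach the Elasticsearch query. Its core logic, reproduced here as a plain function outside Pydantic for illustration:

```python
# Plain-function sketch of the drop_empty_strings validator on
# SearchElementQuery; the real version runs as a Pydantic "before" validator.
def drop_empty_strings(v):
    if v is None:
        return v  # leave unset fields alone
    return [item for item in v if item not in ("", None)]

print(drop_empty_strings(["phs001", "", None, "phs002"]))  # ['phs001', 'phs002']
print(drop_empty_strings(None))  # None
```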
69 changes: 69 additions & 0 deletions src/dug/api_models/response_models.py
@@ -0,0 +1,69 @@
from dug.core.parsers._base import *
from pydantic import BaseModel, model_serializer
from typing import Optional, Any


class ElasticResultMetaData(BaseModel):
total_count: int
offset: int
size: int


class ElasticDugElementResult(BaseModel):
# Class for all entities from elastic search, we are going to have score... optionally explanation
score: float = Field(default=999)
explanation: dict = Field(default_factory=dict)
# we are going to ignore concepts...
concepts: None = Field(default=None, exclude=True)


class DugAPIResponse(BaseModel):
results: List[ElasticDugElementResult]
metadata: Optional[ElasticResultMetaData] = Field(default_factory=dict)


class ConceptResponse(ElasticDugElementResult, DugConcept):
identifiers: List[Any]
concepts: None = Field(default=None, exclude=True)


class ConceptsAPIResponse(BaseModel):
metadata: ElasticResultMetaData
results: List[ConceptResponse]
concept_types: dict = Field(default="")


class VariableResponse(ElasticDugElementResult, DugVariable):
@model_serializer
def serialize(self):
response = self.get_response_dict()
return response


class VariablesAPIResponse(DugAPIResponse):
results: List[VariableResponse]


class StudyResponse(ElasticDugElementResult, DugStudy):
@model_serializer
def serialize(self):
response = self.get_response_dict()
response.pop('abstract')
return response


class StudyAPIResponse(DugAPIResponse):
results: List[StudyResponse]


class SectionResponse(ElasticDugElementResult, DugSection):
@model_serializer
def serialize(self):
response = self.get_response_dict()
return response


class SectionAPIResponse(DugAPIResponse):
results: List[SectionResponse]
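The `@model_serializer` pattern above lets each response class control its own wire format — `StudyResponse`, for example, drops the bulky `abstract` field. A minimal sketch of the same pattern, assuming pydantic v2 (the PR pins `pydantic==2.12.3`); the fields here are illustrative, `dict(self)` stands in for the real `get_response_dict()` helper, and the real class also mixes in `ElasticDugElementResult` and `DugStudy`.

```python
from pydantic import BaseModel, model_serializer

class Study(BaseModel):
    id: str
    name: str
    abstract: str = ""

class StudyResponse(Study):
    score: float = 999  # search relevance score carried alongside domain fields

    @model_serializer
    def serialize(self):
        # Build a plain dict of fields, then trim what the API should not return.
        response = dict(self)
        response.pop("abstract", None)
        return response

resp = StudyResponse(id="s1", name="Demo", abstract="long text", score=1.5)
print(resp.model_dump())  # {'id': 's1', 'name': 'Demo', 'score': 1.5}
```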


12 changes: 5 additions & 7 deletions src/dug/cli.py
@@ -59,10 +59,9 @@ def get_argparser():
)

crawl_parser.add_argument(
'-e', '--element-type',
help='[Optional] Coerce all elements to a certain data type (e.g. DbGaP Variable).\n'
'Determines what tab elements will appear under in Dug front-end',
dest="element_type",
'-e', '--program_name',
help='[Optional] Coerce all elements to a certain program (e.g. HEAL/RADX/DBGap/etc).\n',
dest="program_name",
default=None
)

@@ -115,7 +114,7 @@ def crawl(args):
config.node_to_element_queries = {}
factory = DugFactory(config)
dug = Dug(factory)
dug.crawl(args.target, args.parser_type, args.annotator_type, args.element_type)
dug.crawl(args.target, args.parser_type, args.annotator_type, args.program_name)


def search(args):
@@ -126,7 +125,6 @@ def search(args):
response = dug.search(args.target, args.query, **args.kwargs)
# Using json.dumps raises 'TypeError: Object of type ObjectApiResponse is not JSON serializable'
#jsonResponse = json.dumps(response, indent = 2)
print(response)

def datatypes(args):
config = Config.from_env()
@@ -137,7 +135,7 @@ def datatypes(args):


def status(args):
print("Status check is not implemented yet!")
logger.warning("Status check is not implemented yet!")


def main(args=None):
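The renamed crawl option can be exercised in isolation. This standalone argparse sketch mirrors only the `-e`/`--program_name` wiring shown in the diff, not the full `crawl_parser`:

```python
import argparse

# Minimal reproduction of the renamed option: -e now coerces all parsed
# elements to one program (e.g. HEAL) instead of one element type.
parser = argparse.ArgumentParser(prog="dug crawl")
parser.add_argument(
    "-e", "--program_name",
    help="[Optional] Coerce all elements to a certain program (e.g. HEAL/RADX/DBGap/etc).",
    dest="program_name",
    default=None,
)

args = parser.parse_args(["-e", "HEAL"])
print(args.program_name)  # HEAL
```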
26 changes: 17 additions & 9 deletions src/dug/config.py
@@ -9,7 +9,7 @@
@dataclass
class Config:
"""
TODO: Populate description
TODO: Make all URLs available as enviroment variables.
"""

elastic_password: str = "changeme"
@@ -21,6 +21,7 @@ class Config:
elastic_scheme: str = "https"
elastic_ca_path: str = ""
elastic_ca_verify: bool = True
max_ids_limit = 10000

redis_host: str = "redis"
redis_port: int = 6379
@@ -30,7 +31,11 @@

studies_path: str=""


kg_index_name: str="kg_index"
concepts_index_name: str="concepts_index"
variables_index_name: str='variables_index'
studies_index_name: str='studies_index'
sections_index_name: str='sections_index'

# Preprocessor config that will be passed to annotate.Preprocessor constructor
preprocessor: dict = field(
@@ -47,8 +52,8 @@
"url": "https://api.monarchinitiative.org/api/nlp/annotate/entities?min_length=4&longest_only=false&include_abbreviation=false&include_acronym=false&include_numbers=false&content="
},
"sapbert": {
"classification_url": "https://med-nemo.apps.renci.org/annotate/",
"annotator_url": "https://sap-qdrant.apps.renci.org/annotate/",
"classification_url": "http://med-nemo-serve-nemo-web-server.ner/annotate/",
"annotator_url": "http://qdrant-sapbert-nemo-web-server.ner/annotate/",
"score_threshold": 0.8,
"bagel": {
"enabled": False,
@@ -71,14 +76,14 @@
# Normalizer config that will be passed to annotate.Normalizer constructor
normalizer: dict = field(
default_factory=lambda: {
"url": "https://nodenormalization-dev.apps.renci.org/get_normalized_nodes?conflate=false&description=true&curie="
"url": "http://nn-web-node-normalization-web-service-root.translator-dev:8080/get_normalized_nodes?conflate=false&description=true&curie="
}
)

# Synonym service config that will be passed to annotate.SynonymHelper constructor
synonym_service: dict = field(
default_factory=lambda: {
"url": "https://name-resolution-sri.renci.org/reverse_lookup"
"url": "http://name-resolution-name-lookup-web-svc.translator-dev:2433/synonyms"
}
)

@@ -127,7 +132,7 @@ class Config:

concept_expander: dict = field(
default_factory=lambda: {
"url": "https://tranql-dev.renci.org/tranql/query?dynamic_id_resolution=true&asynchronous=false",
"url": "http://search-tranql:8081/tranql/tranql/query?dynamic_id_resolution=true&asynchronous=false",
"min_tranql_score": 0.0,
}
)
@@ -159,10 +164,13 @@ def from_env(cls):
"redis_port": "REDIS_PORT",
"redis_password": "REDIS_PASSWORD",
"studies_path": "STUDIES_PATH",
"kg_index_name": "ELASTIC_KG_INDEX_NAME",
"concepts_index_name": "ELASTIC_CONCEPTS_INDEX_NAME",
"variables_index_name": "ELASTIC_VARIABLES_INDEX_NAME",
"studies_index_name": "ELASTIC_STUDIES_INDEX_NAME",
"sections_index_name": "ELASTIC_SECTIONS_INDEX_NAME",
}

kwargs = {}

for kwarg, env_var in env_vars.items():
env_value = os.environ.get(env_var)
if env_value:
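The `from_env` mapping follows a simple override-when-set pattern: each dataclass field keeps its default unless the corresponding environment variable is non-empty. A condensed, self-contained sketch (only two of the new `ELASTIC_*_INDEX_NAME` fields included):

```python
import os
from dataclasses import dataclass

@dataclass
class Config:
    concepts_index_name: str = "concepts_index"
    variables_index_name: str = "variables_index"

    @classmethod
    def from_env(cls):
        env_vars = {
            "concepts_index_name": "ELASTIC_CONCEPTS_INDEX_NAME",
            "variables_index_name": "ELASTIC_VARIABLES_INDEX_NAME",
        }
        kwargs = {}
        for kwarg, env_var in env_vars.items():
            env_value = os.environ.get(env_var)
            if env_value:  # unset or empty string -> keep the dataclass default
                kwargs[kwarg] = env_value
        return cls(**kwargs)

os.environ["ELASTIC_CONCEPTS_INDEX_NAME"] = "concepts_v2"
print(Config.from_env().concepts_index_name)  # concepts_v2
```

Note the truthiness check: setting a variable to an empty string behaves the same as leaving it unset, which matches the loop in the real `Config.from_env`.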