
[DCP Ingestion]#503

Open
gmechali wants to merge 6 commits into datacommonsorg:master from gmechali:jsonld2

Conversation

@gmechali
Contributor

No description provided.

@codacy-production

codacy-production Bot commented Apr 30, 2026

Not up to standards ⛔

🔴 Issues 6 high · 13 medium · 17 minor

Alerts:
⚠ 36 issues (≤ 0 issues of at least minor severity)

Results:
36 new issues

Category Results
UnusedCode 6 medium
ErrorProne 6 high
Security 3 medium
CodeStyle 17 minor
Complexity 4 medium

View in Codacy

🟢 Metrics 42 complexity · 0 duplication

Metric Results
Complexity 42
Duplication 0

View in Codacy



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new 'DCP Bridge' mode to the ingestion tool, which facilitates bridging the gap between custom data resolution and the production platform by exporting data to sharded JSON-LD files. The changes include a new exporter module, documentation, and logic to automatically trigger Google Cloud Workflows for data ingestion. The review feedback highlights several important improvements: replacing hardcoded GCP resource fallbacks with environment-driven configuration, adopting parameterized SQL queries to mitigate injection risks in the exporter, and refining exception handling when triggering external workflows to avoid silent failures.
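The exporter module itself is not shown in this thread, but the "sharded JSON-LD files" it produces can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `@context` URL, the shard naming scheme, and the record shape are hypothetical, not the PR's actual format.

```python
import json
import os


def write_jsonld_shards(records, out_dir, shard_size=1000, prefix="triples"):
  """Write records as JSON-LD documents, one file per shard.

  Hypothetical sketch: the real exporter's context, naming, and record
  layout may differ.
  """
  os.makedirs(out_dir, exist_ok=True)
  shard_paths = []
  for start in range(0, len(records), shard_size):
    shard = records[start:start + shard_size]
    # Each shard is a self-contained JSON-LD document with its own @context.
    doc = {"@context": {"dc": "https://datacommons.org/"}, "@graph": shard}
    path = os.path.join(out_dir, f"{prefix}-{start // shard_size:05d}.jsonld")
    with open(path, "w") as f:
      json.dump(doc, f)
    shard_paths.append(path)
  return shard_paths
```

Fixed-size shards keep each output file small enough for downstream ingestion jobs to process in parallel.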

Comment thread simple/stats/runner.py Outdated
Comment on lines +603 to +609
spanner_instance = os.getenv("GCP_SPANNER_INSTANCE_ID", "gabe-test-dcp-instance")
spanner_database = os.getenv("GCP_SPANNER_DATABASE_NAME", "gabe-test-dcp-db-v2")
workflow_name = os.getenv("WORKFLOW_NAME", "gabe-test-ingestion-orchestrator")
project_id = os.getenv("PROJECT_ID", "datcom-website-dev")
location = os.getenv("WORKFLOW_LOCATION", "us-central1")
temp_location = os.getenv("TEMP_LOCATION", "gs://gabe-test-ingestion-bucket-datcom-website-dev/temp")
region = os.getenv("REGION", "us-central1")

high

Hardcoded fallback values for GCP resources (Spanner instances, databases, workflows, etc.) containing personal or test-specific names should be avoided in the codebase. These should be strictly managed via environment variables or configuration files to ensure portability and security.
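One way to act on this feedback is to fail fast when a required variable is unset, rather than silently falling back to a personal or test resource. A minimal sketch (the `require_env` helper is hypothetical, not part of the PR):

```python
import os


def require_env(name: str) -> str:
  """Read a required configuration value from the environment.

  Raises instead of falling back to a hardcoded (possibly personal or
  test-specific) resource name, so misconfiguration surfaces immediately.
  """
  value = os.getenv(name)
  if not value:
    raise RuntimeError(f"Missing required environment variable: {name}")
  return value


# Usage sketch: every GCP resource comes from the environment, no fallbacks.
# spanner_instance = require_env("GCP_SPANNER_INSTANCE_ID")
# project_id = require_env("PROJECT_ID")
```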

offset = 0
while True:
triples_tuples = db.engine.fetch_all(
f"SELECT subject_id, predicate, object_id, object_value FROM triples LIMIT {chunk_size} OFFSET {offset}"

security-medium medium

Using f-strings to construct SQL queries is generally discouraged as it can lead to SQL injection vulnerabilities if the variables are ever sourced from user input. Although chunk_size and offset are currently integers, it is better practice to use parameterized queries.

Suggested change
f"SELECT subject_id, predicate, object_id, object_value FROM triples LIMIT {chunk_size} OFFSET {offset}"
"SELECT subject_id, predicate, object_id, object_value FROM triples LIMIT ? OFFSET ?", (chunk_size, offset)

obs_counter = 0
while True:
obs_tuples = db.engine.fetch_all(
f"SELECT entity, variable, date, value, provenance, unit, scaling_factor, measurement_method, observation_period, properties FROM observations LIMIT {chunk_size} OFFSET {offset}"

security-medium medium

Using f-strings to construct SQL queries is generally discouraged. Please use parameterized queries to ensure safety and follow best practices.

Suggested change
f"SELECT entity, variable, date, value, provenance, unit, scaling_factor, measurement_method, observation_period, properties FROM observations LIMIT {chunk_size} OFFSET {offset}"
"SELECT entity, variable, date, value, provenance, unit, scaling_factor, measurement_method, observation_period, properties FROM observations LIMIT ? OFFSET ?", (chunk_size, offset)

Comment thread simple/stats/runner.py Outdated
Comment on lines +649 to +650
except Exception as e:
logging.error(f"Error triggering workflow via API: {e}")

medium

Catching a broad Exception and only logging it can hide unexpected errors and make debugging difficult. Consider catching more specific exceptions related to the API call (e.g., requests.exceptions.RequestException) or re-raising the exception after logging to ensure the failure is not silently ignored by the caller.
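A sketch of the log-then-re-raise pattern the reviewer suggests. The `WorkflowTriggerError` wrapper and the `execute_fn` stand-in for the Workflows API call are hypothetical; the real code would catch the specific exceptions its client library raises.

```python
import logging


class WorkflowTriggerError(RuntimeError):
  """Raised when the ingestion workflow could not be started."""


def trigger_workflow(execute_fn):
  # execute_fn is a stand-in for the actual Workflows API call.
  try:
    return execute_fn()
  except (ConnectionError, TimeoutError) as e:
    # Log for operators, then re-raise a typed error so the caller sees
    # the failure instead of it being silently swallowed.
    logging.error("Error triggering workflow via API: %s", e)
    raise WorkflowTriggerError("workflow trigger failed") from e
```

Chaining with `raise ... from e` preserves the original traceback while giving callers a single exception type to handle.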
