fix(duckdb): use explicit contract schema columns for CSV/Parquet reads (#1065) by barry0451 · Pull Request #1183 · datacontract/datacontract-cli

barry0451 · 2026-04-21T03:40:19Z

Summary

Fixes #1065: the field_is_present check always passes for CSV/Parquet files because the current implementation compares row counts instead of checking actual column overlap. This change uses an explicit SELECT of the contract schema columns, so columns missing from the data become NULL and field_is_present can catch them.

Changes

Modified create_view_with_schema_union in datacontract/engines/soda/connections/duckdb_connection.py
Previously used INTERSECT to find columns in both contract and data (missing contract columns were silently ignored)
Now explicitly SELECTs contract schema columns by name, so missing data columns become NULL
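
To illustrate the difference described above, here is a minimal sketch, not the code from this PR. The helpers `build_select_list` and `intersect_columns`, the table `raw_data`, and the view `orders` are hypothetical names, and the standard-library `sqlite3` module stands in for DuckDB only so the example runs without extra dependencies. It assumes the set of columns actually present in the file is already known.

```python
import sqlite3


def quote_ident(name: str) -> str:
    # Double-quote an identifier, escaping any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def intersect_columns(contract_columns, data_columns):
    # Old behaviour (as described in this PR): keep only columns found in both.
    present = set(data_columns)
    return [col for col in contract_columns if col in present]


def build_select_list(contract_columns, data_columns):
    # New behaviour (sketch): one select item per contract column.
    present = set(data_columns)
    items = []
    for col in contract_columns:
        if col in present:
            items.append(quote_ident(col))
        else:
            # In the contract but not in the data: surface it as NULL.
            items.append(f"NULL AS {quote_ident(col)}")
    return ", ".join(items)


# Stand-in for a CSV/Parquet file that lacks the contract's "email" column.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE raw_data ("id" INTEGER, "name" TEXT)')
conn.executemany("INSERT INTO raw_data VALUES (?, ?)", [(1, "a"), (2, "b")])

contract_columns = ["id", "name", "email"]
data_columns = [row[1] for row in conn.execute("PRAGMA table_info(raw_data)")]

select_list = build_select_list(contract_columns, data_columns)
conn.execute(f"CREATE VIEW orders AS SELECT {select_list} FROM raw_data")

# The view now exposes every contract column; "email" holds only NULLs.
view_columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
non_null_emails = conn.execute('SELECT COUNT("email") FROM orders').fetchone()[0]
```

With the intersection approach the view would expose only `id` and `name`, so nothing downstream would ever see `email`. With the explicit select list the column exists but contains no non-NULL values, which is something a presence check can detect.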

Testing

All existing tests pass

…ds (datacontract#1065)

Previously the INTERSECT query selected only columns present in both the contract and the data, so missing contract columns were silently ignored and field_is_present always passed. Now the contract schema columns are selected explicitly by name, so missing data columns become NULL and field_is_present can catch them.

jschoedl · 2026-04-22T10:28:10Z

Closing in favour of #1163. Please avoid creating a PR if another one already exists for the same issue, unless the existing one has been stale for a long time.

jschoedl closed this Apr 22, 2026

barry0451 wants to merge 1 commit into datacontract:main from 0451-software:upstream-pr-1065
