
Feat/test azure filestore metadata #1234

Open
dmaresma wants to merge 12 commits into datacontract:main from dmaresma:feat/test_filestore_metadata

Conversation

@dmaresma
Contributor

related to #1227

  • [x] Tests pass (uv run pytest)
  • [x] Code formatted (uv run ruff check --fix && uv run ruff format)
  • [ ] README.md updated (if relevant)
  • [x] CHANGELOG.md entry added

Collaborator

@jschoedl jschoedl left a comment

Interesting idea to use Data Contracts for this!

Collaborator

@jschoedl jschoedl left a comment

Wrong button 😅

Comment thread README.md
Comment on lines +606 to 607
location: azure://container@datameshdatabricksdemo.dfs.core.windows.net/entity={model}/year=*/month=*/day=*/*.parquet
format: parquet
Collaborator

This combination of the azure:// scheme with a dfs endpoint is not supported.

Comment thread README.md
Comment on lines +615 to +616
location: azure://web@datameshdatabricksdemo.blob.core.windows.net/media={model}/year=*/month=*/day=*/*.jpg
format: binary
Collaborator

{model} substitution is not implemented; either implement it or remove it from the example.
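If the placeholder is kept, a minimal sketch of the substitution could look like the following. The helper names `resolve_location` and `listing_prefix` are hypothetical and not part of this PR, and the account name in the usage note is a stand-in.

```python
def resolve_location(location: str, model: str) -> str:
    """Replace the literal ``{model}`` placeholder with the schema name.

    str.replace is used instead of str.format so that other braces or
    glob characters in the location are left untouched.
    """
    return location.replace("{model}", model)


def listing_prefix(path: str) -> str:
    """Return the part of a path before the first glob wildcard.

    The result can serve as a prefix when listing blobs; the remaining
    pattern would still have to be matched against each blob name.
    """
    for index, char in enumerate(path):
        if char in "*?[":
            return path[:index]
    return path
```

For example, `resolve_location("azure://web@account.blob.core.windows.net/media={model}/year=*/*.jpg", "photos")` yields a location whose path starts with `media=photos/`.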


# Azure Blob / ADLS Gen2 file-metadata checks (physicalType=file schemas and server format is binary)
if server.type in ("azure") and server.format == "binary" and _has_file_schemas(data_contract, schema_name):
    check_azure_blob_file(run, data_contract, server)
Collaborator

I just noticed that this PR should support --check-category to filter for quality or schema checks. The parameter is supported for test, but this code path will silently ignore it.
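One way this could be wired in, sketched under assumptions: each file-metadata check carries a category label, and the value passed via --check-category is forwarded as an optional string. The `FileCheck` type and `select_checks` helper are hypothetical, as are the exact category values "schema" and "quality".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FileCheck:
    """Hypothetical descriptor for one file-metadata check."""

    name: str
    category: str  # assumed to be "schema" or "quality"


def select_checks(checks: Iterable[FileCheck], check_category: Optional[str] = None) -> List[FileCheck]:
    """Keep only checks in the requested category; None means no filtering."""
    if check_category is None:
        return list(checks)
    return [check for check in checks if check.category == check_category]
```

With this shape, the filter is applied once before any check runs, so an unsupported category simply yields an empty list rather than being ignored.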

Comment on lines +170 to +174
# ── Check: location not empty ─────────────────────────────────────────────
_check_location_not_empty(run, schema_name, blobs, prefix)

if not blobs:
    return
Collaborator

In some scenarios, one might want to allow 0 blobs. We should only enforce a non-empty location if the contract specifies it. (Similarly, the existing Soda checks for other testing backends work fine if a database has 0 rows.)
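A rough sketch of an opt-in variant, assuming the contract can express a minimum blob count (the `min_blob_count` parameter and the function name are hypothetical; how the contract would declare it is left open):

```python
from typing import Optional, Sequence


def evaluate_location_not_empty(blobs: Sequence[str], min_blob_count: Optional[int]) -> Optional[bool]:
    """Evaluate the non-empty check only when the contract asks for it.

    Returns None when the contract sets no minimum (check skipped),
    otherwise True or False for pass or fail.
    """
    if min_blob_count is None:
        return None
    return len(blobs) >= min_blob_count
```

A caller could then record a result only for non-None outcomes and still return early when there are no blobs to inspect.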

Comment on lines +58 to +59
# Properties whose values are datetimes — auto-checked "not in future" when declared
_DATETIME_PROPS = {"lastModified", "creationTime", "lastAccessedOn"}
Collaborator

Those are unused. Do you still want to implement this feature?
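If the feature is kept, the set could drive an automatic check along these lines. This is only a sketch: the function name is hypothetical, and it assumes blob properties arrive as a plain dict of timezone-aware datetimes keyed by the names in `_DATETIME_PROPS`.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional

_DATETIME_PROPS = {"lastModified", "creationTime", "lastAccessedOn"}


def future_datetime_props(
    declared: Iterable[str],
    blob_props: Dict[str, datetime],
    now: Optional[datetime] = None,
) -> List[str]:
    """Return declared datetime properties whose value lies in the future.

    Only properties that are both declared in the schema and listed in
    _DATETIME_PROPS are inspected; missing values are skipped.
    """
    reference = now if now is not None else datetime.now(timezone.utc)
    violations = []
    for name in sorted(_DATETIME_PROPS & set(declared)):
        value = blob_props.get(name)
        if value is not None and value > reference:
            violations.append(name)
    return violations
```

Passing `now` explicitly keeps the check deterministic in tests.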

Comment thread README.md
Comment on lines +668 to +671
Datetime sentinel:
Quality constraints on datetime properties accept the special string ``"now"`` as a
comparand (``mustBeLessThan``, ``mustBeGreaterThan``, etc.) — it is resolved to the
current UTC datetime at evaluation time.
Collaborator

This is not yet implemented.
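For reference, resolving the sentinel could be as small as the sketch below. The name `resolve_comparand` is hypothetical, and treating naive datetimes as UTC is an assumption rather than documented behavior.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def resolve_comparand(value: Union[str, datetime], now: Optional[datetime] = None) -> datetime:
    """Turn a quality-rule comparand into a timezone-aware datetime.

    The literal string "now" resolves to the current UTC time at
    evaluation; any other string is parsed as ISO 8601. Naive values
    are assumed to be UTC.
    """
    if isinstance(value, str):
        if value == "now":
            return now if now is not None else datetime.now(timezone.utc)
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value
```

A rule such as mustBeLessThan: "now" would then compare a blob's timestamp against `resolve_comparand("now")`.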

Comment on lines +79 to +83
# Azure Blob / ADLS Gen2 file-metadata checks (physicalType=file schemas and server format is binary)
if server.type in ("azure") and server.format == "binary" and _has_file_schemas(data_contract, schema_name):
    check_azure_blob_file(run, data_contract, server)
else:
    check_soda_execute(
Collaborator

Do we need _has_file_schemas? A server with type azure and format binary won't produce meaningful Soda checks anyway, so I guess it's fine to call check_azure_blob_file even if we have 0 schemas.
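A sketch of the simplified condition, using hypothetical stand-in functions that return the name of the engine that would be called. It also uses a plain equality test: `("azure")` is a parenthesized string rather than a one-element tuple, so `in` performs a substring match there.

```python
def uses_file_metadata_checks(server_type: str, server_format: str) -> bool:
    """Decide on server type and format alone, without inspecting schemas."""
    return server_type == "azure" and server_format == "binary"


def select_engine(server_type: str, server_format: str) -> str:
    """Return the name of the check function that would be dispatched to."""
    if uses_file_metadata_checks(server_type, server_format):
        return "check_azure_blob_file"
    return "check_soda_execute"


# Substring pitfall: ("azure") is just the string "azure".
substring_match = "az" in ("azure")    # True, matches as a substring
tuple_match = "az" in ("azure",)       # False, compares against tuple members
```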

dmaresma and others added 2 commits May 30, 2026 20:39
Co-authored-by: Jakob Schödl <jakob.schoedl@mailbox.org>
Co-authored-by: Jakob Schödl <jakob.schoedl@mailbox.org>

Development

Successfully merging this pull request may close these issues.

ODCS test image or binaries on the top of a local / cloud blob storage, adlsgen2, aws S3 and GCP storage

2 participants