Feat/test azure filestore metadata #1234
jschoedl left a comment
Interesting idea to use Data Contracts for this!
```yaml
location: azure://container@datameshdatabricksdemo.dfs.core.windows.net/entity={model}/year=*/month=*/day=*/*.parquet
format: parquet
```
This combination of the `azure://` scheme and a `dfs` endpoint is not supported.
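A minimal sketch of how this combination could be rejected up front, assuming the check runs on the raw location string before any blobs are listed. `validate_azure_location` and `UnsupportedLocationError` are hypothetical names, not part of the existing code base.

```python
from urllib.parse import urlparse


class UnsupportedLocationError(ValueError):
    """Hypothetical error for an unsupported scheme/host combination."""


def validate_azure_location(location: str) -> None:
    """Reject an azure:// URL that points at a dfs.core.windows.net endpoint.

    Hypothetical check based on the review comment: azure:// combined with a
    dfs host is not supported, so fail fast with a clear message.
    """
    parsed = urlparse(location)
    # For azure://container@account.host/path, .hostname is "account.host"
    host = parsed.hostname or ""
    if parsed.scheme == "azure" and host.endswith(".dfs.core.windows.net"):
        raise UnsupportedLocationError(
            f"Unsupported location {location!r}: the azure:// scheme cannot be "
            "combined with a dfs.core.windows.net endpoint"
        )
```

Raising early keeps the failure next to the contract definition instead of surfacing later as an opaque listing error.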
```yaml
location: azure://web@datameshdatabricksdemo.blob.core.windows.net/media={model}/year=*/month=*/day=*/*.jpg
format: binary
```
`{model}` substitution is not implemented; either implement it or remove it from the example.
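If the substitution were implemented, a minimal sketch could look like this, assuming `{model}` stands for the schema (model) name and should be replaced literally. `expand_location` and `expand_locations` are hypothetical helpers. `str.replace` is used instead of `str.format` so that glob wildcards and any other braces in the path are left untouched.

```python
from typing import Dict, List


def expand_location(location: str, model: str) -> str:
    """Substitute the {model} placeholder in a server location (hypothetical)."""
    if not model:
        raise ValueError("model name must be non-empty to expand {model}")
    # Literal replacement: no KeyError for unrelated braces, wildcards kept as-is
    return location.replace("{model}", model)


def expand_locations(location: str, models: List[str]) -> Dict[str, str]:
    """Map each model (schema) name to its expanded location."""
    return {model: expand_location(location, model) for model in models}
```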
```python
# Azure Blob / ADLS Gen2 file-metadata checks (physicalType=file schemas and server format is binary)
if server.type in ("azure") and server.format == "binary" and _has_file_schemas(data_contract, schema_name):
    check_azure_blob_file(run, data_contract, server)
```
Just noticed: this PR should support `--check-category` to filter for quality or schema checks. The parameter is supported by `test`, but here it would be silently ignored.
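One way to honor the filter is to tag every file-metadata check with a category and drop the unrequested ones before evaluating them. This is a sketch under assumptions: `CheckCategory`, `FileCheck`, and `select_checks` are hypothetical, and how the CLI hands the selected categories to `check_azure_blob_file` is not shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class CheckCategory(str, Enum):
    # Hypothetical category values matching the "schema" and "quality" filters
    SCHEMA = "schema"
    QUALITY = "quality"


@dataclass
class FileCheck:
    # Hypothetical record for one file-metadata check
    name: str
    category: CheckCategory


def select_checks(
    checks: Iterable[FileCheck], categories: Optional[Iterable[str]]
) -> List[FileCheck]:
    """Keep only the checks whose category was requested.

    None or an empty selection means "no filter", so every check runs.
    Unknown category strings raise ValueError instead of being ignored.
    """
    checks = list(checks)
    requested = list(categories or [])
    if not requested:
        return checks
    wanted = {CheckCategory(value) for value in requested}
    return [check for check in checks if check.category in wanted]
```

Failing on an unknown category is a deliberate choice here: it avoids exactly the silent-ignore behavior the comment points out.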
```python
# ── Check: location not empty ─────────────────────────────────────────────
_check_location_not_empty(run, schema_name, blobs, prefix)

if not blobs:
    return
```
In some scenarios, one might want to allow 0 blobs. We should only enforce a non-empty location if the contract specifies it. (Similarly, the existing Soda checks for other testing backends work fine when a database has 0 rows.)
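A minimal sketch of an opt-in emptiness check, assuming the contract exposes some minimum-count setting. `min_count` is a hypothetical name, not an existing contract field. When it is absent, no result is produced and an empty location passes, mirroring how the Soda backends treat empty tables.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocationCheckResult:
    # Hypothetical result record for the blob-count check
    name: str
    passed: bool
    reason: str


def check_blob_count(
    blob_names: List[str], min_count: Optional[int]
) -> Optional[LocationCheckResult]:
    """Enforce a minimum number of blobs only when the contract asks for it.

    min_count=None means "no requirement": an empty location is accepted
    and no check result is emitted.
    """
    if min_count is None:
        return None
    count = len(blob_names)
    return LocationCheckResult(
        name="location_min_blob_count",
        passed=count >= min_count,
        reason=f"found {count} blob(s), contract requires at least {min_count}",
    )
```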
```python
# Properties whose values are datetimes — auto-checked "not in future" when declared
_DATETIME_PROPS = {"lastModified", "creationTime", "lastAccessedOn"}
```
Those are unused. Do you still want to implement this feature?
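If the "not in future" auto-check is kept, it could be as small as the sketch below. It assumes blob properties arrive as timezone-aware datetimes keyed by the same names as `_DATETIME_PROPS`. `future_datetime_props` is a hypothetical helper, and `now` is injectable so a whole run compares against one instant.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional, Set, List

# Same property names as the (currently unused) _DATETIME_PROPS in the diff
DATETIME_PROPS = {"lastModified", "creationTime", "lastAccessedOn"}


def future_datetime_props(
    blob_props: Mapping[str, datetime],
    declared: Set[str],
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the declared datetime properties whose value lies in the future.

    Values must be timezone-aware; comparing naive and aware datetimes
    raises TypeError in Python.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return sorted(
        prop
        for prop in declared & DATETIME_PROPS
        if prop in blob_props and blob_props[prop] > now
    )
```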
```
Datetime sentinel:
    Quality constraints on datetime properties accept the special string ``"now"`` as a
    comparand (``mustBeLessThan``, ``mustBeGreaterThan``, etc.) — it is resolved to the
    current UTC datetime at evaluation time.
```
This is not yet implemented.
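A minimal sketch of what the documented sentinel could do, assuming comparands are either the string `"now"`, an ISO-8601 string, or a `datetime`. Only the two constraint keys named in the docstring are mapped; `resolve_comparand` and `evaluate_datetime_constraint` are hypothetical names. Passing `now` in explicitly lets one evaluation run resolve the sentinel to a single instant for every blob.

```python
import operator
from datetime import datetime, timezone
from typing import Callable, Dict, Optional, Union

# Only the keys named in the docstring; further keys would be added the same way
_COMPARATORS: Dict[str, Callable[[datetime, datetime], bool]] = {
    "mustBeLessThan": operator.lt,
    "mustBeGreaterThan": operator.gt,
}


def resolve_comparand(value: Union[str, datetime], now: datetime) -> datetime:
    """Replace the "now" sentinel with the evaluation-time UTC datetime."""
    if isinstance(value, datetime):
        return value
    if value == "now":
        return now
    return datetime.fromisoformat(value)


def evaluate_datetime_constraint(
    actual: datetime,
    constraint: str,
    comparand: Union[str, datetime],
    now: Optional[datetime] = None,
) -> bool:
    """Return True if `actual` satisfies the constraint against the comparand."""
    if now is None:
        now = datetime.now(timezone.utc)
    compare = _COMPARATORS[constraint]
    return compare(actual, resolve_comparand(comparand, now))
```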
```python
# Azure Blob / ADLS Gen2 file-metadata checks (physicalType=file schemas and server format is binary)
if server.type in ("azure") and server.format == "binary" and _has_file_schemas(data_contract, schema_name):
    check_azure_blob_file(run, data_contract, server)
else:
    check_soda_execute(
```
Do we need `_has_file_schemas`? A server with type `azure` and format `binary` won't produce meaningful Soda checks anyway, so I guess it's fine to call `check_azure_blob_file` even if there are 0 schemas.
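Without `_has_file_schemas`, the dispatch only needs the server fields. The sketch below returns backend labels instead of calling `check_azure_blob_file` / `check_soda_execute` so that it stays self-contained; `Server` is a minimal stand-in, not the real model. It also compares with `==`: `("azure")` without a trailing comma is just the string `"azure"`, so `server.type in ("azure")` is a substring test rather than tuple membership.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    # Minimal stand-in for the server model; only the fields used here
    type: str
    format: Optional[str] = None


def choose_backend(server: Server) -> str:
    """Pick the test backend from the server alone, without inspecting schemas."""
    # == avoids the substring semantics of `in ("azure")`
    if server.type == "azure" and server.format == "binary":
        return "azure_blob_file"
    return "soda"
```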
Co-authored-by: Jakob Schödl <jakob.schoedl@mailbox.org>
related to #1227
- `uv run pytest`
- `uv run ruff check --fix && uv run ruff format`