Is there an existing issue for this?
Problem statement
The current samples directory has grown piecemeal over time and is now split across four loosely coupled core bundles (bronze_sample, silver_sample, gold_sample, test_data_and_orchestrator) that are difficult to navigate, deploy, and maintain. Each bundle has its own schema namespace and its own orchestration setup, and cross-bundle data dependencies make it hard to run a subset of samples independently. There is no clear separation between "feature demonstration" samples (which show one framework capability in isolation) and "pattern" samples (end-to-end medallion pipelines). As a result, new contributors and users evaluating the framework have no clear entry point.
The tpch and yaml bundles will also be addressed; details TBC.
Proposed Solution
Replace the four existing core sample bundles with two well-scoped bundles:
- feature-samples - A single-schema bundle ({namespace}_feature) that demonstrates every Lakeflow Framework feature in isolation: CDC, historical snapshots, data quality, table migration, DPM, templates, Python sources/transforms, libraries, and Kafka. Uses a tiered job to run all feature pipelines from a shared staging load. Tables follow src_ / tgt_ naming conventions.
- pattern-samples - A multi-schema bundle ({namespace}_bronze/silver/gold) showing complete end-to-end medallion patterns: multi-source streaming, stream-static joins, CDC from snapshot sources, and gold-layer materialized views. Runs as a 4-day incremental load cycle.
The test_data_and_orchestrator bundle is removed entirely. Its responsibilities are absorbed into the two new bundles: each now owns its own schema initialisation notebook, staging data load notebook, and job orchestration. This eliminates the shared dependency that previously coupled all sample bundles together at deploy time.
Both bundles:
- Include their own schema init, staging load, and orchestration notebooks (no shared test_data_and_orchestrator dependency)
- Reference DLT pipelines via ${resources.pipelines..id} (proper Databricks Asset Bundles resource references instead of runtime name lookups)
- Are deployed via dedicated deploy_feature_samples.sh and deploy_pattern_samples.sh scripts
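To illustrate the tiered job and the resource-reference approach described above, here is a minimal sketch of what a bundle resource file could look like. All resource keys, task keys, and notebook paths (feature_cdc_pipeline, feature_samples_job, ../notebooks/...) are hypothetical placeholders rather than names from the actual bundles, and the pipeline definition is trimmed to its name.

```yaml
# Sketch only: resource keys, task keys, and paths below are assumptions.
resources:
  pipelines:
    feature_cdc_pipeline:
      name: feature_cdc_pipeline
      # remaining pipeline settings (catalog, schema, libraries) omitted

  jobs:
    feature_samples_job:
      name: feature_samples_job
      tasks:
        # Tier 1: create the bundle's own schema
        - task_key: schema_init
          notebook_task:
            notebook_path: ../notebooks/schema_init.py

        # Tier 2: shared staging load, runs once for all feature pipelines
        - task_key: staging_load
          depends_on:
            - task_key: schema_init
          notebook_task:
            notebook_path: ../notebooks/staging_load.py

        # Tier 3: one task per feature pipeline, referenced by resource ID
        - task_key: run_feature_cdc
          depends_on:
            - task_key: staging_load
          pipeline_task:
            pipeline_id: ${resources.pipelines.feature_cdc_pipeline.id}
```

Because the pipeline ID is resolved from the bundle's own resources at deploy time, the job does not need to look the pipeline up by name at run time, and renaming the pipeline does not break the reference.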
Additional Context
No response