[FEATURE]: Refactor and simplify samples #90

@rederik76

Description

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

The current samples directory has grown piecemeal over time and is now split across four core, loosely coupled bundles (bronze_sample, silver_sample, gold_sample, test_data_and_orchestrator) that are difficult to navigate, deploy, and maintain. Each bundle has its own schema namespace and its own orchestration setup, and cross-bundle data dependencies make it hard to run a subset of samples independently. There is no clear separation between "feature demonstration" samples (which show one framework capability in isolation) and "pattern" samples (end-to-end medallion pipelines). As a result, new contributors and users evaluating the framework have no clear entry point.

The tpch and yaml bundles will also be addressed; details TBC.

Proposed Solution

Replace the four existing core sample bundles with two well-scoped bundles:

  1. feature-samples - A single-schema bundle ({namespace}_feature) that demonstrates every Lakeflow Framework feature in isolation: CDC, historical snapshots, data quality, table migration, DPM, templates, Python sources/transforms, libraries, and Kafka. Uses a tiered job to run all feature pipelines from a shared staging load. Tables follow the src_ / tgt_ naming conventions.
  2. pattern-samples - A multi-schema bundle ({namespace}_bronze/silver/gold) showing complete end-to-end medallion patterns: multi-source streaming, stream-static joins, CDC from snapshot sources, and gold-layer materialized views. Runs as a 4-day incremental load cycle.
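
As a rough illustration of the schema layout described above, the fragments below sketch how each bundle might derive its schema names from a single namespace value. This assumes the feature schema uses the same underscore convention as the bronze/silver/gold schemas. The `namespace` variable, the resource keys, and the pipeline names are hypothetical, and catalog and library settings are omitted.

```yaml
# Sketch only: variable name, resource keys, and pipeline names are hypothetical.

# feature-samples bundle: every pipeline publishes to one schema
variables:
  namespace:
    description: Prefix applied to the sample schema names
    default: dev

resources:
  pipelines:
    feature_cdc_pipeline:
      name: feature-cdc-pipeline
      schema: ${var.namespace}_feature

# pattern-samples bundle (a separate databricks.yml): one schema per layer
#
# resources:
#   pipelines:
#     pattern_bronze_pipeline:
#       name: pattern-bronze-pipeline
#       schema: ${var.namespace}_bronze
#     pattern_silver_pipeline:
#       name: pattern-silver-pipeline
#       schema: ${var.namespace}_silver
#     pattern_gold_pipeline:
#       name: pattern-gold-pipeline
#       schema: ${var.namespace}_gold
```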

The test_data_and_orchestrator bundle is removed entirely. Its responsibilities are absorbed into the two new bundles: each now owns its own schema initialisation notebook, staging data load notebook, and job orchestration. This eliminates the shared runtime dependency that previously coupled all sample bundles together at deploy time.

Both bundles:

  • Include their own schema init, staging load, and orchestration notebooks (no shared test_data_and_orchestrator dependency)
  • Reference DLT pipelines via ${resources.pipelines..id} (proper DABs resource references, no runtime name lookups)
  • Are deployed via dedicated deploy_feature_samples.sh and deploy_pattern_samples.sh scripts
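
The self-contained orchestration described above could look roughly like the following bundle fragment. It is a sketch, not the proposed implementation: the resource keys, task keys, and notebook paths are hypothetical, and compute settings are left out. It shows a tiered job (schema init, then staging load, then a feature pipeline) in which the pipeline task resolves its target through a bundle resource reference instead of looking the pipeline up by name at run time.

```yaml
# Sketch only: resource keys, task keys, and notebook paths are hypothetical.
resources:
  pipelines:
    feature_cdc_pipeline:
      name: feature-cdc-pipeline
      # catalog, schema, and library settings omitted

  jobs:
    feature_samples_job:
      name: feature-samples-job
      tasks:
        # Tier 1: create the bundle's own schema
        - task_key: init_schema
          notebook_task:
            notebook_path: ./notebooks/init_schema.py

        # Tier 2: shared staging load that all feature pipelines read from
        - task_key: load_staging
          depends_on:
            - task_key: init_schema
          notebook_task:
            notebook_path: ./notebooks/load_staging.py

        # Tier 3: one pipeline task per feature, each waiting on staging
        - task_key: run_feature_cdc
          depends_on:
            - task_key: load_staging
          pipeline_task:
            # Resolved by the bundle at deploy time
            pipeline_id: ${resources.pipelines.feature_cdc_pipeline.id}
```

Because the reference is resolved at deploy time, renaming a pipeline does not break the job, and each bundle can be deployed without the other being present.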

Additional Context

No response

Metadata

Labels

enhancement: New feature or request
