Production-grade dbt project for the Advanced dbt training module. Built with 10,000 real orders from Olist Brazilian E-commerce.
After completing this module, you will be able to:
- β Build robust tests with dbt_expectations and custom generic tests
- β Implement data contracts (dbt 1.5+) and unit tests (dbt 1.8+)
- β Create reusable macros and leverage dbt packages
- β Set up CI/CD pipelines with GitHub Actions
- β Implement Slim CI for faster PR validation
- β Monitor pipeline performance and troubleshoot issues
- β Use dbt artifacts for custom tooling
- Python 3.8+
- Google Cloud Platform account
- dbt 1.7+ (
pip install dbt-bigquery) - Basic dbt knowledge (models, ref, source)
# 1. Clone the repository
git clone <repository-url>
cd dbt_advanced_olist
# 2. Run setup script (creates venv, installs dbt)
chmod +x scripts/setup.sh
./scripts/setup.sh
# 3. Configure profiles.yml
cp profiles.yml.example ~/.dbt/profiles.yml
# Edit ~/.dbt/profiles.yml with your GCP project ID and keyfile path
# 4. Load data to BigQuery (REQUIRED!)
export GCP_PROJECT_ID="your-project-id"
chmod +x scripts/load_data_to_bigquery.sh
./scripts/load_data_to_bigquery.sh
# 5. Test connection
source venv/bin/activate
dbt debug
# 6. Build the project
dbt buildIf the script doesn't work, load CSVs manually via BigQuery Console:
- Go to BigQuery Console β Create Dataset named
raw_data - For each CSV in
data/folder: Create Table β Upload β Select CSV - Use schema auto-detection or refer to
scripts/load_data_to_bigquery.shfor column types
dbt_advanced_olist/
βββ models/
β βββ staging/ # Source cleaning (views)
β β βββ _sources.yml # Source definitions + freshness
β β βββ _staging.yml # Schema tests
β β βββ stg_*.sql # Staging models
β βββ intermediate/ # Business logic joins
β β βββ int_*.sql
β βββ marts/ # BI-ready tables
β βββ _marts.yml # Data contracts + unit tests
β βββ dim_customers.sql
β βββ fct_orders.sql
βββ tests/
β βββ generic/ # Reusable test macros
β β βββ test_is_positive.sql
β β βββ test_is_not_negative.sql
β β βββ test_is_recent.sql
β β βββ test_row_count_delta.sql
β βββ singular/ # Custom SQL tests
β βββ assert_orders_have_items.sql
β βββ assert_delivery_dates_logical.sql
βββ macros/
β βββ limit_in_dev.sql
β βββ generate_date_parts.sql
β βββ safe_divide.sql
β βββ logging_hooks.sql
βββ labs/ # π Hands-on exercises (6 labs)
β βββ lab-01/ # Custom Generic Tests
β βββ lab-02/ # Data Contracts & Unit Tests
β βββ lab-03/ # Creating DRY Macros
β βββ lab-04/ # Packages & Audit Helper
β βββ lab-05/ # Setting Up Slim CI
β βββ lab-06/ # Analyzing Pipeline Performance
βββ seeds/ # Reference data
βββ snapshots/ # SCD Type 2
βββ analyses/ # Ad-hoc queries
βββ .github/workflows/ # CI/CD pipelines
βββ ci.yml # PR validation
βββ deploy.yml # Production deployment
This module includes 6 hands-on labs (2 per session) with complete step-by-step instructions.
| Lab | Topic | Duration | Description |
|---|---|---|---|
| Lab 01 | Custom Generic Tests | 25 min | Create reusable is_recent test macro |
| Lab 02 | Data Contracts & Unit Tests | 30 min | Enforce schema + test business logic |
| Lab | Topic | Duration | Description |
|---|---|---|---|
| Lab 03 | Creating DRY Macros | 30 min | Build generate_date_columns macro |
| Lab 04 | Packages & Audit Helper | 30 min | Use dbt_utils and compare model versions |
| Lab | Topic | Duration | Description |
|---|---|---|---|
| Lab 05 | Setting Up Slim CI | 30 min | Configure GitHub Actions + state selection |
| Lab 06 | Analyzing Pipeline Performance | 25 min | Parse artifacts + identify bottlenecks |
Each lab folder contains:
lab-XX/
βββ README.md # π Full instructions, background, step-by-step guide
βββ starter/ # π‘ Template files to complete
β βββ *.sql/py/yml
βββ solution/ # β
Complete working solutions
βββ *.sql/py/yml
columns:
- name: order_total
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 50000{% test is_positive(model, column_name) %}
select * from {{ model }}
where {{ column_name }} <= 0
{% endtest %}models:
- name: dim_customers
config:
contract:
enforced: true
columns:
- name: customer_id
data_type: string
constraints:
- type: not_null
- type: primary_keyunit_tests:
- name: test_vip_segment
model: dim_customers
given:
- input: ref('stg_orders')
rows:
- {customer_id: 'c1', order_total: 600}
expect:
rows:
- {customer_id: 'c1', customer_segment: 'VIP'}# Build everything
dbt build
# Run only modified + downstream (Slim CI)
dbt build --select state:modified+ --state prod-manifest/
# Test specific model
dbt test --select dim_customers
# Run unit tests only
dbt test --select test_type:unit
# Run generic tests only
dbt test --select test_type:generic
# Generate documentation
dbt docs generate && dbt docs serve
# Check source freshness
dbt source freshness
# Compile to see generated SQL
dbt compile --select model_name| Package | Version | Purpose |
|---|---|---|
| dbt_utils | 1.1.1 | Utilities (surrogate_key, date_spine) |
| dbt_expectations | 0.10.3 | Great Expectations-style tests |
| dbt_audit_helper | 0.9.0 | Compare model outputs |
| codegen | 0.12.1 | Generate schema YAML |
Install packages:
dbt depsβββββββββββββββ βββββββββββββββ βββββββββββββββ
β Branch ββββββΆβ Pull RequestββββββΆβ Main β
β (develop) β β (CI test) β β (deploy) β
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β β β
βΌ βΌ βΌ
dbt build Slim CI: Full build
(local) state:modified+ + artifacts
Use this checklist to track your progress:
- Lab 01: Custom Generic Tests -
is_recenttest created and running - Lab 02: Data Contracts - Contract enforced on
dim_customers, unit tests passing - Lab 03: DRY Macros -
generate_date_columnsmacro compiles correctly - Lab 04: Audit Helper - Model comparison executed successfully
- Lab 05: Slim CI - GitHub Actions workflow configured and tested
- Lab 06: Performance Analysis - Bottlenecks identified and recommendations generated
By completing all labs, you will demonstrate the ability to:
| Outcome | Verified In |
|---|---|
| Build a pipeline of models | All labs |
| Apply testing to the models | Labs 01, 02 |
| Pick out problems when alerts fire | Lab 06 |
| Review a pipeline and locate bottlenecks | Lab 06 |
| Create reusable, maintainable code | Labs 03, 04 |
| Set up production-grade CI/CD | Lab 05 |
- dbt Documentation
- dbt_expectations
- dbt_audit_helper
- Olist Dataset (Kaggle)
- dbt Best Practices
- dbt Certification
Dataset: CC BY-NC-SA 4.0 (Olist via Kaggle)
Built for Coding Black Females - Advanced dbt Module π