Advanced dbt - Real Olist E-Commerce Dataset

Production-grade dbt project for the Advanced dbt training module. Built with 10,000 real orders from Olist Brazilian E-commerce.

🎯 Learning Objectives

After completing this module, you will be able to:

✅ Build robust tests with dbt_expectations and custom generic tests
✅ Implement data contracts (dbt 1.5+) and unit tests (dbt 1.8+)
✅ Create reusable macros and leverage dbt packages
✅ Set up CI/CD pipelines with GitHub Actions
✅ Implement Slim CI for faster PR validation
✅ Monitor pipeline performance and troubleshoot issues
✅ Use dbt artifacts for custom tooling

🚀 Quick Start

Prerequisites

Python 3.8+
Google Cloud Platform account
dbt 1.7+ (pip install dbt-bigquery)
Basic dbt knowledge (models, ref, source)

Setup (15 minutes)

# 1. Clone the repository
git clone <repository-url>
cd dbt_advanced_olist

# 2. Run setup script (creates venv, installs dbt)
chmod +x scripts/setup.sh
./scripts/setup.sh

# 3. Configure profiles.yml
cp profiles.yml.example ~/.dbt/profiles.yml
# Edit ~/.dbt/profiles.yml with your GCP project ID and keyfile path

# 4. Load data to BigQuery (REQUIRED!)
export GCP_PROJECT_ID="your-project-id"
chmod +x scripts/load_data_to_bigquery.sh
./scripts/load_data_to_bigquery.sh

# 5. Test connection
source venv/bin/activate
dbt debug

# 6. Build the project
dbt build

Manual Data Loading (Alternative)

If the script doesn't work, load CSVs manually via BigQuery Console:

Go to BigQuery Console → Create Dataset named raw_data
For each CSV in data/ folder: Create Table → Upload → Select CSV
Use schema auto-detection or refer to scripts/load_data_to_bigquery.sh for column types

📁 Project Structure

dbt_advanced_olist/
├── models/
│   ├── staging/           # Source cleaning (views)
│   │   ├── _sources.yml   # Source definitions + freshness
│   │   ├── _staging.yml   # Schema tests
│   │   └── stg_*.sql      # Staging models
│   ├── intermediate/      # Business logic joins
│   │   └── int_*.sql      
│   └── marts/             # BI-ready tables
│       ├── _marts.yml     # Data contracts + unit tests
│       ├── dim_customers.sql
│       └── fct_orders.sql
├── tests/
│   ├── generic/           # Reusable test macros
│   │   ├── test_is_positive.sql
│   │   ├── test_is_not_negative.sql
│   │   ├── test_is_recent.sql
│   │   └── test_row_count_delta.sql
│   └── singular/          # Custom SQL tests
│       ├── assert_orders_have_items.sql
│       └── assert_delivery_dates_logical.sql
├── macros/
│   ├── limit_in_dev.sql
│   ├── generate_date_parts.sql
│   ├── safe_divide.sql
│   └── logging_hooks.sql
├── labs/                  # 🆕 Hands-on exercises (6 labs)
│   ├── lab-01/            # Custom Generic Tests
│   ├── lab-02/            # Data Contracts & Unit Tests
│   ├── lab-03/            # Creating DRY Macros
│   ├── lab-04/            # Packages & Audit Helper
│   ├── lab-05/            # Setting Up Slim CI
│   └── lab-06/            # Analyzing Pipeline Performance
├── seeds/                 # Reference data
├── snapshots/             # SCD Type 2
├── analyses/              # Ad-hoc queries
└── .github/workflows/     # CI/CD pipelines
    ├── ci.yml             # PR validation
    └── deploy.yml         # Production deployment

🧪 Labs Overview

This module includes 6 hands-on labs (2 per session) with complete step-by-step instructions.

Session 1: Building Robust Tests

Lab	Topic	Duration	Description
Lab 01	Custom Generic Tests	25 min	Create reusable `is_recent` test macro
Lab 02	Data Contracts & Unit Tests	30 min	Enforce schema + test business logic

Session 2: Macros & Packages

Lab	Topic	Duration	Description
Lab 03	Creating DRY Macros	30 min	Build `generate_date_columns` macro
Lab 04	Packages & Audit Helper	30 min	Use dbt_utils and compare model versions

Session 3: CI/CD & Production

Lab	Topic	Duration	Description
Lab 05	Setting Up Slim CI	30 min	Configure GitHub Actions + state selection
Lab 06	Analyzing Pipeline Performance	25 min	Parse artifacts + identify bottlenecks

Lab Structure

Each lab folder contains:

lab-XX/
├── README.md        # 📖 Full instructions, background, step-by-step guide
├── starter/         # 💡 Template files to complete
│   └── *.sql/py/yml
└── solution/        # ✅ Complete working solutions
    └── *.sql/py/yml

🧪 Testing Features

Schema Tests (dbt_expectations)

columns:
  - name: order_total
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 50000

Custom Generic Tests

{% test is_positive(model, column_name) %}
select * from {{ model }}
where {{ column_name }} <= 0
{% endtest %}

Data Contracts (dbt 1.5+)

models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: string
        constraints:
          - type: not_null
          - type: primary_key

Unit Tests (dbt 1.8+)

unit_tests:
  - name: test_vip_segment
    model: dim_customers
    given:
      - input: ref('stg_orders')
        rows:
          - {customer_id: 'c1', order_total: 600}
    expect:
      rows:
        - {customer_id: 'c1', customer_segment: 'VIP'}

🔧 Key Commands

# Build everything
dbt build

# Run only modified + downstream (Slim CI)
dbt build --select state:modified+ --state prod-manifest/

# Test specific model
dbt test --select dim_customers

# Run unit tests only
dbt test --select test_type:unit

# Run generic tests only
dbt test --select test_type:generic

# Generate documentation
dbt docs generate && dbt docs serve

# Check source freshness
dbt source freshness

# Compile to see generated SQL
dbt compile --select model_name

📦 Packages Used

Package	Version	Purpose
dbt_utils	1.1.1	Utilities (surrogate_key, date_spine)
dbt_expectations	0.10.3	Great Expectations-style tests
dbt_audit_helper	0.9.0	Compare model outputs
codegen	0.12.1	Generate schema YAML

Install packages:

dbt deps

🌐 CI/CD Pipeline

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Branch    │────▶│  Pull Request│────▶│    Main     │
│  (develop)  │     │   (CI test)  │     │  (deploy)   │
└─────────────┘     └─────────────┘     └─────────────┘
      │                    │                    │
      ▼                    ▼                    ▼
   dbt build          Slim CI:              Full build
   (local)         state:modified+         + artifacts

📊 Lab Completion Tracking

Use this checklist to track your progress:

Lab 01: Custom Generic Tests - is_recent test created and running
Lab 02: Data Contracts - Contract enforced on dim_customers, unit tests passing
Lab 03: DRY Macros - generate_date_columns macro compiles correctly
Lab 04: Audit Helper - Model comparison executed successfully
Lab 05: Slim CI - GitHub Actions workflow configured and tested
Lab 06: Performance Analysis - Bottlenecks identified and recommendations generated

🎓 Learning Outcomes

By completing all labs, you will demonstrate the ability to:

Outcome	Verified In
Build a pipeline of models	All labs
Apply testing to the models	Labs 01, 02
Pick out problems when alerts fire	Lab 06
Review a pipeline and locate bottlenecks	Lab 06
Create reusable, maintainable code	Labs 03, 04
Set up production-grade CI/CD	Lab 05

📚 Additional Resources

📝 License

Dataset: CC BY-NC-SA 4.0 (Olist via Kaggle)

Built for Coding Black Females - Advanced dbt Module 🚀

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Advanced dbt - Real Olist E-Commerce Dataset

🎯 Learning Objectives

🚀 Quick Start

Prerequisites

Setup (15 minutes)

Manual Data Loading (Alternative)

📁 Project Structure

🧪 Labs Overview

Session 1: Building Robust Tests

Session 2: Macros & Packages

Session 3: CI/CD & Production

Lab Structure

🧪 Testing Features

Schema Tests (dbt_expectations)

Custom Generic Tests

Data Contracts (dbt 1.5+)

Unit Tests (dbt 1.8+)

🔧 Key Commands

📦 Packages Used

🌐 CI/CD Pipeline

📊 Lab Completion Tracking

🎓 Learning Outcomes

📚 Additional Resources

📝 License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
analyses		analyses
data		data
dbt_packages		dbt_packages
labs		labs
logs		logs
macros		macros
models		models
prod-artifacts		prod-artifacts
scripts		scripts
seeds		seeds
snapshots		snapshots
target		target
tests		tests
README.md		README.md
dbt_project.yml		dbt_project.yml
package-lock.yml		package-lock.yml
packages.yml		packages.yml
performance_history.json		performance_history.json
profiles.yml.example		profiles.yml.example

cbfacademy/Advance-DBT

Folders and files

Latest commit

History

Repository files navigation

Advanced dbt - Real Olist E-Commerce Dataset

🎯 Learning Objectives

🚀 Quick Start

Prerequisites

Setup (15 minutes)

Manual Data Loading (Alternative)

📁 Project Structure

🧪 Labs Overview

Session 1: Building Robust Tests

Session 2: Macros & Packages

Session 3: CI/CD & Production

Lab Structure

🧪 Testing Features

Schema Tests (dbt_expectations)

Custom Generic Tests

Data Contracts (dbt 1.5+)

Unit Tests (dbt 1.8+)

🔧 Key Commands

📦 Packages Used

🌐 CI/CD Pipeline

📊 Lab Completion Tracking

🎓 Learning Outcomes

📚 Additional Resources

📝 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages