Welcome to the dbt Intro course! This project uses 10,000 REAL orders from Olist, a Brazilian e-commerce marketplace.
Working with actual transactions from Olist, a real Brazilian e-commerce platform, gives you:
- ✅ Authentic business patterns - Real customer behavior and ordering trends
- ✅ Genuine geography - Actual Brazilian cities (São Paulo, Rio de Janeiro, Brasília)
- ✅ Real data quality issues - Nulls, late deliveries, cancellations
- ✅ Credible for portfolios - Work with data used by 1000+ analysts on Kaggle
- ✅ Industry-standard - Same dataset used by data professionals worldwide

Dataset Source: Brazilian E-Commerce Public Dataset by Olist (Kaggle)
This is a 10,000-order subset of the full Olist dataset (100K orders, 2016-2018).

| Table | Records | Description |
|---|---|---|
| orders | 10,000 | Real orders with actual timestamps and order status |
| order_items | 11,253 | Products purchased (avg 1.13 items per order) |
| customers | 10,000 | Unique customers across Brazil |
| products | 6,124 | Product catalog with dimensions |
| product_category_name_translation | 71 | Portuguese → English category names |
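
The "avg 1.13 items per order" figure in the table follows directly from the row counts. A minimal sketch of that arithmetic, using only the counts from the table above:

```python
# Row counts taken from the table above.
order_items = 11_253
orders = 10_000

# Average number of line items per order, rounded to two decimals.
avg_items_per_order = round(order_items / orders, 2)
print(avg_items_per_order)  # 1.13
```
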

```bash
# Check order ID format (32-character MD5-style hash, like real Olist)
head -2 data/orders.csv | tail -1 | cut -d',' -f1
# Output: e481f51cbdc54678b7cc49136f2d6af7

# Check Brazilian city names
head -2 data/customers.csv | tail -1 | cut -d',' -f4
# Output: lencois paulista (real Brazilian city!)

# Check Portuguese categories
head -2 data/products.csv | tail -1
# Output row includes: moveis_decoracao (Portuguese for furniture_decor)
```

Period: January - December 2017 (full year)
Geography: Real Brazilian cities across 27 federative units (26 states plus the Federal District)
Categories: 71 authentic product categories
Order Statuses: delivered, shipped, canceled, processing

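The order ID spot check above can also be scripted. The following is a minimal sketch, assuming an Olist-style ID is exactly 32 lowercase hexadecimal characters (the shape of an MD5 digest); the helper name `looks_like_olist_id` is hypothetical and not part of the project.

```python
import re

# Assumption: an Olist-style ID is 32 lowercase hexadecimal characters.
HEX32 = re.compile(r"[0-9a-f]{32}")


def looks_like_olist_id(value: str) -> bool:
    """Return True when the value has the 32-character hex shape of an MD5 digest."""
    return HEX32.fullmatch(value) is not None


print(looks_like_olist_id("e481f51cbdc54678b7cc49136f2d6af7"))  # True
print(looks_like_olist_id("ORDER-0001"))  # False
```

A shape check like this only confirms the format; it does not prove that an ID exists in the Kaggle dataset.
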
- Python 3.8+
- Google Cloud Platform account (free tier works!)
- Basic SQL knowledge
Step 1: Download this project

```bash
# Unzip the downloaded file, then enter the project directory
cd dbt_intro_olist_real
```

Step 2: Set up BigQuery

Follow: setup/01_setup_bigquery.md

Step 3: Load REAL Olist data to BigQuery

```bash
cd setup
chmod +x 02_load_data_to_bigquery.sh
./02_load_data_to_bigquery.sh
```

This loads all 10,000 real orders to BigQuery!

Step 4: Install dbt

```bash
pip install dbt-bigquery
```

Step 5: Configure dbt

Copy profiles.yml.example to ~/.dbt/profiles.yml and update with your GCP credentials.

Step 6: Test connection

```bash
dbt debug
```

Step 7: Run your first models!

```bash
dbt run
dbt test
dbt docs generate
dbt docs serve
```

```
dbt_intro_olist_real/
├── data/                                  # 10K REAL Olist CSV files
│   ├── orders.csv                         # 10,000 real orders
│   ├── order_items.csv                    # 11,252 line items
│   ├── customers.csv                      # 10,000 customers
│   ├── products.csv                       # 6,123 products
│   ├── product_category_name_translation.csv
│   └── DATASET_INFO.txt                   # Data authenticity proof
├── setup/                                 # BigQuery setup guides
├── models/
│   ├── staging/                           # Clean raw data (views)
│   │   ├── _sources.yml                   # Source definitions with tests
│   │   ├── stg_orders.sql                 # Orders with delivery metrics
│   │   ├── stg_order_items.sql
│   │   ├── stg_customers.sql              # Brazilian geography
│   │   ├── stg_products.sql
│   │   └── stg_product_categories.sql     # Portuguese → English
│   └── marts/                             # Business logic (tables)
│       ├── customers.sql                  # Lifetime value & segments
│       └── orders.sql                     # Enriched order facts
├── macros/
├── tests/
├── seeds/
└── exercises/                             # 3 hands-on exercises
```

- ✅ Models - Transform raw data with SQL SELECT statements
- ✅ Sources - Reference raw tables in BigQuery
- ✅ ref() - Build the dependency graph between models
- ✅ Tests - Ensure data quality (unique, not_null, etc.)
- ✅ Materializations - Views vs tables vs incremental
- ✅ Macros - Reusable SQL functions with Jinja
- ✅ Documentation - Auto-generate docs with lineage
- ✅ Working with actual e-commerce data (10K orders)
- ✅ Handling data quality issues (nulls, late deliveries)
- ✅ Brazilian market geography (real cities and states)
- ✅ Multi-language data (Portuguese categories → English)
- ✅ Customer lifetime value analysis
- ✅ Order fulfillment metrics (delivery times, late orders)

| Command | What It Does |
|---|---|
| `dbt run` | Build all models (staging + marts) |
| `dbt test` | Run data quality tests |
| `dbt docs generate` | Create documentation |
| `dbt docs serve` | View docs in browser (see DAG!) |
| `dbt run -s stg_orders` | Run specific model |
| `dbt test -s customers` | Test specific model |

After completing this course, you'll have:
- 5 staging models (cleaned raw data as views)
- 2 mart models (business metrics as tables)
  - `customers`: lifetime orders per customer, total revenue per customer (Brazilian Real), average order value, customer segments (High/Medium/Low value)
  - `orders`: order totals (price + freight), primary product category per order, late delivery flags (actual vs estimated), customer location (city, state)
- 15+ tests passing
- Source freshness checks
- Custom business logic tests

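The customer metrics listed above can be sketched outside dbt to make the logic concrete. This is a rough illustration in Python, not the course's `customers.sql`: the sample rows, the function names, and the High/Medium/Low cut-offs (`HIGH_VALUE`, `MEDIUM_VALUE`) are all assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical sample rows: (customer_id, price, freight) in Brazilian Real.
sample_orders = [
    ("c1", 120.0, 15.0),
    ("c1", 80.0, 10.0),
    ("c2", 40.0, 8.0),
    ("c3", 900.0, 50.0),
]

# Assumed segment cut-offs; the thresholds used in the course may differ.
HIGH_VALUE = 500.0
MEDIUM_VALUE = 100.0


def segment(total_revenue: float) -> str:
    """Map total revenue to a High/Medium/Low segment."""
    if total_revenue >= HIGH_VALUE:
        return "High"
    if total_revenue >= MEDIUM_VALUE:
        return "Medium"
    return "Low"


def customer_metrics(rows):
    """Aggregate lifetime orders, total revenue, and average order value per customer."""
    totals = defaultdict(lambda: {"lifetime_orders": 0, "total_revenue": 0.0})
    for customer_id, price, freight in rows:
        totals[customer_id]["lifetime_orders"] += 1
        # Order total = price + freight, as in the orders mart description.
        totals[customer_id]["total_revenue"] += price + freight

    metrics = {}
    for customer_id, agg in totals.items():
        metrics[customer_id] = {
            "lifetime_orders": agg["lifetime_orders"],
            "total_revenue": agg["total_revenue"],
            "avg_order_value": agg["total_revenue"] / agg["lifetime_orders"],
            "segment": segment(agg["total_revenue"]),
        }
    return metrics


for customer_id, row in customer_metrics(sample_orders).items():
    print(customer_id, row)
```

In the dbt project the same aggregation would be expressed as a SQL `GROUP BY` over the staging models, with the thresholds defined wherever the course places them.
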
Olist is a real Brazilian company that connects small businesses to major marketplaces. This dataset contains actual anonymized transactions from their platform.
Real orders from across Brazil:
- São Paulo (SP) - Brazil's largest city and economic hub
- Rio de Janeiro (RJ) - Second largest city
- Brasília (DF) - Capital city
- Plus 24 other Brazilian states
Real Olist categories:
- `cama_mesa_banho` → bed_bath_table
- `beleza_saude` → health_beauty
- `esportes_lazer` → sports_leisure
- `informatica_acessorios` → computers_accessories
- `moveis_decoracao` → furniture_decor
- And 66 more authentic categories!

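The Portuguese-to-English mapping above is essentially a lookup table. A minimal sketch, using only the five pairs listed here; the function name `translate_category` and the fallback behavior (return the original name when no translation exists, and `"unknown"` for a missing category) are assumptions for illustration.

```python
from typing import Optional

# The five category pairs listed above; the full translation table has 71 rows.
CATEGORY_TRANSLATIONS = {
    "cama_mesa_banho": "bed_bath_table",
    "beleza_saude": "health_beauty",
    "esportes_lazer": "sports_leisure",
    "informatica_acessorios": "computers_accessories",
    "moveis_decoracao": "furniture_decor",
}


def translate_category(name: Optional[str]) -> str:
    """Translate a Portuguese category name, tolerating nulls and unknown names."""
    if name is None:
        # Assumed placeholder for products without a category.
        return "unknown"
    # Assumed fallback: keep the Portuguese name if no translation is available.
    return CATEGORY_TRANSLATIONS.get(name, name)


print(translate_category("moveis_decoracao"))  # furniture_decor
print(translate_category(None))  # unknown
```

In SQL the equivalent is typically a left join from products to the translation table, with a `COALESCE` for the unmatched rows.
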
This is real data, so you'll encounter:
- ✅ Successful deliveries (~85% of orders)
- ⚠️ Late deliveries (some orders exceed the estimated date)
- ❌ Canceled orders (~4% cancellation rate)
- 📦 Orders in transit (no delivery date yet)
- 💰 Various payment amounts in Brazilian Real (R$)

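The delivery cases above map to a small decision rule: canceled orders first, then orders with no delivery date, then a comparison of actual versus estimated delivery. The sketch below is one way to express it, assuming hypothetical argument names and outcome labels; the project's staging models may handle these cases differently.

```python
from datetime import datetime
from typing import Optional


def delivery_outcome(
    status: str,
    delivered_at: Optional[datetime],
    estimated_at: datetime,
) -> str:
    """Classify an order as canceled, in_transit, late, or on_time."""
    if status == "canceled":
        return "canceled"
    if delivered_at is None:
        # No delivery date yet: the order is still on its way.
        return "in_transit"
    # Late means the actual delivery happened after the estimated date.
    return "late" if delivered_at > estimated_at else "on_time"


estimate = datetime(2017, 10, 18)
print(delivery_outcome("delivered", datetime(2017, 10, 10), estimate))  # on_time
print(delivery_outcome("delivered", datetime(2017, 10, 20), estimate))  # late
print(delivery_outcome("shipped", None, estimate))  # in_transit
```

Checking for the missing delivery date before comparing timestamps matters: comparing `None` to a `datetime` raises a `TypeError` in Python, and the analogous SQL comparison against NULL silently yields NULL rather than a flag.
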
- Explore the real data (15 min) - Check actual Brazilian orders in BigQuery
- Run staging models (15 min) - Clean and standardize raw data
- Exercise 1: Create your first model (20 min)
- Build mart models (30 min) - Calculate customer lifetime value
- Exercise 2: Use ref() and macros (20 min)
- Add tests (20 min) - Ensure data quality
- Exercise 3: Write your own tests (20 min)
- Generate docs (10 min) - See your lineage graph!
Total: about 2.5 hours of hands-on learning with 10,000 real orders!

- ✅ Complete BigQuery setup (15 min)
- ✅ Load 10K real Olist orders to BigQuery
- ✅ Run `dbt run` - Build models!
- ✅ Run `dbt test` - Verify data quality!
- ✅ Run `dbt docs serve` - See your pipeline!
- ✅ Complete exercises!

You're working with the same data that 1000+ analysts use on Kaggle!

- Original Dataset: Kaggle - Brazilian E-Commerce by Olist
- dbt Documentation: docs.getdbt.com
- dbt Community: Slack
- Olist Company: olist.com
- 1000+ Kaggle Analyses: Compare your work to others!
- ✅ Portfolio-worthy: "Built dbt pipeline with 10K Olist orders"
- ✅ Verifiable: Order IDs match Kaggle dataset
- ✅ Industry-recognized: Same data pros use for case studies
- ✅ Authentic patterns: Real customer behavior, not simulated
- ✅ Career-ready: Skills transfer directly to production work
- ✅ Real 10K order dataset (included!)
- ✅ Production-quality dbt project
- ✅ Comprehensive documentation
- ✅ 3 hands-on exercises with solutions
- ✅ Ready to run (setup takes 15 minutes)

CC BY-NC-SA 4.0 (Creative Commons)
This is real commercial data made publicly available by Olist on Kaggle for educational and research purposes. The data has been anonymized (customer names removed, company names in reviews replaced with Game of Thrones houses).
Original Source: Olist (https://olist.com/)
Published On: Kaggle
Your Subset: 10,000 orders from 2017 (sampled from 100K total)
Let's build with REAL data!