Data Engineering Lab

Practical, reproducible data engineering exercises: ingest → clean → load → query.

What this repo is

A small collection of pipeline projects built in Python + SQL with clear run steps and repeatable outputs.

Each pipeline starts from an external or raw source, lands data in SQLite, and writes a report that can be inspected without extra services.
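The shape described above (raw source, SQLite landing table, inspectable report) can be sketched with only the standard library. This is an illustrative sketch, not the code in pipelines/: the inline sample data, the `passengers` table, and the function names are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would download or read a file.
RAW_CSV = """name,pclass,survived
Alice,1,1
Bob,3,0
Carol,3,1
"""


def load_rows(conn, rows):
    """Land cleaned rows in a SQLite table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS passengers "
        "(name TEXT, pclass INTEGER, survived INTEGER)"
    )
    conn.executemany("INSERT INTO passengers VALUES (?, ?, ?)", rows)
    conn.commit()


def summarize(conn):
    """Query step: passenger and survivor counts per class."""
    cur = conn.execute(
        "SELECT pclass, COUNT(*), SUM(survived) "
        "FROM passengers GROUP BY pclass ORDER BY pclass"
    )
    return cur.fetchall()


def write_report(rows, out):
    """Write the summary as CSV to any text file-like object."""
    writer = csv.writer(out)
    writer.writerow(["pclass", "passengers", "survivors"])
    writer.writerows(rows)


def run(raw_csv=RAW_CSV):
    # Ingest and clean: parse the CSV, trim names, cast numeric fields.
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = [
        (r["name"].strip(), int(r["pclass"]), int(r["survived"]))
        for r in reader
    ]
    # Load and query against an in-memory database for the sketch.
    conn = sqlite3.connect(":memory:")
    load_rows(conn, cleaned)
    summary = summarize(conn)
    conn.close()
    # Report: render the summary as CSV text.
    buf = io.StringIO()
    write_report(summary, buf)
    return summary, buf.getvalue()
```

Swapping `":memory:"` for a path under data/ and the `StringIO` buffer for a file under reports/ would give the on-disk layout this repo describes.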

What you'll find

pipelines/: ingestion and cleaning scripts
sql/: analytics and validation queries
scripts/: report generation and generated-output validation checks
data/: local databases and downloaded datasets
reports/: generated outputs (CSV summaries and the data quality report)

Quick start

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python pipelines/01_ingest_to_sqlite.py
python scripts/generate_data_quality_report.py
python scripts/validate_outputs.py

macOS/Linux:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python pipelines/01_ingest_to_sqlite.py
python scripts/generate_data_quality_report.py
python scripts/validate_outputs.py

After the first run, inspect:

data/titanic.db
reports/titanic_summary.csv
reports/data_quality_report.md
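If no SQLite browser is at hand, the outputs can also be inspected from Python. The helper names below are hypothetical and not part of this repo; the sketch only assumes the two paths listed above exist after a run.

```python
import csv
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of the tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def preview_csv(csv_path, limit=5):
    """Return the header row plus up to `limit` data rows."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        return [row for _, row in zip(range(limit + 1), reader)]


if __name__ == "__main__":
    db = Path("data/titanic.db")
    report = Path("reports/titanic_summary.csv")
    # Only touch files that exist, so nothing is created by accident.
    if db.exists():
        print(list_tables(db))
    if report.exists():
        for row in preview_csv(report):
            print(row)
```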

Technical review path

Run the Titanic pipeline to verify ingest, load, and reporting from a clean checkout.
Run python scripts/validate_outputs.py to verify the SQLite table, SQL files, and summary report.
Review docs/pipeline-contracts.md for the expected inputs, storage targets, and output checks.
Review sql/ for the analytics queries behind the reports.
Run the weather pipeline to see an append-style API ingestion example.
Compare generated CSV reports with the preview screenshots below.

Technical scope

Python pipeline structure with explicit data and report paths
CSV ingestion, API ingestion, SQLite loading, and SQL-based summaries
Reproducible local outputs that do not require cloud credentials
Data contract validation for generated tables, report schemas, and SQL query execution
CI smoke test for the CSV pipeline

Pipelines

Pipeline	Source	Storage	Output	CI
Titanic CSV	Public CSV download	`data/titanic.db`	`reports/titanic_summary.csv`	Yes
Weather API	Open-Meteo current weather API	`data/weather.db`	`reports/weather_summary.csv`	No (manual run against the live API)

Validation

The Titanic pipeline has a local validation script and CI coverage:

python pipelines/01_ingest_to_sqlite.py
python scripts/generate_data_quality_report.py
python scripts/validate_outputs.py

The validation step checks the generated SQLite table, executes the SQL files in sql/, verifies the report schema, confirms the grouped passenger counts reconcile to the source table, and checks the generated data quality report.
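Three of these checks (table present, report schema, counts reconciling to the source table) could look roughly like the sketch below. It is not the contents of scripts/validate_outputs.py: the function name, its parameters, and the error strings are assumptions chosen for illustration.

```python
import csv
import io
import sqlite3


def validate_outputs(conn, table, report_text, expected_columns, count_column):
    """Return a list of error strings; an empty list means all checks passed."""
    # Check 1: the generated table exists.
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if not exists:
        return [f"missing table: {table}"]

    errors = []
    # Safe to interpolate: the name was just matched against sqlite_master.
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if total == 0:
        errors.append(f"table is empty: {table}")

    # Check 2: the report header matches the expected schema exactly.
    reader = csv.DictReader(io.StringIO(report_text))
    if list(reader.fieldnames or []) != list(expected_columns):
        errors.append(f"unexpected report columns: {reader.fieldnames}")
        return errors

    # Check 3: grouped counts in the report sum to the table's row count.
    grouped = sum(int(row[count_column]) for row in reader)
    if grouped != total:
        errors.append(
            f"counts do not reconcile: report={grouped}, table={total}"
        )
    return errors
```

Returning a list of messages rather than raising on the first failure lets a single run report every broken contract at once.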

See docs/pipeline-contracts.md for the current pipeline contracts.

1) Titanic CSV → SQLite → report

Creates:

data/titanic.db
reports/titanic_summary.csv
reports/data_quality_report.md

Run:

python pipelines/01_ingest_to_sqlite.py
python scripts/generate_data_quality_report.py

2) Weather API → SQLite → report

Appends current weather snapshots for a few cities.

Creates:

data/weather.db

Updates:

reports/weather_summary.csv

Run:

python pipelines/02_weather_api_to_sqlite.py
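Append-style ingestion means each run adds rows and never rewrites earlier ones, so the summary is recomputed from the full history. A minimal sketch of that pattern follows; the `weather_snapshots` table, its columns, and the function names are assumptions, and the API call is left out so the example runs offline.

```python
import sqlite3
from datetime import datetime, timezone


def append_snapshot(conn, city, temperature_c, observed_at=None):
    """Insert one reading; existing rows are never updated or deleted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather_snapshots ("
        "city TEXT NOT NULL, "
        "temperature_c REAL NOT NULL, "
        "observed_at TEXT NOT NULL)"
    )
    # Default to the current UTC time as an ISO 8601 string.
    observed_at = observed_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO weather_snapshots VALUES (?, ?, ?)",
        (city, temperature_c, observed_at),
    )
    conn.commit()


def summarize(conn):
    """Per city: snapshot count, mean temperature, latest timestamp."""
    return conn.execute(
        "SELECT city, COUNT(*), AVG(temperature_c), MAX(observed_at) "
        "FROM weather_snapshots GROUP BY city ORDER BY city"
    ).fetchall()
```

For example, appending two readings for one city and one for another yields two summary rows, with the first city's count at 2. ISO 8601 timestamps in a single timezone sort correctly as text, which is why `MAX(observed_at)` works on a TEXT column here.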

Titanic summary preview

Weather summary preview
