IBM-Data-Engineering Professional Certificate

End-to-End Data Platform Engineering — 13 Courses · 71 Commits · Capstone: Full-Stack Retail Analytics Platform

What this repository demonstrates

This repository documents a complete path from data engineering fundamentals to production-grade platform design, built hands-on across relational databases, NoSQL stores, cloud data warehouses, big data infrastructure, ETL pipelines, BI dashboards, and distributed ML. Every module involved writing real code, designing real schemas, and operating real infrastructure rather than reviewing slide decks.

The headline deliverable is the Course-13 Capstone Project: a fully integrated, multi-layer data platform built from scratch for a simulated e-commerce retailer, spanning seven technologies across six engineering modules (M1 to M6 in the table below).

Capstone Project (Course-13)

Full-stack retail data analytics platform — MySQL · MongoDB · PostgreSQL · IBM Db2 · Apache Airflow · Apache Spark · IBM Cognos

Built an end-to-end data engineering platform for a fictional e-commerce retailer (SoftCart) that mirrors real-world enterprise architecture: transactional OLTP system, NoSQL product catalog, cloud data warehouse, automated ETL pipelines, BI dashboard, and big data ML analytics — all integrated and operational.

Module	What was built	Key tools
M1 — OLTP Database	Normalized MySQL schema for sales transactions; indexed; automated CSV export via Bash	MySQL, SQL, Bash
M2 — NoSQL Catalog	MongoDB product catalog with 9 collections; CRUD + aggregation queries	MongoDB, JSON
M3 — Data Warehouse	Star schema (PostgreSQL → IBM Db2); ROLLUP, CUBE, MQT analytics queries	PostgreSQL, IBM Db2, SQL
M4 — BI Dashboard	3-chart Cognos dashboard: quarterly sales bar, category pie, monthly trend line	IBM Cognos Analytics
M5 — ETL Pipelines	Python ETL automating MySQL→Db2 sync; Airflow DAG for web log pipeline	Python, Pandas, Apache Airflow
M6 — Big Data ML	PySpark + SparkML Linear Regression on search-term data; sales forecast output	Apache Spark, PySpark, SparkML
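
The M5 row describes a Python ETL that keeps the Db2 warehouse in sync with the MySQL OLTP store. One common way to do this is an incremental load keyed on the highest row ID already present in the target; a minimal sketch of that idea follows. It is not the capstone's actual script: the table name `sales_data`, its columns, and all helper names are hypothetical, and in-memory SQLite connections stand in for MySQL and Db2 so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema; the capstone's real table layout may differ.
SCHEMA = """
CREATE TABLE sales_data (
    row_id     INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL
)
"""


def get_last_row_id(target):
    """Return the highest row_id already loaded into the target (0 if empty)."""
    row = target.execute("SELECT COALESCE(MAX(row_id), 0) FROM sales_data").fetchone()
    return row[0]


def get_new_rows(source, last_row_id):
    """Fetch only the source rows the target has not seen yet."""
    return source.execute(
        "SELECT row_id, product_id, quantity FROM sales_data "
        "WHERE row_id > ? ORDER BY row_id",
        (last_row_id,),
    ).fetchall()


def load_rows(target, rows):
    """Insert rows into the target in one transaction; return the row count."""
    with target:  # commits on success, rolls back on error
        target.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", rows)
    return len(rows)


def sync(source, target):
    """Run one incremental sync pass from source to target."""
    return load_rows(target, get_new_rows(source, get_last_row_id(target)))


# Demo: SQLite stands in for MySQL (source) and Db2 (target).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(SCHEMA)
target.execute(SCHEMA)
with source:
    source.executemany(
        "INSERT INTO sales_data VALUES (?, ?, ?)",
        [(1, 101, 2), (2, 102, 1), (3, 103, 5)],
    )
with target:
    target.execute("INSERT INTO sales_data VALUES (1, 101, 2)")

first_run = sync(source, target)   # copies the two rows missing from the target
second_run = sync(source, target)  # nothing new to copy
```

Keying on the maximum loaded ID keeps each pass cheap and makes reruns harmless, but it only captures inserts; updated or deleted source rows would need a different change-tracking strategy.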

→ Full capstone documentation with architecture diagrams

Standalone Projects

ETL Pipeline — Housing Dataset (separate repository)

Python · SQLAlchemy · PostgreSQL · xlsxwriter

A 6-stage modular ETL pipeline processing 63,474 property transaction records: HTTP extraction → SQLAlchemy transformation → PostgreSQL load → insight generation → Excel export. Built as independent, reusable Python modules with an orchestration entry point.
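
To illustrate what "independent, reusable modules with an orchestration entry point" can look like, here is a small sketch under stated assumptions. It is not the code from the linked repository: every function name and the sample data are hypothetical, a CSV string stands in for the HTTP download, SQLite stands in for PostgreSQL, and CSV text stands in for the Excel export, so the example needs only the standard library.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; the real pipeline downloads its data over HTTP.
RAW_CSV = """property_id,city,price
1,Leeds,250000
2,Leeds,
3,York,310000
"""


def extract(raw_text):
    """Parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows with a missing price and cast fields to proper types."""
    return [
        {"property_id": int(r["property_id"]), "city": r["city"], "price": float(r["price"])}
        for r in rows
        if r["price"]
    ]


def load(rows, conn):
    """Write the cleaned rows into a transactions table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(property_id INTEGER, city TEXT, price REAL)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO transactions VALUES (:property_id, :city, :price)", rows
        )
    return conn


def insights(conn):
    """Compute the average price per city."""
    cur = conn.execute(
        "SELECT city, AVG(price) FROM transactions GROUP BY city ORDER BY city"
    )
    return dict(cur.fetchall())


def export(summary):
    """Render the summary as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["city", "avg_price"])
    writer.writerows(sorted(summary.items()))
    return buf.getvalue()


def run_pipeline(raw_text):
    """Orchestration entry point: chain the stages in order."""
    conn = sqlite3.connect(":memory:")
    summary = insights(load(transform(extract(raw_text)), conn))
    return summary, export(summary)


summary, report = run_pipeline(RAW_CSV)
```

Because each stage takes plain data in and returns plain data out, any one of them can be tested or swapped (for example, replacing the SQLite loader with a PostgreSQL one) without touching the others.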

→ github.com/smshde/ETL_Housing-Dataset

Exploratory Data Analysis — PA 911 Emergency Calls

Python · Pandas · NumPy · Matplotlib · Seaborn

Analysed 663,000+ emergency call records from Montgomery County, PA. Performed feature engineering, temporal pattern analysis, and heatmap visualisation to surface peak-demand windows and call-type distribution — the kind of analysis that informs staffing and resource allocation decisions.
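
The feature-engineering and temporal-pattern steps can be sketched without any third-party dependency. The example below is an illustration only, not the notebook's code: the sample rows are made up, the `"Category: detail"` title format is an assumption about the dataset, and the standard library is used in place of Pandas so the block runs anywhere. It derives hour, weekday, and call type from each record, then counts calls per (weekday, hour) cell, which is the table a heatmap would be drawn from.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical sample records: (timestamp, title).
calls = [
    ("2016-01-04 08:15:00", "EMS: BACK PAINS/INJURY"),
    ("2016-01-04 08:40:00", "Traffic: VEHICLE ACCIDENT"),
    ("2016-01-05 17:05:00", "Fire: FIRE ALARM"),
    ("2016-01-11 08:55:00", "EMS: FALL VICTIM"),
]


def engineer_features(timestamp, title):
    """Derive hour, weekday name, and call type from one raw record."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": ts.hour,
        "weekday": WEEKDAYS[ts.weekday()],
        "call_type": title.split(":")[0].strip(),
    }


features = [engineer_features(ts, title) for ts, title in calls]

# Count calls per (weekday, hour) cell: the data behind a demand heatmap.
heatmap = Counter((f["weekday"], f["hour"]) for f in features)

# Distribution of calls by type.
call_types = Counter(f["call_type"] for f in features)

# The busiest (weekday, hour) window and its call count.
peak_window, peak_count = heatmap.most_common(1)[0]
```

With Pandas, the same counts would typically come from a `groupby` on the derived columns followed by `unstack`, and the resulting grid could be passed to a Seaborn heatmap.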

→ github.com/smshde/PA-911-Calls-EDA

Course Modules — Skills Built

#	Module	Skills demonstrated
01	Introduction to Data Engineering	Data lifecycle, pipeline patterns, architecture principles
02	Python for Data Science, AI & Development	Python fundamentals, APIs, data structures
03	Python Project for Data Engineers	ETL scripting, data extraction, file I/O
04	RDBMS Fundamentals	Relational modelling, normalization, DDL/DML
05	Databases & SQL for Data Science	Joins, subqueries, views, stored procedures, transactions
06	Linux Commands & Shell Scripting	Bash scripting, cron scheduling, file automation
07	Relational Database Administration	Backup, recovery, user management, performance tuning
08	ETL & Data Pipelines — Shell, Airflow, Kafka	DAG design, pipeline orchestration, streaming concepts
09	Data Warehousing & BI Analytics	Star schema, OLAP, Cognos dashboards
10	Introduction to NoSQL Databases	MongoDB, Cassandra, DynamoDB, document modelling
11	Big Data with Spark & Hadoop	HDFS, MapReduce, Spark DataFrames, Hive, Sqoop
12	Data Engineering & ML using Spark	SparkML, feature engineering, model pipelines
13	DE Capstone Project	Full-stack platform integration — see above

Technology Footprint

Languages     Python · SQL · Bash/Shell · Scala
Databases     MySQL · PostgreSQL · IBM Db2 · MongoDB · Cassandra · DynamoDB
Big Data      Apache Spark · PySpark · SparkML · Hadoop · Hive · Sqoop · HBase
Pipelines     Apache Airflow · Kafka · Shell ETL · Python ETL
Cloud         IBM Cloud (Db2, Object Storage, Watson Studio) · AWS (supplementary)
BI            IBM Cognos Analytics · Matplotlib · Seaborn · Plotly
DevOps        Git · GitHub · Jupyter Notebooks · Linux

Certification

IBM Data Engineering Professional Certificate — Coursera / IBM Skills Network
13-course program · Capstone verified

→ For the capstone architecture, module-by-module breakdown, and diagrams: Course-13 README
