The Sales ETL Project is an Extract, Transform, and Load (ETL) pipeline developed using SQL Server Integration Services (SSIS). This project processes sales data from an Excel file source and loads it into a star schema in an SQL Server database. It incorporates advanced ETL features such as delta loading, incremental loading, Slowly Changing Dimensions (SCD), and data transformation tasks to ensure efficient and reliable data processing.
- SalesFact: Stores sales transaction data with references to dimension tables.
- DateDimension: Stores calendar dates with relevant attributes.
- CustomerDimension: Contains customer data with Type 2 Slowly Changing Dimensions (SCD).
- ProductDimension: Includes detailed product information.
1- SCD Type 2 Implementation: Tracks historical changes in customer data in the CustomerDimension table.
2- Delta Load: Updates only new or modified records in the ProductDimension table.
3- Incremental Load: Inserts new data into the SalesFact table without duplicating existing records.
4- Data Transformation: Calculates derived columns, including:
- Total Sales Amount:
Unit Price × Quantity
- Expiry Status:
ActiveorExpiredbased on the expiry date.
- Customer Loyalty Tier: Categorizes customers as:
Bronze(low spending)Silver(medium spending)Gold(high spending)
- Order Priority: Classifies orders as
HighorNormalbased on order quantity.
- Product Age: Calculates the number of days since the product became effective.
- Discount Category: Categorizes discounts as:
No DiscountLowMediumHigh
- ETL Tool: SQL Server Integration Services (SSIS)
- Database Management: SQL Server Management Studio (SSMS)
- Source Data: Microsoft Excel
- Source Data: Located in the data folder. This includes the sales data Excel file.
- ETL Packages: SSIS packages handling data extraction, transformation, and loading processes.
- Destination Tables: SQL Server database designed as a star schema.