Skip to content

SakinaJaffri/Walmart_Sales_DataAnalysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 

Repository files navigation

🛒 Walmart Sales Data Analysis

Walmart_logo_(2008) svg (1)

📌 Overview

This project explores sales data from three Walmart branches in Myanmar (Mandalay, Yangon, and Naypyitaw). The goal is to uncover insights about product performance, customer behavior, and sales trends, using SQL.

I performed data cleaning, feature engineering, and exploratory analysis to answer business questions like:

  • Which product lines generate the most revenue?
  • What times of day are the busiest?
  • How does customer type affect revenue?

📁 Data Source

The dataset is inspired by the Kaggle Walmart Sales Forecasting Competition, but uses a simplified structure with 1,000 rows and 17 columns. The data contains transaction-level details for three branches.

🔧 Methods and Tools

  • SQL (PostgreSQL/MySQL)
  • Data wrangling
  • Feature engineering (e.g., extracting day, time, month)
  • Aggregation, grouping, subqueries, window functions

Analysis:

  1. Product Analysis

Conduct analysis on the data to understand the different product lines, the products lines performing best and the product lines that need to be improved.

  1. Sales Analysis

This analysis aims to answer the question of the sales trends of product. The result of this can help use measure the effectiveness of each sales strategy the business applies and what modificatoins are needed to gain more sales.

  1. Customer Analysis

This analysis aims to uncover the different customers segments, purchase trends and the profitability of each customer segment.

Approach Used

  1. Data Wrangling: This is the first step where inspection of data is done to make sure NULL values and missing values are detected and data replacement methods are used to replace, missing or NULL values.
  1. Build a database
  2. Create table and insert the data.
  3. Select columns with null values in them. There are no null values in our database as in creating the tables, we set NOT NULL for each field, hence null values are filtered out.
  1. Feature Engineering: This will help use generate some new columns from existing ones.
  1. Add a new column named time_of_day to give insight of sales in the Morning, Afternoon and Evening. This will help answer the question on which part of the day most sales are made.
  1. Add a new column named day_name that contains the extracted days of the week on which the given transaction took place (Mon, Tue, Wed, Thur, Fri). This will help answer the question on which week of the day each branch is busiest.
  1. Add a new column named month_name that contains the extracted months of the year on which the given transaction took place (Jan, Feb, Mar). Help determine which month of the year has the most sales and profit.
  1. Exploratory Data Analysis (EDA): Exploratory data analysis is done to answer the listed questions and aims of this project.

Business Questions To Answer

Generic Question

  1. How many unique cities does the data have?
  2. In which city is each branch?

Product

  1. How many unique product lines does the data have?
  2. What is the most common payment method?
  3. What is the most selling product line?
  4. What is the total revenue by month?
  5. What month had the largest COGS?
  6. What product line had the largest revenue?
  7. What is the city with the largest revenue?
  8. What product line had the largest VAT?
  9. Fetch each product line and add a column to those product line showing "Good", "Bad". Good if its greater than average sales
  10. Which branch sold more products than average product sold?
  11. What is the most common product line by gender?
  12. What is the average rating of each product line?

Sales

  1. Number of sales made in each time of the day per weekday
  2. Which of the customer types brings the most revenue?
  3. Which city has the largest tax percent/ VAT (Value Added Tax)?
  4. Which customer type pays the most in VAT?

Customer

  1. How many unique customer types does the data have?
  2. How many unique payment methods does the data have?
  3. What is the most common customer type?
  4. Which customer type buys the most?
  5. What is the gender of most of the customers?
  6. What is the gender distribution per branch?
  7. Which time of the day do customers give most ratings?
  8. Which time of the day do customers give most ratings per branch?
  9. Which day fo the week has the best avg ratings?
  10. Which day of the week has the best average ratings per branch?

📁 File Structure

  • README.md – This file
  • SQL_queries.sql – All SQL queries used for the analysis
  • walmart_sales.csv – Dataset used for the project

✅ Conclusion

This project(Guided) helped solidify my understanding of SQL for data analysis, including grouping, filtering, joins, and window functions. I also learned how to extract time-based features and draw meaningful business insights from transaction data.

About

This project explores sales data from three Walmart branches in Myanmar (Mandalay, Yangon, and Naypyitaw). The goal is to uncover insights about product performance, customer behavior, and sales trends, using SQL.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors