End-to-End Marketing Analytics Pipeline for Keyword Performance, Cost Optimization, and Traffic Insights
Excel Β· Python Β· MySQL Β· Power BI
- Overview
- Business Problem
- Project Objectives
- Tools & Technologies
- End-to-End Data Workflow
- Dataset
- Data Cleaning & Preparation
- Data Modeling & SQL Schema
- SQL Schema (Core with Queries)
- Relationship Diagram (EER)
- Power BI Data Model
- Power BI Dashboard
- Key Outcomes, Insights & Future Improvements
- How to Run This Project
- Repository Structure
- Author & Contacts
This project presents a complete, real-world Google AdWords analytics pipeline, transforming raw keyword-level marketing data into actionable insights and interactive dashboards.
The solution simulates an enterprise-grade ETL + BI workflow, commonly used by:
- Digital marketing teams
- Growth & performance analysts
- Business intelligence units
From raw Excel files to Python-based preprocessing, normalized MySQL modeling, and Power BI dashboards, this project demonstrates strong fundamentals in data engineering, analytics, and visualization.
Marketing teams often struggle to answer questions such as:
- Which keywords generate the most traffic at the lowest cost?
- Are we overspending on high-CPC, low-return keywords?
- How does keyword difficulty affect traffic contribution?
- What are the monthly and seasonal traffic trends?
- Which keywords should be scaled, optimized, or paused?
This project addresses these challenges by building a structured analytics framework for Google AdWords performance evaluation.
-
Analyze keyword-level traffic, cost, and competition metrics
-
Identify high-impact and low-efficiency keywords
-
Track search demand and CPC trends over time
-
Build a normalized relational database schema
-
Deliver interactive Power BI dashboards for decision-making
| Tool | Purpose |
|---|---|
| Excel | Raw data preparation, formulas, lookup tables |
| Python (pandas) | Data cleaning, keyword ID generation |
| MySQL | Relational modeling & SQL querying |
| Power BI | Data modeling, DAX, dashboards |
| Git & GitHub | Version control & documentation |
graph LR
A[Raw Excel Files] --> B[Excel Formulas]
B --> C[Python Processing]
C --> D[MySQL Database]
D --> E[Power BI Dashboard]
Each stage feeds clean, structured data into the nextβmirroring real production analytics pipelines.
πΉ Source: Public dataset downloaded from Kaggle (Google AdWordsβstyle keyword performance data)
πΉ Raw_data.xlsx (Simulated Google AdWords export) containing variables:
- Keyword
- Current & previous positions
- Cost Per Click (CPC)
- Traffic
- Competition
- Search volume
- Number of SERP results
Excel formulas were used to:
- Aggregate search volume using
SUMIF() - Calculate average keyword difficulty using
AVERAGEIF() - Categorize difficulty levels using
IF()logic - Create lookup tables using
VLOOKUP()
Using pandas, the following steps were performed:
- Loaded raw Excel data with
pd.read_excel() - Generated unique
keyword_IDvalues - Ensured consistent foreign-key mapping across tables
df['keyword_ID'] = df['keyword'].astype('category').cat.codesThe cleaned data was split into three reference tables:
keyword.csvβ Unique keywords with IDssearch_volume.csvβ Aggregated search volumekeyword_difficulty.csvβ Average difficulty and difficulty labels
| Formula | Purpose |
|---|---|
SUMIF() |
Aggregate total search volume |
AVERAGEIF() |
Compute average difficulty |
IF(B2>=50,"Hard","Moderate") |
Assign difficulty level |
VLOOKUP() |
Map keyword metadata |
β These lookup tables support normalization and ensure data integrity in the SQL schema. These tables act as lookup/reference tables for SQL.
πΉ keyword
| Column | Description |
|---|---|
keyword_ID |
Primary key |
keyword |
Keyword text |
πΉ search_volume
| Column | Description |
|---|---|
keyword_ID |
Foreign key |
search_volume |
Monthly search volume |
πΉ keyword_difficulty
| Column | Description |
|---|---|
keyword_ID |
Foreign key |
avg_difficulty |
Average difficulty score |
difficulty_level |
Hard / Moderate |
| Column | Description |
|---|---|
title |
Campaign / landing page |
keyword |
Keyword text |
keyword_ID |
Foreign key |
positions |
Current position |
previous_positions |
Previous position |
last_seen |
Date |
Search_Volume |
Monthly demand |
CPC |
Cost per click |
Traffic |
Estimated traffic |
Traffic_Percent |
Share of total traffic |
Traffic_Cost |
Cost attribution |
Traffic_Cost_Percent |
Cost share |
Competition |
Competition score |
Number_of_Results |
SERP results |
Keyword_difficulty |
Difficulty reference |
β This acts as the central fact table enabling all analytical queries.
CREATE DATABASE IF NOT EXISTS traffic_project;
USE traffic_project;
CREATE TABLE website_traffic_data (
title VARCHAR(250),
keyword VARCHAR(250),
keyword_ID INT,
positions INT,
previous_positions INT,
last_seen DATE,
Search_Volume INT,
CPC DECIMAL(10,2),
Traffic INT,
Traffic_Percent DECIMAL(10,2),
Traffic_Cost INT,
Traffic_Cost_Percent DECIMAL(10,2),
Competition DECIMAL(10,2),
Number_of_Results INT,
Keyword_difficulty INT
);
CREATE TABLE keyword (
keyword_ID INT PRIMARY KEY,
keyword VARCHAR(250)
);
CREATE TABLE keyword_difficulty (
keyword_ID INT PRIMARY KEY,
avg_difficulty DECIMAL(10,2),
difficulty_level VARCHAR(50)
);
CREATE TABLE search_volume (
keyword_ID INT PRIMARY KEY,
Search_Volume INT
);
ALTER TABLE website_traffic_data
ADD FOREIGN KEY (keyword_ID) REFERENCES keyword(keyword_ID);Foreign-key constraints ensure data integrity and normalization. These keys ensure consistent data joins between tables.
Imported the other CSVs into MySQL:
keywordsearch_volumekeyword_difficulty
Then established relational integrity:
ALTER TABLE website_traffic_data
ADD FOREIGN KEY (keyword_ID) REFERENCES keyword(keyword_ID);These keys ensure consistent data joins between tables.
These diagrams visualize the 1-to-many relationships between:
- Keywords β Traffic Data
- Keyword Difficulty β Traffic Data
- Search Volume β Traffic Data
The dashboard follows a star-schema data model, ensuring accurate filtering, optimized performance, and scalable reporting.
- Fact:
website_traffic_data - Dimensions:
keyword(via keyword_ID)keyword_difficultysearch_volume
β This model ensures accurate filtering and joins.
Average CPC = AVERAGE(website_traffic_data[CPC])
Total Traffic = SUM(website_traffic_data[Traffic])
Total Search Volume = SUM(website_traffic_data[Search_Volume])
Total Traffic Cost = SUM(website_traffic_data[Traffic_Cost])
Traffic Percent = AVERAGE(website_traffic_data[Traffic_Percent])
Start Date = MIN(website_traffic_data[last_seen])
Last Date = MAX(website_traffic_data[last_seen])
Calendar =
CALENDAR(
[Start Date],
[Last Date]
)
Month = FORMAT(Calendar[Date], "MMMM")
Quarter = QUARTER(Calendar[Date])
Year = YEAR(Calendar[Date])
Year & QTR = Year & " Q" & Quarter
These enable filtering, aggregation, and time-based visualizations.
The Power BI report connects directly to MySQL database and provides interactive insights.
- KPI Cards: Total Traffic, CPC, Search Volume, Cost
- Line Chart: Traffic trends over time
- Bar Chart: Traffic by keyword
- Treemap: Search volume distribution
- Pie/Donut Charts: Traffic distribution by keyword difficulty and by month
- Slicers: Year, Quarter, Keyword filter
- Identified high-traffic, low-CPC keywords for scaling
- Revealed high-cost, low-return keywords for optimization
- Highlighted difficulty-based traffic contribution
- Delivered a reusable marketing analytics framework
- High-search-volume keywords do not always deliver proportional traffic
- Certain keywords have high CPC but low traffic efficiency
- Majority of traffic comes from moderate to high difficulty keywords
- Monthly traffic trends reveal seasonal demand patterns
- Enabled campaign performance monitoring through a single dashboard
- Improved keyword prioritization strategy
- Supported cost optimization decisions
- Demonstrated the full analytics lifecycle: data modeling β SQL β BI
- Automate data ingestion using Google Ads API or scheduled ETL pipelines
- Add advanced KPIs such as CTR, Conversion Rate, and ROAS
- Implement time-series analysis for MoM/YoY and seasonality trends
- Enhance the data model with campaign, device, and location dimensions
- Introduce keyword clustering and performance forecasting
- Deploy the solution on cloud platforms for scalability and real-time analytics
π¦ google-adwords-performance-analysis
β£ π Raw_data.xlsx
β£ π website_traffic_data.csv
β£ π keyword.csv
β£ π keyword_difficulty.csv
β£ π search_volume.csv
β£ π SQL_schema_script.sql
β£ π AdWords_Traffic_Dashboard.pbix
β£ π Images
β β£ π· main_table.png
β β£ π· EER_Diagram.png
β β£ π· mysql_schema_editor.png
β β£ π· power_bi_dashboard.png
β β π· powerbi_data_model.png
β π README.mdgit clone https://github.com/Samiul1947/google-adwords-performance-analysis.git- Inspect
Raw_data.xlsxand the processed.csvfiles - (Optional) Modify data using Excel formulas if needed
- Import all
.csvfiles using MySQL Workbench - Execute the provided SQL schema to create tables and relationships
- Ensure foreign keys are applied correctly
- Open the provided .pbix file to view the interactive report
- Or connect manually via:
Home β Get Data β MySQL
Samiul Gazi | Aspiring Data Analyst
Passionate about data storytelling, unsupervised data exploration, and real-world analytics.
π M.Sc. Economics β University of Calcutta
π Kolkata, West Bengal
π§ Email: sgaziamumh@gmail.com
π LinkedIn: linkedin.com/in/samiul-gazi
π GitHub: github.com/Samiul1947
If you liked this project, please β the repo!
β For Recruiters & Hiring Managers
π This project demonstrates hands-on experience in ETL design, relational data modeling, SQL querying, DAX measures, and BI storytelling, which are core competencies for Data Analyst and Business Intelligence roles.