Customer churn is a major challenge in the telecom industry, directly impacting revenue and customer lifetime value.
This project analyzes customer behavior to identify key drivers of churn using a structured data pipeline, SQL-based feature engineering, and exploratory data analysis (EDA).
The objective of this project is to:
- Analyze customer churn patterns
- Identify high-risk customer segments
- Understand business drivers behind churn
- Quantify revenue impact due to churn
The insights from this analysis can help telecom companies design effective retention strategies.
- **Database:** PostgreSQL (Dockerized)
- **Query Language:** SQL
- **Programming Language:** Python
- **Data Analysis:** Pandas, NumPy
- **Visualization:** Matplotlib, Seaborn
- **Machine Learning:** Scikit-learn
- **Environment:** Jupyter Notebook
- **Version Control:** Git & GitHub
The project follows a structured, production-style data workflow to ensure data quality, reproducibility, and clear separation of concerns.
- The telecom churn dataset is ingested into a Dockerized PostgreSQL database.
- Raw data is stored in a staging table (telecom\_churn\_raw) without modification.
- Data is cleaned and standardized using SQL.
- Missing and inconsistent values are handled.
- Clean data is stored in a separate table (telecom\_churn\_clean).
- Business-driven features such as tenure groups, contract types, engagement metrics, and revenue impact are engineered using SQL.
- This step ensures features are explainable and aligned with business logic.
- Clean data is loaded into Python using SQLAlchemy.
- Visual and statistical analysis is performed to identify churn patterns and key drivers.
- Machine learning models will be trained to predict customer churn based on engineered features.
The following business-driven features were engineered and analyzed to understand customer churn:
- **Tenure Group:** Categorizes customers into New, Mid-term, and Long-term lifecycle stages.
- **Contract Type:** Identifies churn risk across month-to-month, one-year, and two-year contracts.
- **Monthly Charges:** Captures pricing sensitivity among customers.
- **Payment Method:** Highlights churn patterns caused by payment friction.
- **Internet Service Type:** Analyzes churn differences across service offerings.
- **Add-on Engagement:** Aggregated engagement score based on subscribed add-on services.
- **Revenue at Risk:** Quantifies potential monthly revenue loss due to churn.
- **New customers** show significantly higher churn compared to long-term customers.
- **Month-to-month contracts** have the highest churn rate, while long-term contracts reduce churn risk.
- Customers with **higher monthly charges** are more likely to churn, indicating pricing sensitivity.
- **Electronic check** payment method users churn more than auto-payment users.
- Customers with **low add-on engagement** exhibit higher churn.
- A measurable amount of **monthly revenue is at risk** due to customer churn, highlighting the need for proactive retention strategies.
1. Clone the repository:
```bash
git clone
Telecom-Churn-Analysis/
│
├── data/
│ └── Telco-Customer-Churn.csv
│
├── sql/
│ ├── data_loading.sql
│ ├── data_cleaning.sql
│ └── feature_engineering.sql
│
├── notebooks/
│ └── 01_EDA.ipynb
│
├── visuals/
│
├── requirements.txt
└── README.md