You are working for a new-age insurance company and employ multiple outreach plans to sell term insurance to your customers. Telephonic marketing campaigns remain one of the most effective ways to reach people; however, they incur a lot of cost. Hence, it is important to identify beforehand the customers who are most likely to convert, so that they can be specifically targeted via call. We are given the historical marketing data of the insurance company and are required to build an ML model that predicts whether a client will subscribe to the insurance.
- age (numeric)
- job : type of job
- marital : marital status
- educational_qual : education status
- call_type : contact communication type
- day: last contact day of the month (numeric)
- mon: last contact month of year
- dur: last contact duration, in seconds (numeric)
- num_calls: number of contacts performed during this campaign and for this client
- prev_outcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
- y: has the client subscribed to the insurance?
It is not sufficient to just fit a model - the model must be analysed to find the important factors that contribute towards the conversion rate. AUROC must be used as a metric to evaluate the performance of the models.
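
As a rough illustration of what the AUROC metric measures, the sketch below computes it from scratch using the rank-sum (Mann-Whitney) formulation: the probability that a randomly chosen subscriber receives a higher score than a randomly chosen non-subscriber. The function name `auroc` and the toy labels and scores are made up for illustration; they are not taken from the project, where a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation."""
    pairs = sorted(zip(scores, y_true))  # ascending by score
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (hypothetical values): 1 = subscribed, 0 = did not subscribe.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, scores))
```

Because AUROC depends only on how the scores rank the two classes, it does not change with the classification threshold, which is one reason it is a common choice for imbalanced problems like this one.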
This was the problem given to me as part of my final project in the Master Data Science course by GUVI.
- The data was loaded, cleaned, and preprocessed.
- Data visualization and exploratory data analysis (EDA) were carried out to draw meaningful insights.
- The data was highly imbalanced, with less than 13% of the records in the subscribed category.
- Data was balanced using SMOTE and Combined balancing techniques.
- A logistic regression model was fitted to the data. The AUROC score was good, but from a domain standpoint a decent F1 score was also needed for the model to be reliable.
- Models were compared and tuned using PyCaret.
- Feature importances were analysed.
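
To give a feel for the balancing step, here is a minimal sketch of the core idea behind SMOTE: new minority-class rows are created by interpolating between an existing minority row and one of its nearest minority neighbours. This is a simplified stand-in written with the standard library only, not the implementation used in the project (which would typically come from the imbalanced-learn package). The function name `smote_like_oversample`, its parameters, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority rows, nearest first (Euclidean distance).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy numeric data (hypothetical values): 8 "no" rows and 2 "yes" rows.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 1.0], [12.0, 2.0]]

new_rows = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Interpolation like this only makes sense for numeric features, so categorical columns such as `job` or `marital` would need to be encoded first or handled by a variant designed for mixed data. Oversampling should also be applied to the training split only, so that synthetic rows do not leak into the evaluation data.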