Unsupervised Learning


Context:


AllLife Bank wants to focus on its credit card customer base in the next financial year. The bank has been advised by its marketing research team that market penetration can be improved. Based on this input, the Marketing team proposes to run personalized campaigns to target new customers as well as upsell to existing ones. Another insight from the market research was that customers perceive the support services of the bank poorly. Based on this, the Operations team wants to upgrade the service delivery model to ensure that customers' queries are resolved faster. The Head of Marketing and the Head of Delivery have both decided to reach out to the Data Science team for help.


Objective:


Identify different segments in the existing customer base, based on their spending patterns as well as their past interactions with the bank.


About the data:


The data pertains to various customers of the bank and includes each customer's credit limit, the total number of credit cards they hold, and the different channels through which they have contacted the bank with queries: visiting a branch, online, and through the call centre.

Importing libraries and overview of the dataset

Loading data

Check the info of the data
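A sketch of the loading and inspection step; the filename and the column names used below are assumptions, not confirmed by the brief, so a tiny inline sample stands in for the real file:

```python
from io import StringIO
import pandas as pd

# In the project the data would come from the provided file, e.g.
# pd.read_csv("Credit_Card_Customer_Data.csv") -- a hypothetical filename.
# Here we inline a tiny sample with assumed column names instead.
sample = StringIO(
    "Sl_No,Customer Key,Avg_Credit_Limit,Total_Credit_Cards,"
    "Total_visits_bank,Total_visits_online,Total_calls_made\n"
    "1,87073,100000,2,1,1,0\n"
    "2,38414,50000,3,0,10,9\n"
    "3,17341,50000,7,1,3,4\n"
    "4,40496,30000,5,1,1,4\n"
    "5,47437,100000,6,0,12,3\n"
)
df = pd.read_csv(sample)

df.info()               # column dtypes and non-null counts
print(df.isna().sum())  # per-column count of missing values
```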

Observations:

There are no missing values. Let us now look at the number of unique values in each column.
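`nunique` reports the distinct values per column; the column names below are assumptions:

```python
import pandas as pd

# Tiny stand-in frame with assumed column names; one customer key repeats.
df = pd.DataFrame({
    "Sl_No": [1, 2, 3, 4],
    "Customer Key": [87073, 38414, 17341, 87073],
    "Avg_Credit_Limit": [100000, 50000, 50000, 100000],
})
print(df.nunique())  # number of unique values in each column
```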

Data Preprocessing and Exploratory Data Analysis

Step 1: Identify and drop the rows with duplicated customer keys
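A minimal sketch of this step, assuming the key column is named "Customer Key":

```python
import pandas as pd

df = pd.DataFrame({
    "Customer Key": [87073, 38414, 87073, 17341],
    "Avg_Credit_Limit": [100000, 50000, 90000, 30000],
})
# Keep the first record for each customer key and drop the rest.
df = df.drop_duplicates(subset=["Customer Key"], keep="first")
```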

We have done some basic checks. Now, let's drop the variables that are not required for our analysis.

Now that we have dropped the unnecessary columns, we can check for duplicates again. Duplicates here would mean customers with identical features.

We can drop these duplicated rows from the data.
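Both steps, dropping the identifier columns and then the fully duplicated rows, can be sketched as follows (column names assumed):

```python
import pandas as pd

df = pd.DataFrame({
    "Sl_No": [1, 2, 3],
    "Customer Key": [11, 22, 33],
    "Avg_Credit_Limit": [50000, 50000, 30000],
    "Total_Credit_Cards": [3, 3, 5],
    "Total_visits_bank": [1, 1, 2],
})
# Identifier columns carry no information for clustering.
df = df.drop(columns=["Sl_No", "Customer Key"])
# Rows that are now identical represent customers with identical features.
n_dupes = df.duplicated().sum()
df = df.drop_duplicates()
```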

Summary Statistics

Step 2: Write observations on the summary statistics of the data
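`describe` gives the usual per-column summary to write observations on (illustrative values, not the real data):

```python
import pandas as pd

df = pd.DataFrame({
    "Avg_Credit_Limit": [10000, 20000, 150000, 12000],
    "Total_Credit_Cards": [2, 4, 9, 3],
})
print(df.describe().T)  # count, mean, std, min, quartiles, max per column
```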

Observations:

Now we will explore each variable at hand. We will check the distribution and outliers for each variable in the data.
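Histograms and box plots are the usual tools here; the box-plot outlier rule (points beyond 1.5 times the IQR) can also be computed directly, sketched on made-up values:

```python
import pandas as pd

df = pd.DataFrame({"Avg_Credit_Limit": [8000, 9000, 10000, 11000, 12000, 200000]})

def iqr_outliers(s: pd.Series) -> int:
    """Count points outside the 1.5 * IQR whiskers used by a box plot."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return ((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum()

print({col: iqr_outliers(df[col]) for col in df.columns})
```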

Step 3: Check the distribution and outliers for each variable in the data

Observation:

Now, let's check the correlation among different variables.
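A sketch of the correlation check (column names assumed, values made up); a heatmap such as `sns.heatmap(corr, annot=True)` makes the matrix easier to scan:

```python
import pandas as pd

df = pd.DataFrame({
    "Total_visits_online": [1, 10, 3, 1, 12],
    "Total_calls_made": [0, 9, 4, 4, 3],
    "Total_Credit_Cards": [2, 3, 7, 5, 6],
})
corr = df.corr()  # pairwise Pearson correlations
print(corr.round(2))
```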

Observation:

Scaling the data
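Since k-means and k-medoids are distance-based, a feature measured in tens of thousands (credit limit) would dominate one measured in single digits (number of cards); a standard-scaling sketch:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Avg_Credit_Limit": [10000, 50000, 100000],
    "Total_Credit_Cards": [2, 5, 8],
})
scaler = StandardScaler()  # transform each column to zero mean, unit variance
X = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
```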

K-Means

Let us now fit the k-means algorithm on our scaled data and find the optimal number of clusters to use.

We will do this in 3 steps:

  1. Initialize a dictionary to store the SSE for each k
  2. Run for a range of Ks and store SSE for each run
  3. Plot the SSE vs K and find the elbow
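The three steps above can be sketched as follows, on synthetic stand-in data since the real scaled matrix is not shown here:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic stand-in for the scaled data: three well-separated blobs.
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0, 4, 8)])

sse = {}  # step 1: dictionary to store the SSE for each k
for k in range(1, 8):  # step 2: run for a range of Ks
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    sse[k] = km.inertia_  # sum of squared distances to the closest centroid
# step 3: plot sse.keys() vs sse.values() and look for the elbow,
# e.g. plt.plot(list(sse), list(sse.values()), "bx-")
```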

Step 4: Pick the appropriate number of clusters from the elbow plot and fit the final k-means model to generate cluster labels

We have generated the labels with k-means. Let us look at the various features based on the labels.

Step 5: Create cluster profiles using the summary statistics and box plots for each label
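A sketch of the profiling step on synthetic data, with `KM_segments` as an assumed name for the label column:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic customer groups with very different credit limits.
df = pd.DataFrame({
    "Avg_Credit_Limit": np.r_[rng.normal(10000, 500, 20), rng.normal(100000, 500, 20)],
    "Total_Credit_Cards": np.r_[rng.normal(3, 0.2, 20), rng.normal(7, 0.2, 20)],
})
df["KM_segments"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(df)

# Profile: mean of each feature per cluster, plus cluster sizes.
profile = df.groupby("KM_segments").mean()
profile["count"] = df["KM_segments"].value_counts()
print(profile)
# Box plots per label, e.g. df.boxplot(column="Avg_Credit_Limit", by="KM_segments")
```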

Cluster Profiles:

Gaussian Mixture

Let's create clusters using Gaussian Mixture Models
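Unlike k-means, a Gaussian Mixture Model gives soft (probabilistic) assignments; a scikit-learn sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Two well-separated synthetic blobs standing in for the scaled data.
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])

gmm = GaussianMixture(n_components=2, random_state=2).fit(X)
labels = gmm.predict(X)        # hard labels, comparable to k-means output
proba = gmm.predict_proba(X)   # soft assignments: P(component | point)
```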

Step 6: Fit a Gaussian Mixture Model on the scaled data and generate cluster labels

Cluster Profiles:

Comparing Clusters:

K-Medoids

Step 7: Fit the K-Medoids algorithm on the scaled data and generate cluster labels

Let's compare the clusters from K-Means and K-Medoids
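scikit-learn-extra's `KMedoids` is the usual library choice; since it may not be installed everywhere, below is a simplified alternating k-medoids written from scratch purely for illustration. The comparison against k-means would typically be a crosstab of the two label vectors (`km_labels` below is a hypothetical name):

```python
import numpy as np

def k_medoids(X, k, n_iter=10, seed=0):
    """Simplified alternating k-medoids (not full PAM): assign each point
    to its nearest medoid, then move each medoid to the cluster member
    that minimizes total within-cluster distance."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # nearest medoid per point
        for j in range(k):
            members = np.where(labels == j)[0]
            medoids[j] = members[np.argmin(dist[np.ix_(members, members)].sum(axis=1))]
    return labels, medoids

rng = np.random.default_rng(3)
# Two synthetic blobs standing in for the scaled data.
X = np.vstack([rng.normal(0, 0.3, (25, 2)), rng.normal(4, 0.3, (25, 2))])
labels, medoids = k_medoids(X, k=2)
# Compare with the k-means labels via a crosstab,
# e.g. pd.crosstab(km_labels, labels)  # km_labels from the k-means step
```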

Cluster Profiles:

Comparing Clusters: