Credit Card Clustering


I completed this project while enrolled in the Data Science & Machine Learning class at Purwadhika Technology School.

This project performs customer segmentation on a credit card dataset obtained from Kaggle. I worked on it with friends who took the role of Business Analysts. We came up with the idea as a way to apply what we had learned about Data Science, especially the Machine Learning algorithms used for clustering.

As the Data Scientist on the team, my tasks were to collect the dataset, prepare it by cleaning missing and invalid values, explore the data, build the Machine Learning model, and evaluate it. The goal was to provide useful insights by segmenting the credit card customers into several groups (clusters), so that stakeholders could make decisions such as:

Offer promotions on low-priced items to customers who have a low balance and often buy items in installments.
Give rewards, such as points, to customers who frequently make transactions with the credit card, where the points can be transferred to the customer’s balance. This can make customers more loyal and reduce churn rates.

I faced some challenges in getting the best results from this project. I had to run various experiments on the data, which was an iterative process, and I even had to go back and explore the data again when the results did not seem satisfying. I handled this by documenting every experiment I ran. For instance, during hyperparameter tuning I recorded which parameters were used and the output of each model, then chose the best one (i.e., the parameters that produced the best clusters, or customer segmentation). Another challenge was understanding the algorithms being used (e.g., K-Means, K-Medoids, and DBSCAN). My approach was to read journal articles on how each of these algorithms works.
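To illustrate what documenting each experiment can look like, here is a minimal sketch. It is not the project's actual code. It assumes a tiny pure-Python K-Means and the silhouette coefficient as the quality measure; the synthetic two-feature data, the helper names (kmeans, silhouette, experiment_log), and the range of k values are all hypothetical. In practice a library implementation would normally replace the hand-written helpers.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Tiny K-Means (illustrative only): returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # treated as 0 here; the coefficient needs at least two clusters
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stand-in data: three synthetic groups with two features each.
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

# One row per experiment: which parameters were used and how the result scored.
experiment_log = []
for k in range(2, 7):
    labels = kmeans(points, k, seed=0)
    experiment_log.append(
        {"algorithm": "kmeans", "k": k, "seed": 0, "silhouette": silhouette(points, labels)}
    )

# Pick the run with the highest silhouette score.
best = max(experiment_log, key=lambda row: row["silhouette"])
for row in experiment_log:
    print(row)
print("best:", best)
```

Keeping one record per run makes it easy to compare settings later and to revisit an earlier configuration after the data has been explored again. The same log structure would work for other algorithms by changing the "algorithm" field and the parameters that are recorded.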

Screenshots

The cluster result that we considered the best at segmenting the customer data, after trying out different data pre-processing approaches.

Each customer segment plotted against three features (Balance, Balance Frequency, and Purchases).

The deployment phase, where we built a simple web application with Python’s Streamlit in which users can try out different parameters and see the resulting clusters.