A Comparative Analysis of Clustering Methods for Identifying Patient Subgroups in Chronic Kidney Disease Using Feature Engineering

Title: A Comparative Analysis of Clustering Methods for Identifying Patient Subgroups in Chronic Kidney Disease Using Feature Engineering

Authors: Nichalini Kandasamy, Artie Basukoski and Thierry Chaussalet

Conference: IEEE CBMS 2025

Tags: Chronic Kidney Disease, clustering algorithms, clustering techniques as a feature engineering, ehr data, Feature engineering, Gaussian Mixture Model, HDBSCAN, Hierarchical Clustering, ICD-10 Codes, KMeans, machine learning and MIMIC IV

Abstract:

Chronic Kidney Disease (CKD) is a long-term condition that progressively worsens the health of affected patients. It also poses substantial challenges for clinicians in accurately predicting disease progression trajectories, identifying patient cohorts, and planning patient management strategies. This study investigates the potential benefits of using unsupervised clustering as a feature engineering method to improve the identification of subgroups among CKD patients. Specifically, we evaluate the performance of four frequently used clustering algorithms, each with different characteristics and strengths: K-Means, HDBSCAN, Hierarchical Clustering, and Gaussian Mixture Models (GMM). The algorithms are applied to a publicly available clinical dataset (MIMIC IV) comprising detailed real-world records of CKD patients, including relevant laboratory test results, demographic details, and sequences of ICD-10 codes. The analysis indicates that, on this clinical dataset, more complex clustering algorithms such as GMM and HDBSCAN do not detect clusters more accurately than simpler algorithms such as K-Means and Hierarchical Clustering.
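The general idea of using cluster assignments as engineered features can be illustrated with a minimal sketch. The abstract does not describe the authors' actual pipeline, feature set, or evaluation metric, so everything below is an assumption made for illustration: the function names (`kmeans`, `add_cluster_features`) are hypothetical, the two-column patient values are made up rather than taken from MIMIC IV, and only K-Means is shown, written in plain Python instead of a library implementation.

```python
from math import dist


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's algorithm with caller-supplied initial centroids.

    Returns (labels, centroids). Illustrative only, not the paper's code.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids


def add_cluster_features(points, labels, centroids):
    """Append one-hot cluster membership and distance to the assigned centroid."""
    k = len(centroids)
    rows = []
    for p, lab in zip(points, labels):
        one_hot = [1.0 if j == lab else 0.0 for j in range(k)]
        rows.append(list(p) + one_hot + [dist(p, centroids[lab])])
    return rows


# Made-up, already-scaled toy values (hypothetical columns: eGFR, creatinine).
patients = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
    (0.20, 0.90), (0.10, 0.80), (0.15, 0.85),
]
labels, centroids = kmeans(patients, centroids=[patients[0], patients[3]])
features = add_cluster_features(patients, labels, centroids)
```

Each row of `features` holds the original columns followed by the one-hot cluster membership and the distance to the assigned centroid, which a downstream model could consume as extra inputs. A real comparison along the lines of the abstract would more likely rely on library implementations of all four algorithms and on an explicitly chosen cluster-quality measure.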