Analysis of Inpatient Data Using Cluster Analysis on Simulation Dataset

Andysah Putera Utama Siahaan; Nur Azizah Harahap; Rahma Yuni Simanullang; Khairunnisa; Puspita Wanny; Utari

doi:10.47065/bit.v6i1.1830

Andysah Putera Utama Siahaan, Universitas Pembangunan Panca Budi, Indonesia
Nur Azizah Harahap*, Universitas Pembangunan Panca Budi, Indonesia
Rahma Yuni Simanullang, Universitas Pembangunan Panca Budi, Indonesia
Khairunnisa, Universitas Pembangunan Panca Budi, Indonesia
Puspita Wanny, Universitas Pembangunan Panca Budi, Indonesia
Utari, Universitas Pembangunan Panca Budi, Indonesia

Keywords: Inpatient; clustering; K-Means; data analysis; cluster evaluation

Abstract

This study analyzes inpatient data by applying K-Means clustering to a simulated dataset. The dataset includes patient attributes such as age, billing amount, length of stay, medical condition, and type of admission. Preprocessing comprised date conversion, calculation of stay duration, normalization of numerical attributes, and one-hot encoding of categorical attributes. The Elbow Method was used to determine the number of clusters, and clustering quality was evaluated with the Silhouette Score and the Davies-Bouldin Index. The results show that patients can be segmented into three major clusters with distinct characteristics, for example younger patients with short, low-cost stays and elderly patients with prolonged, more expensive hospitalizations. The Silhouette Score of 0.14 and the Davies-Bouldin Index of 1.74 indicate modest cluster separation, yet the segmentation remains informative. Hospitals can use these clusters to refine service strategies, improve resource allocation, and increase operational efficiency. The study also illustrates how unsupervised learning can be applied in healthcare settings to support data-driven decision-making, and it offers a foundation for further research on patient segmentation.
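The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the six records are hypothetical toy data, min-max scaling is an assumed choice of normalization, the helper names (`preprocess`, `kmeans`, `silhouette_score`, `davies_bouldin_index`) are introduced here for illustration, and everything is written in plain Python rather than with whatever library the study used. The toy data is clustered with k = 2, not the three clusters reported in the study.

```python
import math
import random
from datetime import date

# Hypothetical toy records, not the study's data:
# (age, billing amount, admission date, discharge date, admission type)
RECORDS = [
    (23, 1200.0, "2024-01-03", "2024-01-05", "Elective"),
    (27, 1500.0, "2024-02-10", "2024-02-12", "Elective"),
    (31, 1800.0, "2024-03-01", "2024-03-04", "Urgent"),
    (68, 9200.0, "2024-01-15", "2024-01-29", "Emergency"),
    (74, 9900.0, "2024-02-02", "2024-02-18", "Emergency"),
    (71, 8700.0, "2024-03-09", "2024-03-22", "Emergency"),
]


def preprocess(records):
    """Convert dates, derive length of stay, min-max scale, one-hot encode."""
    categories = sorted({r[4] for r in records})
    numeric = [
        [float(age), bill,
         float((date.fromisoformat(out) - date.fromisoformat(adm)).days)]
        for age, bill, adm, out, _ in records
    ]
    lows = [min(col) for col in zip(*numeric)]
    highs = [max(col) for col in zip(*numeric)]
    rows = []
    for values, record in zip(numeric, records):
        # Assumed normalization: min-max scaling to [0, 1].
        scaled = [(v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(values, lows, highs)]
        rows.append(scaled + [float(record[4] == c) for c in categories])
    return rows


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia


def silhouette_score(points, labels):
    """Mean silhouette coefficient (needs >= 2 clusters); singletons score 0."""
    def mean_dist(i, cluster):
        d = [math.dist(points[i], q)
             for j, q in enumerate(points) if labels[j] == cluster and j != i]
        return sum(d) / len(d) if d else None

    total = 0.0
    for i, own in enumerate(labels):
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        if a is None:
            continue
        b = min(mean_dist(i, c) for c in set(labels) if c != own)  # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin_index(points, labels, centroids):
    """Mean, over clusters, of the worst scatter-to-separation ratio."""
    clusters = sorted(set(labels))
    scatter = {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter[c] = sum(math.dist(p, centroids[c]) for p in members) / len(members)
    worst = [max((scatter[c] + scatter[d]) / math.dist(centroids[c], centroids[d])
                 for d in clusters if d != c) for c in clusters]
    return sum(worst) / len(worst)


features = preprocess(RECORDS)
# Elbow Method: inspect how inertia falls as k grows and look for the bend.
inertias = {k: kmeans(features, k)[2] for k in range(1, 5)}
labels, centroids, _ = kmeans(features, k=2)
silhouette = silhouette_score(features, labels)
dbi = davies_bouldin_index(features, labels, centroids)
print(inertias)
print(labels, round(silhouette, 2), round(dbi, 2))
```

A Silhouette Score closer to 1 and a Davies-Bouldin Index closer to 0 both indicate tighter, better-separated clusters, which is the scale against which the study's values of 0.14 and 1.74 can be read. Because unscaled one-hot columns contribute as much to Euclidean distance as the scaled numeric attributes, categorical attributes can dominate the grouping; that weighting is a design choice worth checking on real data.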
