Application of Data Mining to Classify Poor Residents in Labuhanbatu Regency Using Random Forest and K-Nearest Neighbors
Abstract
This study applies and compares two data mining algorithms, Random Forest (RF) and K-Nearest Neighbors (KNN), for classifying the poverty status of residents of Labuhanbatu Regency. The dataset covers occupation, income, housing, and education for 21,137 individuals. Following preprocessing, model training, and hyperparameter optimization, both models were evaluated on five metrics: accuracy, precision, recall, F1-score, and AUC. Random Forest performed slightly better than KNN, with an accuracy of 0.6023, precision of 0.4827, recall of 0.4177, F1-score of 0.4479, and AUC of 0.5681, against KNN's accuracy of 0.5990, precision of 0.4771, recall of 0.4006, F1-score of 0.4355, and AUC of 0.5622. On this dataset, Random Forest is therefore the more effective of the two classifiers for poverty classification, although the margin is small (under 0.02 on every metric).
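Because the F1-score is defined as the harmonic mean of precision and recall, the reported figures can be cross-checked against one another. The sketch below is illustrative only and is not the authors' code: the function names and the tolerance are assumptions, and the only data used are the precision, recall, and F1 values quoted in the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1-score exactly as quoted in the abstract.
reported = {
    "Random Forest": {"precision": 0.4827, "recall": 0.4177, "f1": 0.4479},
    "KNN": {"precision": 0.4771, "recall": 0.4006, "f1": 0.4355},
}


def check_reported_f1(scores: dict, tol: float = 5e-4) -> dict:
    """Recompute F1 per model and flag whether it matches the reported value.

    The tolerance (an assumption) allows for the inputs having been
    rounded to four decimal places before publication.
    """
    result = {}
    for model, s in scores.items():
        f1 = f1_from_precision_recall(s["precision"], s["recall"])
        result[model] = (round(f1, 4), abs(f1 - s["f1"]) <= tol)
    return result


if __name__ == "__main__":
    for model, (f1, ok) in check_reported_f1(reported).items():
        print(f"{model}: recomputed F1 = {f1:.4f}, consistent = {ok}")
```

For both models the recomputed F1 agrees with the published value to four decimal places, so the three reported metrics are internally consistent.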
Copyright (c) 2025 Andi Ernawati, Khairul, Zulham Sitorus

This work is licensed under a Creative Commons Attribution 4.0 International License.