Optimizing Random Forest Through Feature Engineering and SMOTE for Mental Health Classification
Abstract
Student mental health strongly affects academic performance, productivity, and overall quality of life in university environments. The high prevalence of psychological disorders calls for an accurate early detection system that enables timely intervention. This study develops a student mental health classification model that combines feature engineering and the Synthetic Minority Oversampling Technique (SMOTE) with the Random Forest algorithm. In the feature engineering stage, a composite feature, Mental_Score, is created to represent students' psychological conditions more holistically. SMOTE is then applied to address class imbalance, making the model more sensitive to at-risk students, who form the minority class. Experimental results show that the proposed model achieves an accuracy of 97%. Applying SMOTE raised minority-class recall to 60% and the F1-score from 0.57 to 0.75, substantially strengthening detection of the at-risk group. The McNemar test yields a p-value of 1.000, which is attributed to a ceiling effect because both models already achieve very high accuracy; the difference between them is therefore not statistically significant, but the proposed model still offers a practical advantage in detection sensitivity. Feature importance analysis shows that Mental_Score is the most influential attribute, with an importance value of 0.3280. This study contributes a machine learning-based framework for more accurate early detection of student mental health problems.
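The relationship between the reported recall, F1-score, and McNemar p-value can be checked with a short sketch. The code below is illustrative only and is not the authors' evaluation code: the function names are hypothetical, the F1-score is assumed to refer to the minority class under the standard definition F1 = 2PR / (P + R), and the discordant counts passed to the McNemar function are made-up values, since the abstract reports neither precision nor the contingency table.

```python
from math import comb


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_f1_recall(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision exists for this F1/recall pair")
    return f1 * recall / denom


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact (binomial) McNemar p-value.

    b: test cases only the first model classified correctly
    c: test cases only the second model classified correctly
    Cases where both models agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Reported values: minority-class recall 0.60 and F1-score 0.75.
implied_precision = precision_from_f1_recall(0.75, 0.60)
print(round(implied_precision, 3))

# Hypothetical discordant counts (not taken from the paper).
print(mcnemar_exact_p(2, 2))   # balanced disagreements
print(mcnemar_exact_p(10, 2))  # one model clearly ahead
```

Under these assumptions, a recall of 0.60 together with an F1-score of 0.75 implies a minority-class precision of about 1.0, meaning the remaining errors would be missed at-risk students rather than false alarms. The exact McNemar test returns a p-value of 1.0 whenever the two models have no disagreements or their disagreements are evenly split, which is consistent with two models whose predictions differ on very few test cases.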
Copyright (c) 2026 Rovidatul Hikmah Tanjung, Fera Damayanti, Ahmad Zaki

This work is licensed under a Creative Commons Attribution 4.0 International License.




