Application of the K-Nearest Neighbors (KNN) Algorithm for Diabetes Mellitus Classification: Evidence from Aceh Province, Indonesia
Abstract
Diabetes mellitus is a non-communicable disease whose prevalence is steadily increasing in Indonesia, including Aceh Province. Early detection using data-driven approaches is essential to reduce the risk of severe complications. This study classifies diabetes mellitus using the K-Nearest Neighbors (KNN) algorithm. The dataset comprises 1,500 instances from the Pima Indians Diabetes Dataset obtained from Kaggle and an additional 100 instances collected from hospitals across Aceh Province. Preprocessing involved normalization and label encoding, after which the data were partitioned into training and testing sets at a 90:10 ratio. The KNN model was configured with K = 5. On the Kaggle dataset, the model achieved an accuracy of 85%, precision of 87%, recall of 82%, and an F1-score of 85%. On the hospital dataset, it attained an accuracy of 76%, precision of 80.95%, recall of 68%, and an F1-score of 73.91%. These findings suggest that KNN performs adequately in classifying diabetes mellitus and may serve as a basis for developing data-driven medical decision support systems.
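The pipeline summarized in the abstract (normalization, a 90:10 split, KNN with K = 5, then accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The abstract does not name the normalization method or the distance metric, so min-max scaling and Euclidean distance are assumptions here. The two features (glucose, BMI) and every record are made up for illustration, the labels are already encoded as 0/1, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

K = 5  # neighbor count stated in the abstract


def min_max_normalize(rows):
    """Scale every feature column to [0, 1] (min-max is an assumption)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # A constant column has zero range; divide by 1.0 so it maps to 0.
    spans = [(max(col) - low) or 1.0 for col, low in zip(columns, lows)]
    return [[(v - low) / span for v, low, span in zip(row, lows, spans)]
            for row in rows]


def split_90_10(features, labels, seed=0):
    """Shuffle the records, then hold out the last 10% for testing."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    cut = len(order) * 9 // 10
    train, test = order[:cut], order[cut:]
    return ([features[i] for i in train], [labels[i] for i in train],
            [features[i] for i in test], [labels[i] for i in test])


def knn_predict(train_x, train_y, query, k=K):
    """Majority vote among the k nearest training points (Euclidean)."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, with class 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def run_demo(seed=42):
    """Run the pipeline on synthetic (glucose, BMI) records, not study data."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(100):
        features.append([rng.gauss(95, 10), rng.gauss(24, 3)])   # class 0
        labels.append(0)
        features.append([rng.gauss(160, 10), rng.gauss(34, 3)])  # class 1
        labels.append(1)
    scaled = min_max_normalize(features)
    train_x, train_y, test_x, test_y = split_90_10(scaled, labels)
    predicted = [knn_predict(train_x, train_y, row) for row in test_x]
    return binary_metrics(test_y, predicted)


metrics = run_demo()
print(metrics)
# Harmonic mean of the reported hospital precision and recall:
print(round(100 * f1_from(0.8095, 0.68), 2))  # 73.91, the reported F1-score
```

The sketch normalizes before splitting, following the order given in the abstract. A common alternative is to compute the scaling parameters on the training portion only, so that no information from the test records influences preprocessing.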
DOI: http://dx.doi.org/10.30811/jtrik.v9i1.8721
Jurnal Teknologi Rekayasa Informasi dan Komputer - Politeknik Negeri Lhokseumawe
This work is licensed under CC BY-SA 4.0
©2021 All rights reserved | P-ISSN: 2581-2882 | E-ISSN: 2797-1724




