Cyberbullying Detection Model Using IndoBERT Representation and Support Vector Machine

Alda Mauliza, Mahdi Mahdi, Musta’inul Abdi

Abstract


As social media use grows, many users remain unaware that their comments or reviews may constitute cyberbullying. Cyberbullying is a common form of harassment in digital spaces and can have serious psychological effects on victims, including depression, sleep disturbances, and reduced work productivity. This study develops a machine learning model that classifies text as bullying or non-bullying, using a dataset obtained from Hugging Face. Text is represented with IndoBERT and classified with a Support Vector Machine (SVM). Model performance was evaluated with a confusion matrix, yielding an accuracy of 89.59%. These results indicate that combining IndoBERT representations with an SVM classifier is an effective approach to detecting cyberbullying.
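The abstract reports accuracy derived from a confusion matrix. The sketch below shows how the four cells of a binary confusion matrix (TP, FP, FN, TN) are counted from true and predicted labels, and how accuracy, precision, recall, and F1 follow from them. It assumes "bullying" is the positive class; the function names and the toy labels in the usage note are illustrative, not taken from the study, and the sketch does not reproduce the IndoBERT or SVM stages.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive="bullying"):
    """Return (tp, fp, fn, tn) for a binary task.

    Assumption: `positive` names the positive class; every other
    label is treated as the negative class (e.g. "non-bullying").
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            counts["tp"] += 1  # bullying correctly flagged
        elif truth != positive and pred == positive:
            counts["fp"] += 1  # non-bullying wrongly flagged
        elif truth == positive and pred != positive:
            counts["fn"] += 1  # bullying missed
        else:
            counts["tn"] += 1  # non-bullying correctly passed
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix cells."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total  # share of all predictions that are correct
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with true labels `["bullying", "bullying", "non-bullying", "non-bullying", "bullying"]` and predictions `["bullying", "non-bullying", "non-bullying", "bullying", "bullying"]`, the cells are TP = 2, FP = 1, FN = 1, TN = 1, so accuracy is (2 + 1) / 5 = 0.6. Accuracy alone can be misleading when the bullying and non-bullying classes are imbalanced, which is why precision, recall, and F1 are usually reported from the same matrix.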






Journal of Informatics Engineering and Software Applications (JIEngS) licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.