Cyberbullying Detection Model Using IndoBERT Representation and Support Vector Machine

Alda Mauliza; Mahdi Mahdi; Musta’inul Abdi

Abstract

With the growing use of social media, many individuals are unaware that their comments or reviews may be categorized as cyberbullying. Cyberbullying is a common form of harassment in digital spaces and can have serious psychological effects on victims, such as depression, sleep disturbances, and reduced work productivity. This study develops a machine learning model that classifies text as bullying or non-bullying, using a dataset obtained from Hugging Face. The model represents text with IndoBERT and classifies it with a Support Vector Machine (SVM). Performance was evaluated with a confusion matrix, yielding an accuracy of 89.59%. These results indicate that combining IndoBERT with an SVM is an effective approach to detecting cyberbullying.
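The approach in the abstract has two stages: each text is mapped to a fixed-length vector by IndoBERT, and an SVM learns a separating hyperplane over those vectors. The trained model is then scored with a confusion matrix, where accuracy = (TP + TN) / (TP + TN + FP + FN). The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The abstract does not specify the IndoBERT checkpoint, the pooling strategy, or the SVM kernel and hyperparameters, so the "embeddings" here are hypothetical 3-dimensional toy vectors standing in for IndoBERT outputs (BERT-base-sized encoders produce 768-dimensional vectors). The classifier is a linear SVM trained with the Pegasos stochastic sub-gradient method, which is swapped in here only to keep the example self-contained. The helpers `train_linear_svm`, `predict`, `confusion_matrix`, and `accuracy` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def train_linear_svm(xs: Sequence[Vector], ys: Sequence[int], lam: float = 0.01,
                     iterations: int = 2000, seed: int = 0) -> Vector:
    """Linear soft-margin SVM trained with Pegasos (assumed, not the paper's setup).

    Labels must be +1 (bullying) or -1 (non-bullying). A constant 1.0 feature is
    appended to every vector so the last weight acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Gradient step on the L2 regularizer shrinks the weights ...
        w = [(1.0 - eta * lam) * wi for wi in w]
        # ... and, if the hinge loss is active, the weights move toward y * x.
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Vector, x: Vector) -> int:
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN), treating +1 (bullying) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    return tp, fn, fp, tn


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Share of all predictions that are correct: (TP + TN) / total."""
    return (tp + tn) / (tp + fn + fp + tn)


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for IndoBERT sentence embeddings.
    train_x = [[2.0, 1.5, 0.5], [1.8, 2.2, 0.1], [2.5, 1.0, 0.3],
               [-2.0, -1.5, 0.2], [-1.7, -2.1, 0.4], [-2.4, -0.9, 0.0]]
    train_y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(train_x, train_y)

    test_x = [[1.9, 1.7, 0.2], [-2.2, -1.2, 0.3]]
    test_y = [1, -1]
    preds = [predict(w, x) for x in test_x]
    cm = confusion_matrix(test_y, preds)
    print(cm, accuracy(*cm))  # prints: (1, 0, 0, 1) 1.0
```

In a real pipeline, the toy vectors would be replaced by IndoBERT sentence embeddings (for example, pooled hidden states from a model loaded with the Hugging Face `transformers` library), and a library SVM such as scikit-learn's `SVC` could replace the hand-written trainer. The confusion-matrix arithmetic stays the same.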

Full Text: PDF (Bahasa Indonesia)

Journal of Informatics Engineering and Software Applications (JIEngS) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
