Personality Classification on Social Media Using Lexicon Construction with Decision Tree Algorithm

Deliana Deliana, Muhammad Arhami, Musta’inul Abdi, Umri Erdiansyah

Abstract


The development of information and communication technology has produced a variety of social media platforms on which individuals interact and express themselves. One aspect that can be analyzed from social media data is the user's personality. This research classifies the personality of social media users with a lexicon-based approach optimized through the Decision Tree algorithm. The Decision Tree algorithm was chosen for its ability to partition data into progressively smaller subsets based on specific features, which yields an effective predictive model. The study uses data from the Kaggle platform and focuses on text analysis to assign users to one of the 16 personality types defined by the Myers-Briggs Type Indicator (MBTI). The classification process involves several stages, including data preprocessing, TF-IDF weighting, and application of the Decision Tree model. The results show that the lexicon-based approach implemented with the Decision Tree algorithm can achieve high accuracy in classifying the personality of social media users. This research contributes to the development of personality classification models that can be used in applications such as marketing, workforce recruitment, and product development. The research also acknowledges that the model's accuracy is limited by the variability of language use on social media.
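The stages named in the abstract (preprocessing, TF-IDF weighting, and a Decision Tree split) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy posts, the tokenizer, the tf * log(N / df) weighting variant, and the Gini impurity criterion are all assumptions chosen for illustration, and only the first split of a tree is computed.

```python
import math
import re
from collections import Counter

# Hypothetical toy posts and MBTI labels, invented for illustration only.
POSTS = [
    ("I love quiet evenings reading alone", "INTJ"),
    ("quiet time alone helps me think", "INTJ"),
    ("party with friends tonight was amazing", "ESFP"),
    ("friends and party make me happy", "ESFP"),
]


def tokenize(text):
    """Minimal preprocessing: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split(vectors, labels):
    """Find the (term, threshold) pair with the lowest weighted Gini impurity."""
    best = (None, None, float("inf"))
    terms = sorted({t for v in vectors for t in v})
    for term in terms:
        for threshold in sorted({v.get(term, 0.0) for v in vectors}):
            left = [y for v, y in zip(vectors, labels) if v.get(term, 0.0) <= threshold]
            right = [y for v, y in zip(vectors, labels) if v.get(term, 0.0) > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[2]:
                best = (term, threshold, score)
    return best


docs = [tokenize(text) for text, _ in POSTS]
labels = [label for _, label in POSTS]
vectors = tfidf(docs)
term, threshold, impurity = best_split(vectors, labels)
print(term, threshold, impurity)
```

On this toy data the chosen split is on the term "alone", which separates the two classes completely. A full Decision Tree would apply the same search recursively to each resulting subset until the subsets are pure or a stopping rule is met.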





Journal of Informatics Engineering and Software Applications (JIEngS) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.