Generating Synthetic Text Data for Improving Class Balance in Personality Prediction
Date
2024
Publisher
Springer Science and Business Media Deutschland GmbH
Abstract
The growing popularity of social media as a means of self-expression and self-discovery has sparked heightened interest in using the Myers–Briggs Type Indicator (MBTI) to investigate human personality. Although word-embedding techniques, machine learning algorithms, and imbalanced-data handling techniques are increasingly used to predict MBTI personality types, further research is needed to determine how these approaches can improve the accuracy of the results. Our research aimed to use a GPT model to generate synthetic text data and thereby address the problem of class imbalance. We implemented several models, including RCNN, LSTM, XGBoost, and Random Forest, and tested two word-embedding techniques, Word2Vec and GloVe. Our findings show that this approach can attain a considerably high F1-score, although the score depends on the model selected for predicting and classifying MBTI personality types. Accurately predicting and classifying MBTI personality types with this approach has the potential to improve our understanding of MBTI. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
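The balancing step summarized above can be pictured as generation-based oversampling: each minority MBTI class is topped up with synthetic posts until it matches the largest class. The following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the prompts, GPT version, or target class sizes, so the "match the majority class" target, the function `oversample_with_generator`, and the placeholder `toy_generator` (which stands in for a real GPT call) are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def oversample_with_generator(
    texts: List[str],
    labels: List[str],
    generate: Callable[[str, List[str]], str],
) -> Tuple[List[str], List[str]]:
    """Top up every class to the size of the largest class.

    Assumption: the target size is the majority-class count. `generate`
    receives a class label and that class's real examples and returns
    one new synthetic text.
    """
    counts = Counter(labels)
    target = max(counts.values())

    # Group the real examples by class so the generator can be conditioned on them.
    by_class: Dict[str, List[str]] = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Keep all original data, then append synthetic samples for each minority class.
    new_texts, new_labels = list(texts), list(labels)
    for label, n in counts.items():
        for _ in range(target - n):
            new_texts.append(generate(label, by_class[label]))
            new_labels.append(label)
    return new_texts, new_labels


def toy_generator(label: str, examples: List[str]) -> str:
    # Hypothetical stand-in for a GPT call: a real pipeline would prompt a
    # language model with examples of this class and return newly written text.
    return f"[synthetic {label}] {examples[0]}"


if __name__ == "__main__":
    texts = ["post a", "post b", "post c", "post d"]
    labels = ["INTJ", "INTJ", "INTJ", "ENFP"]
    balanced_texts, balanced_labels = oversample_with_generator(texts, labels, toy_generator)
    print(Counter(balanced_labels))
```

In practice, synthetic posts would typically be generated from the training split only, so that generated text does not leak into the data used to compute the reported F1-score.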
Keywords
GPT, MBTI, Oversampling, Predictive models, Text generation
Citation
Signals and Communication Technology, 2024, Vol. Part F2556, pp. 59-70
