Conference Papers

Search Results

A Comprehensive Analysis of Classification Techniques for Effective Multi-class Research Article Categorization on an Imbalanced Dataset
(Springer Science and Business Media Deutschland GmbH, 2025) Gowhar, S.; Kempaiah, P.; Sowmya Kamath, S.; Sugumaran, V.
Categorizing scientific articles into specific research fields is a challenging problem, affected by the volume and variety of literature published. However, existing classification systems often suffer from limitations regarding taxonomy or the models used for classification. This article presents a comprehensive analysis of approaches built on Sentence Transformer embeddings combined with Machine Learning algorithms, Neural Networks, and Transformers to classify articles into 123 predefined classes, with the dataset being heavily imbalanced. The effectiveness of Large Language Models (LLMs) for generating synthetic data is also evaluated, along with synonym augmentation, SMOTE, and 1D CNNs for text classification. The best-performing model is a hierarchical classification model trained on MP-Net sentence embeddings that achieved an accuracy of 78%, outperforming all other models. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
Imbalanced Multi-Class Research Article Classification using Sentence Transformers and Machine Learning Algorithms
(Association for Computing Machinery, Inc, 2025) Gowhar, S.; Kempaiah, P.; Kamath, S.S.; Sugumaran, V.
Categorizing scientific articles into specific research fields is a challenging problem, considering the volume and variety of published literature. However, existing classification systems often suffer from limitations regarding taxonomy or the models used for classification. This article explores approaches built on Sentence Transformer embeddings combined with Machine Learning algorithms to classify articles into 123 predefined classes, with the dataset being heavily imbalanced. The effectiveness of Large Language Models (LLMs) for generating synthetic data is also evaluated, along with synonym augmentation and SMOTE. The best-performing model, the One vs Rest classifier trained on MP-Net sentence embeddings with SMOTE, achieved an accuracy of 77% and outperformed all the other models. © 2024 Copyright held by the owner/author(s).
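
Both abstracts name SMOTE (Synthetic Minority Over-sampling Technique) as a way to handle the imbalanced classes. The following is a minimal sketch of the core idea only, assuming each article embedding is a plain list of floats. The function name smote_oversample and its parameters are hypothetical and chosen for illustration; nothing here is taken from the papers, which do not state which implementation was used.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points for one minority class.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    This is an illustrative sketch, not a library implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    # A sample cannot be its own neighbour, so cap k accordingly.
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-dimensional "embeddings" standing in for one rare class.
rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
extra = smote_oversample(rare_class, n_new=4, k=2, seed=1)
print(len(extra))
```

In a setting like the one described, the synthetic points would be added to the training set for each under-represented class before fitting the classifier, and only to the training split so that evaluation data stays untouched.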
