Exploring Various Data Mining Techniques to Predict Heart Disease
Date
2025
Publisher
Springer Science and Business Media Deutschland GmbH
Abstract
Cardiovascular disease (CVD), commonly called heart disease, is one of the leading causes of death worldwide. Early detection of CVD risk is a major area of interest in clinical data analysis. This study focuses on strategies for improving the predictive ability of CVD risk detection algorithms. We experiment with binary and multiclass classification techniques on public datasets from the UCI Machine Learning Repository: Cleveland for training, and Statlog and Hungarian for evaluation. The techniques include feature selection by best subset generation and data balancing using Binary and Multiclass SMOTE and their variants. Every technique is assessed by tenfold cross-validation on six classifiers: K-Nearest Neighbors (KNN), Naïve Bayes, Logistic Regression (LR), Support Vector Machine (SVM), Neural Network, and Vote (a hybrid technique combining Naïve Bayes and Logistic Regression). Experimental results show a rise of 4.36% in average classifier F1-score after feature selection and Binary SMOTE. Top-performing models include Logistic Regression, Neural Network, and Vote. KNN shows a significant rise in accuracy of 8.5% and 5.05% after applying Binary and Multiclass SMOTE, respectively. The Multiclass SMOTE results can serve as a benchmark but leave scope for further research and enhancement. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
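The data-balancing step named in the abstract can be illustrated with a minimal sketch of textbook binary SMOTE: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The abstract does not state the software, the neighbour count, or which SMOTE variants were used, so the function names `smote` and `balance`, the default `k=5`, and the `seed` parameter are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; k and seed are assumed defaults.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class of a binary dataset until classes are equal."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    n_new = max(n_majority - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (made up for the example): 3 minority rows, 6 majority rows.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 5.5], [7.0, 7.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = balance(X, y, minority_label=1, k=2)
```

Under cross-validation, oversampling of this kind is normally applied to the training folds only, so that synthetic points derived from a sample never appear in the fold used to score it. Whether the study did so is not stated in the abstract.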
Keywords
Cardiovascular disease (CVD), Data mining techniques, Ensemble classifiers, Exhaustive feature space search, Hybrid algorithms, Synthetic minority oversampling technique (SMOTE)
Citation
Lecture Notes in Networks and Systems, 2025, Vol. 1265 LNNS, pp. 65-77
