Exploring Various Data Mining Techniques to Predict Heart Disease

Date

2025

Publisher

Springer Science and Business Media Deutschland GmbH

Abstract

Cardiovascular disease (CVD), commonly called heart disease, is one of the leading causes of death worldwide. Early detection of CVD risk is a major area of interest in clinical data analysis. This study focuses on devising strategies for improving the predictive ability of CVD risk detection algorithms. We experiment with binary and multiclass classification techniques on public datasets from the UCI Machine Learning Repository: Cleveland for training, and Statlog and Hungarian for evaluation. The techniques include feature selection by best subset generation and data balancing using Binary and Multiclass SMOTE and their variants. Every technique is assessed by tenfold cross-validation on six classifiers: K-Nearest Neighbors (KNN), Naïve Bayes, Logistic Regression (LR), Support Vector Machine (SVM), Neural Network, and Vote (a hybrid technique combining Naïve Bayes and Logistic Regression). Experimental results show a rise in average classifier F1-score of 4.36% after feature selection and Binary SMOTE. Top-performing models include Logistic Regression, Neural Network, and Vote. KNN shows a significant rise in accuracy of 8.5% and 5.05% after employing Binary and Multiclass SMOTE, respectively. The Multiclass SMOTE results can serve as a benchmark but leave scope for further research and enhancement. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
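
The data balancing step named in the abstract, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of the binary case only, written with the Python standard library. It is not the authors' implementation: the function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the paper's SMOTE variants and multiclass extension are not covered.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature data, invented for illustration only.
majority = [[0.1 * i, 1.0] for i in range(10)]
minority = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.5], [6.5, 6.0]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In an evaluation such as the tenfold cross-validation described above, oversampling of this kind is usually applied to the training folds only, so that synthetic points do not leak into the held-out fold. Whether the study did so is not stated in the abstract.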

Keywords

Cardiovascular disease (CVD), Data mining techniques, Ensemble classifiers, Exhaustive feature space search, Hybrid algorithms, Synthetic minority oversampling technique (SMOTE)

Citation

Lecture Notes in Networks and Systems, 2025, Vol. 1265 LNNS, p. 65-77
