Faculty Publications

Permanent URI for this communityhttps://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Browse

Search Results

Now showing 1 - 1 of 1
  • Item
    Machine Learning Framework for Classification of COVID-19 Variants Using K-mer Based DNA Sequencing
    (John Wiley and Sons Inc, 2025) Kumar, S.; Raju, S.; Bhowmik, B.
    Accurate classification of viral DNA sequences is essential for tracking mutations, understanding viral evolution, and enabling timely public health responses. Traditional alignment-based methods are often computationally intensive and less effective for highly mutating viruses. This article presents a machine learning framework for classifying DNA sequences of COVID-19 variants using K-mer-based tokenization and vectorization techniques inspired by Natural Language Processing (NLP). DNA sequences corresponding to Alpha, Beta, Gamma, and Omicron variants are obtained from the Global Initiative on Sharing All Influenza Data (GISAID) database and encoded into feature vectors. Multiple classifiers, including Extra Trees, Random Forest, Support Vector Classifier (SVC), Decision Tree, Logistic Regression, Naive Bayes, K-Nearest Neighbor (KNN), Ridge Classifier, Stochastic Gradient Descent (SGD), and XGBoost, are evaluated based on accuracy, precision, recall, and F1-score. The Extra Trees model achieved the highest accuracy of 93.10% (Formula presented.) 0.42, followed by Random Forest with 92.60% (Formula presented.) 0.38, both demonstrating robust and balanced performance. Statistical significance tests confirmed the robustness of the results. The results validate the effectiveness of K-mer-based encoding combined with traditional machine learning models in classifying COVID-19 variants, offering a scalable and efficient solution for genomic surveillance. © 2025 Wiley Periodicals LLC.