Classification of multi-genomic data using MapReduce paradigm
| dc.contributor.author | Pahadia, M. | |
| dc.contributor.author | Srivastava, A. | |
| dc.contributor.author | Srivastava, D. | |
| dc.contributor.author | Patil, N. | |
| dc.date.accessioned | 2026-02-06T06:39:32Z | |
| dc.date.issued | 2015 | |
| dc.description.abstract | Counting the number of occurrences of a substring in a string is a problem that arises in many applications. This paper proposes a fast and efficient solution to this problem for the field of bioinformatics. A k-mer is a k-length substring of a biological sequence. k-mer counting is defined as counting the number of occurrences of all possible k-mers in a biological sequence. k-mer counting has uses in applications including error correction of sequencing reads, genome assembly, disease prediction, and feature extraction. We provide a Hadoop-based solution to the k-mer counting problem and then use it to classify multi-genomic data. The classification is done using classifiers such as Naive Bayes, Decision Tree, and Support Vector Machine (SVM). Accuracy of more than 99% is observed. © 2015 IEEE. | |
| dc.identifier.citation | International Conference on Computing, Communication and Automation, ICCCA 2015, 2015, pp. 678-682 | |
| dc.identifier.uri | https://doi.org/10.1109/CCAA.2015.7148460 | |
| dc.identifier.uri | https://idr.nitk.ac.in/handle/123456789/32355 | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | |
| dc.title | Classification of multi-genomic data using MapReduce paradigm | |
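The abstract describes k-mer counting in the MapReduce style and the use of the resulting counts as classification features. Below is a minimal, single-process sketch of that idea. It is not the authors' Hadoop implementation and does not use the Hadoop API. The function names (`map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`, `feature_vector`) are hypothetical. The sketch also assumes DNA sequences over the alphabet ACGT, counts k-mers separately for each input sequence, and turns each sequence's counts into a fixed-length vector that a classifier such as Naive Bayes, a decision tree, or an SVM could consume.

```python
from collections import defaultdict
from itertools import product


def map_kmers(seq_id, sequence, k):
    """Map phase: emit ((seq_id, k-mer), 1) for every k-length window."""
    sequence = sequence.upper()
    for i in range(len(sequence) - k + 1):
        yield (seq_id, sequence[i:i + k]), 1


def shuffle(pairs):
    """Group intermediate values by key (a MapReduce framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the partial counts emitted for one (seq_id, k-mer) key."""
    return key, sum(values)


def count_kmers(sequences, k):
    """Run map -> shuffle -> reduce locally; returns {seq_id: {k-mer: count}}.

    Sequences shorter than k emit no k-mers and therefore do not appear in the result.
    """
    intermediate = (pair for seq_id, seq in sequences.items()
                    for pair in map_kmers(seq_id, seq, k))
    per_sequence = defaultdict(dict)
    for key, values in shuffle(intermediate).items():
        (seq_id, kmer), total = reduce_counts(key, values)
        per_sequence[seq_id][kmer] = total
    return dict(per_sequence)


def feature_vector(counts, k, alphabet="ACGT"):
    """Fixed-length vector over all len(alphabet)**k possible k-mers, in lexicographic order.

    Assumption: k-mers containing symbols outside `alphabet` (e.g. N) are ignored.
    """
    return [counts.get("".join(p), 0) for p in product(alphabet, repeat=k)]
```

For example, `count_kmers({"s1": "ACGTAC", "s2": "AAAA"}, k=2)` yields `{"AC": 2, "CG": 1, "GT": 1, "TA": 1}` for `s1` and `{"AA": 3}` for `s2`. Calling `feature_vector(counts["s2"], 2)` gives a 16-element vector whose first entry (the k-mer `AA`) is 3. Keying the intermediate pairs on `(seq_id, k-mer)` instead of on the k-mer alone keeps the counts separate for each genome, which per-sample classification requires. In a real Hadoop job, the framework performs the shuffle and distributes the map and reduce calls across nodes.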
