Classification of multi-genomic data using MapReduce paradigm
Date
2015
Authors
Pahadia, M.
Srivastava, A.
Srivastava, D.
Patil, N.
Abstract
Counting the number of occurrences of a substring in a string is a problem that arises in many applications. This paper proposes a fast and efficient solution for the field of bioinformatics. A k-mer is a k-length substring of a biological sequence. k-mer counting is the task of counting the occurrences of every possible k-mer in a biological sequence. It is used in applications including error correction of sequencing reads, genome assembly, disease prediction, and feature extraction. We provide a Hadoop-based solution to the k-mer counting problem and then use it for classification of multi-genomic data. Classification is done using classifiers such as Naive Bayes, Decision Tree, and Support Vector Machine (SVM). Accuracy of more than 99% is observed. © 2015 IEEE.
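As an illustration of the k-mer counting step described in the abstract, the following is a minimal single-process sketch of the map and reduce phases in plain Python. It is not the authors' Hadoop implementation; the function names (map_kmers, reduce_counts, count_kmers) and the sample sequences are hypothetical and chosen only for this example.

```python
from itertools import groupby
from operator import itemgetter


def map_kmers(sequence, k):
    """Map step: emit a (k-mer, 1) pair for every k-length window."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def reduce_counts(pairs):
    """Shuffle and reduce: sort pairs by k-mer, group them, sum the counts."""
    counts = {}
    ordered = sorted(pairs, key=itemgetter(0))
    for kmer, group in groupby(ordered, key=itemgetter(0)):
        counts[kmer] = sum(n for _, n in group)
    return counts


def count_kmers(sequences, k):
    """Run the map step over every sequence, then reduce the emitted pairs."""
    pairs = (pair for seq in sequences for pair in map_kmers(seq, k))
    return reduce_counts(pairs)


# Hypothetical toy input: two short DNA fragments, k = 2.
counts = count_kmers(["ACGTAC", "GTAC"], 2)
print(counts)  # {'AC': 3, 'CG': 1, 'GT': 2, 'TA': 2}
```

In a Hadoop setting the sort-and-group step would be performed by the framework's shuffle phase rather than by an in-memory sort. The resulting counts per sequence could then serve as a feature vector for a classifier such as those named in the abstract.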
Citation
International Conference on Computing, Communication and Automation, ICCCA 2015, 2015, pp. 678-682