Classification of multi-genomic data using MapReduce paradigm

Date

2015

Authors

Pahadia, M.
Srivastava, A.
Srivastava, D.
Patil, N.

Abstract

Counting the number of occurrences of a substring in a string is a common problem in many applications. This paper proposes a fast and efficient solution for the field of bioinformatics. A k-mer is a substring of length k of a biological sequence, and k-mer counting is defined as counting the number of occurrences of every possible k-mer in a biological sequence. k-mer counting is used in applications such as error correction of sequencing reads, genome assembly, disease prediction and feature extraction. We provide a Hadoop-based solution to the k-mer counting problem and then use it to classify multi-genomic data. Classification is performed with Naive Bayes, Decision Tree and Support Vector Machine (SVM) classifiers, and an accuracy of more than 99% is observed. © 2015 IEEE.
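A minimal sketch of k-mer counting in the map/shuffle/reduce style the abstract describes, written in plain Python rather than against the Hadoop API. The function names (`map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`, `feature_vector`) are hypothetical illustrations, not the authors' code. It assumes DNA sequences over the alphabet ACGT and counts overlapping k-mers within each sequence. The last step turns the counts into a fixed-length vector over all 4^k possible k-mers, which is one plausible way to feed a classifier such as Naive Bayes, a decision tree or an SVM.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(sequence: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map step: emit (k-mer, 1) for every overlapping window of length k."""
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the partial counts emitted for each k-mer."""
    return {kmer: sum(values) for kmer, values in groups.items()}


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    # In Hadoop, input splits would be processed by parallel mappers;
    # here every read is mapped sequentially in a single process.
    emitted = (pair for read in reads for pair in map_kmers(read, k))
    return reduce_counts(shuffle(emitted))


def feature_vector(counts: Dict[str, int], k: int, alphabet: str = "ACGT") -> List[int]:
    """Fixed-length vector over all possible k-mers in lexicographic order (assumed classifier input)."""
    return [counts.get("".join(p), 0) for p in product(alphabet, repeat=k)]
```

For example, `count_kmers(["ACGTAC"], 2)` yields `{"AC": 2, "CG": 1, "GT": 1, "TA": 1}`. In an actual Hadoop job the framework performs the shuffle between distributed mappers and reducers, and a combiner could pre-sum counts on each node to cut network traffic. Note that `feature_vector` silently drops k-mers containing characters outside the alphabet (for example `N`).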

Citation

International Conference on Computing, Communication and Automation (ICCCA 2015), 2015, pp. 678-682