Classification of multi-genomic data using MapReduce paradigm

Pahadia, M.; Srivastava, A.; Srivastava, D.; Patil, N.

Date

2015

Authors

Pahadia, M.

Srivastava, A.

Srivastava, D.

Patil, N.

Abstract

Counting the number of occurrences of a substring in a string is a problem in many applications. This paper suggests a fast and efficient solution for the field of bioinformatics. A k-mer is a k-length substring of a biological sequence. k-mer counting is defined as counting the number of occurrences of all the possible k-mers in a biological sequence. k-mer counting has uses in applications including error correction of sequencing reads, genome assembly, disease prediction and feature extraction. We provide a Hadoop-based solution to the k-mer counting problem and then use it for classification of multi-genomic data. The classification is done using classifiers such as Naive Bayes, Decision Tree and Support Vector Machine (SVM). Accuracy of more than 99% is observed. © 2015 IEEE.
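
As an illustration of the k-mer counting described in the abstract, the following is a minimal single-process sketch in Python that mimics the map and reduce steps. It is not the authors' Hadoop implementation; the function names (map_kmers, reduce_counts, count_kmers) and the sample sequences are assumptions made for this example.

```python
from collections import Counter
from itertools import chain


def map_kmers(sequence, k):
    """Map step: emit a (k-mer, 1) pair for every k-length window."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct k-mer."""
    totals = Counter()
    for kmer, count in pairs:
        totals[kmer] += count
    return totals


def count_kmers(sequences, k):
    """Run the map step over every sequence, then reduce all pairs."""
    pairs = chain.from_iterable(map_kmers(s, k) for s in sequences)
    return reduce_counts(pairs)


# Hypothetical input: two short DNA fragments, counted with k = 2.
counts = count_kmers(["ACGTAC", "GTACGT"], 2)
print(counts["AC"], counts["GT"], counts["CG"], counts["TA"])  # 3 3 2 2
```

The sketch stores only the k-mers that actually occur; a k-mer that never appears reads as 0 from the Counter. In a feature-extraction setting, the resulting count vector for each sequence could serve as classifier input, which is the role the abstract assigns to k-mer counts.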

Citation

International Conference on Computing, Communication and Automation, ICCCA 2015, 2015, pp. 678-682

URI

https://idr.nitk.ac.in/handle/123456789/7957

Collections

2. Conference Papers
