A novel semi-supervised approach for protein sequence classification

Date

2015

Authors

Chaturvedi, B.
Patil, N.

Abstract

Bioinformatics is an emerging research area, and classification of protein sequence datasets is a major challenge for researchers. This paper deals with supervised and semi-supervised classification of human protein sequences. Amino acid composition (AAC) is used for feature extraction from the protein sequences. Classification techniques such as Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbour (KNN), Random Forest, and Decision Tree are used to classify the protein sequence dataset. Among these classifiers, SVM gave the best result, with the highest accuracy. The limitation of SVM is that it works only with supervised (labeled) datasets; it does not work with unsupervised datasets (unlabeled data) or semi-supervised datasets (a large amount of unlabeled data combined with a small amount of labeled data). A novel semi-supervised support vector machine (SSVM) classifier is proposed that works with a combination of labeled and unlabeled data. The results show that the proposed approach gives higher accuracy on the semi-supervised dataset. Principal component analysis (PCA) is used for feature reduction of the protein sequences. The proposed SSVM with PCA increases accuracy by about 5 to 10%. © 2015 IEEE.
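
As a rough illustration of the amino acid composition (AAC) features mentioned in the abstract, the sketch below computes the fraction of each of the 20 standard amino acids in a sequence. It is not the authors' code: the function name `amino_acid_composition`, the alphabetical residue ordering, the choice to skip non-standard symbols, and the example sequence are all assumptions made for illustration.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper: symbols outside the 20 standard residues
    (for example 'X' or '-') are ignored rather than counted.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One feature per residue, in the fixed order given by AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up example sequence, not taken from the paper's dataset.
features = amino_acid_composition("MKVLAAGIVK")
```

Each sequence then maps to a fixed-length 20-dimensional vector regardless of its length, which is the kind of input that classifiers such as SVM and reduction steps such as PCA expect.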

Citation

Souvenir of the 2015 IEEE International Advance Computing Conference, IACC 2015, 2015, pp. 1158-1162
