Please use this identifier to cite or link to this item: https://idr.nitk.ac.in/jspui/handle/123456789/13810
Title: A novel technique of feature selection with ReliefF and CFS for protein sequence classification
Authors: Kaur K.; Patil N.
Issue Date: 2019
Citation: Advances in Intelligent Systems and Computing, 2019, Vol.707, pp.399-405
Abstract: Bioinformatics has gained wide importance as a research area over the last few decades. Its main aim is to store biological data and analyze it for better understanding. Classifying existing protein sequences is of great use in predicting the functions of newly added ones. Because protein sequence data is accumulating at an exponentially increasing rate, dealing with the large number of features produced by the various encoding techniques has become a very challenging task for researchers. Here, a two-stage feature selection algorithm is proposed that combines the ReliefF and CFS (correlation-based feature selection) techniques: it takes the extracted features as input and returns a discriminative subset of them. The n-gram sequence encoding technique is used to extract the feature vector from the protein sequences. In the first stage, ReliefF ranks the features to obtain a candidate feature set. In the second stage, CFS is applied to this candidate set to select features that are highly correlated with the class but weakly correlated with one another. Classification methods such as Naive Bayes, decision tree, and k-nearest neighbor can be used to evaluate the performance of the proposed approach. It is observed that this approach increases the accuracy of these classification methods compared with existing methods. © Springer Nature Singapore Pte Ltd. 2019
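The two-stage procedure summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the 20 standard amino acids and overlapping n-gram relative frequencies as the encoding; it runs ReliefF over every instance (rather than a random sample) with k nearest hits and misses; and it scores subsets with Hall's CFS merit, M_S = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|), using absolute Pearson correlation, binary 0/1 labels, and a greedy forward search in place of the best-first search often paired with CFS. All function names (ngram_features, relieff, cfs_forward, two_stage_select) and the n_candidates parameter are hypothetical.

```python
# Minimal sketch with hypothetical names; not the chapter's own code.
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter standard alphabet


def ngram_features(seq, n=2):
    """Relative frequencies of all overlapping n-grams (20**n features)."""
    grams = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(grams, 0)
    windows = max(len(seq) - n + 1, 1)
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in counts:  # n-grams containing non-standard residues are ignored
            counts[gram] += 1
    return [counts[g] / windows for g in grams]


def relieff(X, y, k=3):
    """ReliefF weights, using every instance as a sample (m = len(X))."""
    m, d = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(d)]
    span = [(max(row[a] for row in X) - lo[a]) or 1.0 for a in range(d)]
    prior = {c: y.count(c) / m for c in set(y)}

    def diff(a, i, j):  # range-normalized difference on feature a
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i, j):  # Manhattan distance over normalized features
        return sum(diff(a, i, j) for a in range(d))

    W = [0.0] * d
    for i in range(m):
        others = sorted((j for j in range(m) if j != i), key=lambda j: dist(i, j))
        for c in prior:
            near = [j for j in others if y[j] == c][:k]  # k nearest of class c
            if not near:
                continue
            if c == y[i]:  # nearest hits lower the weight
                scale = -1.0 / (m * len(near))
            else:  # nearest misses raise it, weighted by the class prior
                scale = prior[c] / (1.0 - prior[y[i]]) / (m * len(near))
            for a in range(d):
                W[a] += scale * sum(diff(a, i, j) for j in near)
    return W


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def cfs_merit(subset, cols, target):
    """Hall's merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(cols[f], target)) for f in subset) / k
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(cols[f], cols[g])) for f, g in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(candidates, X, y):
    """Greedy forward search on the CFS merit; y must hold 0/1 labels."""
    assert set(y) <= {0, 1}, "this sketch handles two classes only"
    cols = {f: [row[f] for row in X] for f in candidates}
    target = [float(label) for label in y]
    selected, best, remaining = [], 0.0, list(candidates)
    while remaining:
        merit, f = max((cfs_merit(selected + [f], cols, target), f) for f in remaining)
        if merit <= best:  # stop when no candidate improves the merit
            break
        selected.append(f)
        remaining.remove(f)
        best = merit
    return selected


def two_stage_select(X, y, n_candidates=20, k=3):
    """Stage 1: rank by ReliefF and keep the top candidates; stage 2: CFS."""
    W = relieff(X, y, k)
    ranked = sorted(range(len(W)), key=lambda a: W[a], reverse=True)
    return cfs_forward(ranked[:n_candidates], X, y)


if __name__ == "__main__":
    seqs = ["ACDACDACDK", "ACDACDKACD", "WYWYWYGGWY", "GWYWYWYWYG"]  # toy data
    labels = [0, 0, 1, 1]
    X = [ngram_features(s, n=2) for s in seqs]
    print(two_stage_select(X, labels, n_candidates=10, k=1))
```

On real data, the candidate-set size n_candidates and the neighbour count k would need tuning, and the selected feature columns would then be passed to classifiers such as Naive Bayes, a decision tree, or k-NN to compare accuracy, as the abstract describes.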
URI: https://doi.org/10.1007/978-981-10-8639-7_41
http://idr.nitk.ac.in/jspui/handle/123456789/13810
Appears in Collections: 3. Book Chapters
