A novel technique of feature selection with ReliefF and CFS for protein sequence classification

dc.contributor.author: Kaur, K.
dc.contributor.author: Patil, N.
dc.date.accessioned: 2026-02-08T16:50:38Z
dc.date.issued: 2019
dc.description.abstract: Bioinformatics has gained wide importance as a research area over the last few decades. Its main aim is to store biological data and analyze it for better understanding. Classifying existing protein sequences is of great use in predicting the functions of newly added protein sequences. Protein sequence data is accumulating at an exponentially increasing rate, so dealing with the large number of features produced by various encoding techniques has become a very challenging task for researchers. Here, a two-stage feature selection algorithm is proposed that combines the ReliefF and CFS techniques; it takes the extracted features as input and returns a discriminative set of features. The n-gram sequence encoding technique is used to extract the feature vector from the protein sequences. In the first stage, the ReliefF approach is used to rank the features and obtain a candidate feature set. In the second stage, CFS is applied to this candidate feature set to obtain features that have high correlation with the class but low correlation with other features. Classification methods such as Naive Bayes, decision tree, and k-nearest neighbor can be used to analyze the performance of the proposed approach. It is observed that this approach increases the accuracy of classification methods in comparison to existing methods. © Springer Nature Singapore Pte Ltd. 2019
dc.identifier.citation: Advances in Intelligent Systems and Computing, 2019, Vol. 707, p. 399-405
dc.identifier.isbn: 9783319604855
dc.identifier.isbn: 9783319276427
dc.identifier.isbn: 9783319419343
dc.identifier.isbn: 9783319232034
dc.identifier.isbn: 9783319938844
dc.identifier.isbn: 9783642330414
dc.identifier.isbn: 9783319262833
dc.identifier.isbn: 9788132220084
dc.identifier.isbn: 9783642375019
dc.identifier.isbn: 9783030026820
dc.identifier.issn: 21945357
dc.identifier.uri: https://doi.org/10.1007/s11581-025-06438-3
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/33919
dc.publisher: Springer Verlag service@springer.de
dc.subject: Bioinformatics
dc.subject: CFS
dc.subject: Classification
dc.subject: Feature selection
dc.subject: Filter
dc.subject: Gene data
dc.subject: Protein sequence data
dc.subject: ReliefF
dc.title: A novel technique of feature selection with ReliefF and CFS for protein sequence classification
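
The two-stage procedure described in the abstract (n-gram encoding, ReliefF ranking to a candidate set, then CFS on that candidate set) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Several parts are assumptions: the function names, the toy sequences, binary labels encoded as 0/1, a simplified ReliefF that visits every instance rather than a random sample, and absolute Pearson correlation standing in for whichever correlation measure the paper uses inside CFS.

```python
import math
from collections import Counter

def ngram_features(seqs, n=2):
    """Encode each sequence as relative n-gram frequencies over a shared vocabulary."""
    counts = [Counter(s[i:i + n] for i in range(len(s) - n + 1)) for s in seqs]
    vocab = sorted(set().union(*counts))
    X = [[c[g] / max(1, sum(c.values())) for g in vocab] for c in counts]
    return X, vocab

def relieff(X, y, k=1):
    """Simplified ReliefF: penalise differences to nearest hits, reward differences to nearest misses."""
    m, d = len(X), len(X[0])
    lo = [min(r[j] for r in X) for j in range(d)]
    span = [(max(r[j] for r in X) - lo[j]) or 1.0 for j in range(d)]
    Z = [[(r[j] - lo[j]) / span[j] for j in range(d)] for r in X]  # min-max scaling
    prior = {c: cnt / m for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(m):
        def dist(t):
            return sum(abs(a - b) for a, b in zip(Z[i], Z[t]))
        for c in prior:
            near = sorted((t for t in range(m) if t != i and y[t] == c), key=dist)[:k]
            if not near:
                continue
            # Hits count negatively; misses are weighted by the class prior.
            scale = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for t in near:
                for j in range(d):
                    w[j] += scale * abs(Z[i][j] - Z[t][j]) / (m * len(near))
    return w

def abs_pearson(a, b):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cfs(X, y, candidates):
    """Greedy forward search maximising the CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff)."""
    cols = {j: [r[j] for r in X] for j in candidates}
    r_class = {j: abs_pearson(cols[j], y) for j in candidates}
    selected, best = [], 0.0
    while True:
        scored = []
        for j in candidates:
            if j in selected:
                continue
            S = selected + [j]
            k = len(S)
            r_cf = sum(r_class[s] for s in S) / k
            pairs = [(a, b) for idx, a in enumerate(S) for b in S[idx + 1:]]
            r_ff = sum(abs_pearson(cols[a], cols[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
            scored.append((k * r_cf / math.sqrt(k + k * (k - 1) * r_ff), j))
        if not scored:
            break
        merit, j = max(scored)
        if merit <= best:  # stop when no candidate improves the merit
            break
        selected.append(j)
        best = merit
    return selected

def two_stage_select(X, y, top=10, k=1):
    """Stage 1: keep the `top` ReliefF-ranked features. Stage 2: run CFS on them."""
    w = relieff(X, y, k)
    ranked = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    return cfs(X, y, ranked[:top])

# Toy, made-up sequences and labels, purely for illustration.
seqs = ["AAAAAA", "AAAAAC", "CCCCCC", "CCCCCA"]
labels = [0, 0, 1, 1]
X, vocab = ngram_features(seqs, n=2)
chosen = two_stage_select(X, labels, top=3)
print([vocab[j] for j in chosen])
```

The reduced feature matrix (the columns in `chosen`) would then be passed to a classifier such as Naive Bayes, a decision tree, or k-nearest neighbor to compare accuracy with and without selection.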
