Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
A novel technique of feature selection with ReliefF and CFS for protein sequence classification
(Springer Verlag, 2019) Kaur, K.; Patil, N.
Bioinformatics has gained wide importance as a research area over the last few decades. Its main aim is to store biological data and analyze it for better understanding. To predict the functions of newly added protein sequences, the classification of existing protein sequences is of great use. The rate at which protein sequence data accumulates is increasing exponentially, so dealing with the large number of features obtained from various encoding techniques is a challenging task for researchers. Here, a two-stage feature selection algorithm is proposed that combines the ReliefF and CFS techniques; it takes the extracted features as input and returns a discriminative set of features. The n-gram sequence encoding technique is used to extract the feature vector from the protein sequences. In the first stage, the ReliefF approach is used to rank the features and obtain a candidate feature set. In the second stage, CFS is applied to this candidate feature set to obtain features that have high correlation with the class but low correlation with other features. Classification methods such as Naive Bayes, decision tree, and k-nearest neighbor can be used to analyze the performance of the proposed approach. This approach is observed to increase the accuracy of the classification methods in comparison to existing methods. © Springer Nature Singapore Pte Ltd. 2019

Feature selection using fast ensemble learning for network intrusion detection
(Springer Verlag, 2020) Pasupulety, U.; Adwaith, C.D.; Hegde, S.; Patil, N.
Network security plays a critical role in today's digital system infrastructure. Every day, there are hundreds of cases of data theft or loss due to a system's integrity being compromised. The root cause of this issue is the lack of systems in place that are able to foresee the advent of such attacks. Network intrusion detection techniques are important for protecting any system or network from malicious behavior. By analyzing a dataset whose features summarize the way in which connections are made to the network, any attempt to access it can be classified as malicious or benign. To improve the accuracy of network intrusion detection, various machine learning algorithms and optimization techniques are used. Feature selection helps in finding the important attributes in the dataset, those that have a significant effect on the final classification. This reduces the size of the dataset, thereby simplifying the task of classification. In this work, we propose using multiple techniques as an ensemble for feature selection. To reduce training time and retain accuracy, the important features of a subset of the KDD Network Intrusion detection dataset were analyzed using this ensemble learning technique. Out of 41 possible features for network intrusion, it was found that host-based statistical features of network flow play an important role in predicting network intrusion. Our proposed methodology provides multiple levels of overall selected features, where the level corresponds to the number of individual feature selection techniques that selected them. At the highest level of selected features, our experiments yielded a 6% increase in intrusion detection accuracy, an 81% decrease in dataset size and a 5.4× decrease in runtime using a Multinomial Naive Bayes classifier on the original dataset. © Springer Nature Switzerland AG 2020.

A fast and novel approach based on grouping and weighted mRMR for feature selection and classification of protein sequence data
(Inderscience Publishers, 2020) Kaur, K.; Patil, N.
The analysis of protein sequences in bioinformatics has gained wide importance as a research area. Newly added protein sequences can be analysed using existing proteins by converting them into feature vector form. However, it is a challenging task to deal with the huge number of features obtained using sequence encoding techniques. Since not all of the features obtained are actually required, a three-stage feature selection approach is proposed. In the first stage, features are ranked and the most irrelevant features are removed; in the second stage, conflicting features are grouped together; and in the third stage, a fast approach based on weighted Minimum Redundancy Maximum Relevance (wMRMR) is proposed and applied to the grouped features. Different classification methods are used to analyse the performance of the proposed approach. It is observed that the proposed approach increases classification accuracy and reduces time consumption in comparison to state-of-the-art methods. © 2020 Inderscience Enterprises Ltd.
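
The last abstract above builds on Minimum Redundancy Maximum Relevance selection. As a rough illustration of that underlying idea only, the following is a minimal sketch of the standard, unweighted greedy mRMR procedure (relevance minus mean redundancy, both measured by mutual information on discrete features). It is not the authors' weighted wMRMR variant, and it omits their ranking and grouping stages. The function names `mutual_information` and `mrmr_select` and the toy data are hypothetical, introduced here for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels, same length as each feature column
    """
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            # The first pick uses relevance alone; later picks are penalised
            # by their average mutual information with already-chosen features.
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed for illustration): "a_copy" duplicates "a", so it is
# fully redundant once "a" is chosen; "c" is weaker but adds new information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a":      [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "c":      [0, 0, 1, 0, 1, 1, 0, 1],
}
chosen = mrmr_select(features, labels, k=2)
```

On this toy data the duplicate column is skipped in favour of the less relevant but non-redundant one, which is the behaviour the redundancy term is meant to produce. Real protein feature vectors would typically need discretisation before a count-based mutual information estimate like this one applies.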
