Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

  • Item
    A novel technique of feature selection with relieff and CFS for protein sequence classification
    (Springer Verlag service@springer.de, 2019) Kaur, K.; Patil, N.
    Bioinformatics has gained wide importance as a research area over the last few decades. Its main aim is to store biological data and analyze it for better understanding. To predict the functions of newly added protein sequences, the classification of existing protein sequences is of great use. Protein sequence data is accumulating at an exponentially increasing rate, so dealing with the large number of features produced by various encoding techniques has become a very challenging task for researchers. Here, a two-stage feature selection algorithm combining ReliefF and CFS (correlation-based feature selection) is proposed; it takes the extracted features as input and returns a discriminative feature subset. The n-gram sequence encoding technique is used to extract the feature vector from the protein sequences. In the first stage, the ReliefF approach is used to rank the features and obtain a candidate feature set. In the second stage, CFS is applied to this candidate feature set to obtain features that are highly correlated with the class but weakly correlated with one another. Classification methods such as Naive Bayes, decision tree, and k-nearest neighbor can be used to analyze the performance of the proposed approach. It is observed that this approach increases the accuracy of the classification methods compared with existing methods. © Springer Nature Singapore Pte Ltd. 2019
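A minimal Python sketch of the two-stage idea in the entry above, under stated assumptions: 2-gram counts over the 20 standard amino acids as the encoding, basic two-class Relief (one nearest hit and one nearest miss) standing in for ReliefF, and Hall's CFS merit computed with absolute Pearson correlation plus a greedy forward search. The function names and the `top_k` cut-off are illustrative, not taken from the paper.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ngram_features(seq, n=2):
    """Encode a protein sequence as counts of every possible amino-acid n-gram."""
    grams = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(grams, 0)
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in counts:  # skip n-grams containing non-standard residues
            counts[gram] += 1
    return grams, [counts[g] for g in grams]


def relief_scores(X, y):
    """Basic two-class Relief weights (one nearest hit/miss, every instance used).
    ReliefF generalizes this to k neighbours and multiple classes."""
    n, d = len(X), len(X[0])
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i, xi in enumerate(X):
        hits = [X[k] for k in range(n) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda r: dist(xi, r))
        miss = min(misses, key=lambda r: dist(xi, r))
        for j in range(d):
            # reward features that separate classes, penalize ones that vary within a class
            w[j] += (diff(xi, miss, j) - diff(xi, hit, j)) / n
    return w


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return 0.0 if va == 0 or vb == 0 else cov / sqrt(va * vb)


def cfs_merit(cols, subset, y):
    """Hall's CFS merit: k*r_cf / sqrt(k + k(k-1)*r_ff), using |Pearson r|."""
    k = len(subset)
    r_cf = sum(abs(pearson(cols[j], y)) for j in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(cols, candidates, y):
    """Greedy forward search: add the feature that most raises the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        merit, j = max((cfs_merit(cols, selected + [j], y), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected


def two_stage_select(X, y, top_k=10):
    """Stage 1: keep the top_k features by Relief weight; stage 2: CFS search on them."""
    w = relief_scores(X, y)
    candidates = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:top_k]
    cols = [list(c) for c in zip(*X)]
    return cfs_forward(cols, candidates, y)
```

Discrete CFS implementations often use symmetrical uncertainty instead of Pearson correlation; Pearson is used here only to keep the sketch short.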
  • Item
    Novel hybrid feature selection models for unsupervised document categorization
    (Institute of Electrical and Electronics Engineers Inc., 2017) Bhopale, A.P.; Kamath S., S.
    Dealing with high-dimensional data is a challenging and computationally complex task in the data pre-processing phase of text clustering. Conventionally, union and intersection approaches have been used to combine the results of different feature selection methods and optimize the relevant feature space for a document collection. The union method selects all features from the considered sub-models, whereas the intersection method selects only the features common to the sub-models. However, in practice, any type of feature selection can cause the loss of some potentially important features. In this paper, a hybrid feature selection model called Modified Hybrid Union (MHU) is proposed, which selects features by considering the individual strengths and weaknesses of each constituent component of the model. A comparative evaluation of its performance for K-means clustering and Bio-inspired Flock-based clustering is also presented on standard data sets such as OWL-S TC and Reuters-21578. © 2017 IEEE.
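To make the union and intersection baselines concrete, here is a small Python sketch. The two sub-models (top-k terms by document frequency and by term frequency) and the toy corpus are hypothetical; the MHU model itself is not reproduced, because the abstract does not describe its selection rule.

```python
from collections import Counter


def top_k_by_df(docs, k):
    """Hypothetical sub-model 1: the k terms with the highest document frequency."""
    df = Counter(t for d in docs for t in set(d.split()))
    return {t for t, _ in sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:k]}


def top_k_by_tf(docs, k):
    """Hypothetical sub-model 2: the k terms with the highest total term frequency."""
    tf = Counter(t for d in docs for t in d.split())
    return {t for t, _ in sorted(tf.items(), key=lambda kv: (-kv[1], kv[0]))[:k]}


def combine_union(subsets):
    """Union: keep every feature chosen by any sub-model."""
    return set().union(*subsets)


def combine_intersection(subsets):
    """Intersection: keep only the features chosen by every sub-model."""
    return set.intersection(*map(set, subsets)) if subsets else set()


docs = ["web service search", "web service web", "news market market"]
subsets = [top_k_by_df(docs, 2), top_k_by_tf(docs, 2)]
print(combine_union(subsets))         # three terms: service, web, market (set order varies)
print(combine_intersection(subsets))  # {'web'}
```

The example shows the trade-off the abstract describes: the union keeps "service" and "market", which only one sub-model favored, while the intersection discards both.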
  • Item
    Feature selection using fast ensemble learning for network intrusion detection
    (Springer Verlag service@springer.de, 2020) Pasupulety, U.; Adwaith, C.D.; Hegde, S.; Patil, N.
    Network security plays a critical role in today’s digital system infrastructure. Every day, there are hundreds of cases of data theft or loss because a system's integrity has been compromised. The root cause of this issue is the lack of systems able to foresee such attacks. Network intrusion detection techniques are important for protecting systems and networks from malicious behavior. By analyzing a dataset with features summarizing the way connections are made to the network, any attempt to access it can be classified as malicious or benign. To improve the accuracy of network intrusion detection, various machine learning algorithms and optimization techniques are used. Feature selection helps in finding the attributes in the dataset that have a significant effect on the final classification. This reduces the size of the dataset, thereby simplifying the classification task. In this work, we propose using multiple techniques as an ensemble for feature selection. To reduce training time while retaining accuracy, the important features of a subset of the KDD network intrusion detection dataset were analyzed using this ensemble learning technique. Out of 41 possible features, host-based statistical features of network flow were found to play an important role in predicting network intrusion. Our proposed methodology provides multiple levels of overall selected features, corresponding to the number of individual feature selection techniques that selected them. At the highest level of selected features, our experiments yielded a 6% increase in intrusion detection accuracy, an 81% decrease in dataset size, and a 5.4× decrease in runtime using a Multinomial Naive Bayes classifier, relative to the original dataset. © Springer Nature Switzerland AG 2020.
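One reading of the "multiple levels" of selected features is a vote count: a feature's level is the number of selectors that picked it. The Python sketch below assumes that reading, with level L holding every feature picked by at least L selectors. The selector names and their outputs are hypothetical; only the feature names follow the KDD dataset's naming.

```python
from collections import Counter


def selection_levels(selections):
    """Map each level L (1..number of selectors) to the features picked by at least L selectors."""
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return {
        level: {f for f, v in votes.items() if v >= level}
        for level in range(1, len(selections) + 1)
    }


# Hypothetical outputs of three feature selectors.
selections = {
    "chi2": {"src_bytes", "dst_host_srv_count", "count"},
    "mutual_info": {"src_bytes", "dst_host_srv_count", "flag"},
    "tree_importance": {"src_bytes", "dst_host_srv_count", "dst_host_same_srv_rate"},
}
levels = selection_levels(selections)
print(sorted(levels[3]))  # ['dst_host_srv_count', 'src_bytes']
```

Higher levels give smaller, more agreed-upon feature sets, which is what trades accuracy against dataset size and runtime.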
  • Item
    A single program multiple data algorithm for feature selection
    (Springer Verlag service@springer.de, 2020) Chanduka, B.; Gangavarapu, T.; Jaidhar, C.D.
    Feature selection is a critical component of data science and has been a topic of research for many years. Advances in hardware and the availability of better multiprocessing platforms have enabled parallel computing to reach very high levels of performance. Minimum Redundancy Maximum Relevance (mRMR) is a powerful feature selection technique used in many applications. In this paper, we present a novel optimized Single Program Multiple Data (SPMD) approach to implementing the mRMR algorithm, with synchronous computation, optimum load balancing, and greater speedup than task-parallel approaches. Experimental results on multiple synthesized datasets demonstrate the efficiency and scalability of the proposed technique over the original mRMR. © Springer Nature Switzerland AG 2020.
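The greedy mRMR loop, with its per-iteration work split SPMD-style, can be sketched in Python as follows. Assumptions: features are already discretized, relevance and redundancy are combined with the difference (MID) criterion, and the blocks are scored one after another rather than on real parallel workers. The sketch therefore shows only the decomposition (a local best per block, then a global max reduction), not the paper's load balancing or speedup.

```python
from collections import Counter
from math import log


def mutual_info(a, b):
    """Mutual information (nats) between two discrete sequences, from empirical counts."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def score_block(cols, y, block, selected):
    """Per-worker step: best (score, index) in this block under the MID criterion,
    i.e. relevance I(f; y) minus mean redundancy I(f; s) over already-selected s."""
    best = None
    for j in block:
        relevance = mutual_info(cols[j], y)
        redundancy = (sum(mutual_info(cols[j], cols[s]) for s in selected) / len(selected)
                      if selected else 0.0)
        score = relevance - redundancy
        if best is None or score > best[0]:
            best = (score, j)
    return best


def mrmr(cols, y, k, n_workers=2):
    """Greedy mRMR with the candidates split into one block per (simulated) worker."""
    selected, remaining = [], list(range(len(cols)))
    while remaining and len(selected) < k:
        blocks = [remaining[w::n_workers] for w in range(n_workers)]
        # The same program runs on every block; here the blocks run sequentially.
        local_best = [score_block(cols, y, b, selected) for b in blocks if b]
        _, j = max(local_best)  # reduction: global best across workers
        selected.append(j)
        remaining.remove(j)
    return selected


y = [0, 0, 1, 1, 2, 2, 3, 3]
high = [0, 0, 0, 0, 1, 1, 1, 1]  # "high bit" of the class label
low = [0, 0, 1, 1, 0, 0, 1, 1]   # "low bit" of the class label
cols = [high, list(high), low]   # feature 1 duplicates feature 0
print(mrmr(cols, y, k=2))        # one copy of the high bit, then the low bit
```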
  • Item
    A novel bio-inspired hybrid metaheuristic for unsolicited bulk email detection
    (Springer Science and Business Media Deutschland GmbH, 2020) Gangavarapu, T.; Jaidhar, C.D.
    With the recent influx of technology, Unsolicited Bulk Emails (UBEs) have become a potential problem, leaving computer users and organizations at risk of brand, data, and financial loss. In this paper, we present a novel bio-inspired hybrid parallel optimization algorithm (Cuckoo-Firefly-GR), which combines Genetic Replacement (GR) of low-fitness individuals with a hybrid of Cuckoo Search (CS) and Firefly (FA) optimizations. Cuckoo-Firefly-GR not only employs the random walk of CS but also uses mechanisms from FA to generate and select fitter individuals. The content- and behavior-based email features used in existing works, along with Doc2Vec features of the email body, are employed to extract the syntactic and semantic information in the emails. By establishing an optimal balance between intensification and diversification, and reaching global optimization using two metaheuristics, we argue that the proposed algorithm significantly improves the performance of UBE detection by selecting the most discriminative feature subspace. This study presents significant observations from extensive evaluations on UBE corpora of 3,844 emails that underline the efficiency and superiority of our proposed Cuckoo-Firefly-GR over the base optimizations (Cuckoo-GR and Firefly-GR), dense autoencoders, recurrent neural autoencoders, and several state-of-the-art methods. Furthermore, the instructive feature subset obtained using the proposed Cuckoo-Firefly-GR, when classified using a dense neural model, achieved an accuracy of 99%. © Springer Nature Switzerland AG 2020.
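For orientation, these are the update rules usually given for the two base metaheuristics: standard Cuckoo Search (Yang and Deb) and the standard Firefly Algorithm (Yang). Here $x_i^{(t)}$ is the position of individual $i$ at iteration $t$, $\alpha$ a step-size scale, $\oplus$ entrywise multiplication, $\beta_0$ the attractiveness at zero distance, $\gamma$ the light absorption coefficient, and $\epsilon_i^{(t)}$ a random vector. How Cuckoo-Firefly-GR combines them, performs Genetic Replacement, and maps positions to feature subsets is not given in the abstract and is not reconstructed here.

```latex
\begin{aligned}
\text{Cuckoo Search:}\quad & x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \text{L\'evy}(\lambda) \\
\text{Firefly:}\quad & x_i^{(t+1)} = x_i^{(t)} + \beta_0 e^{-\gamma r_{ij}^2}\left(x_j^{(t)} - x_i^{(t)}\right) + \alpha\,\epsilon_i^{(t)},
  \qquad r_{ij} = \lVert x_i^{(t)} - x_j^{(t)} \rVert
\end{aligned}
```

In the firefly rule, individual $i$ moves toward a brighter (fitter) individual $j$, with attraction decaying with distance; the Lévy-flight step of Cuckoo Search supplies the occasional long jumps that aid diversification.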
  • Item
    Fake News Detection Using Genetic Algorithm-Based Feature Selection and Ensemble Learning
    (Springer Science and Business Media Deutschland GmbH, 2022) Nikitha, K.M.; Rozario, R.; Pradeep, C.; Ananthanarayana, V.S.
    Since its conception roughly 40 years ago, the Internet has always been an unpoliced area of human interaction. This lawlessness has since been curbed with legislation, making nefarious activities on the web legally punishable. However, in the case of fake news and disinformation campaigns, the responsibility of verification is placed on the reader and the publisher, and there is no easily executable legal recourse against wrongdoers. This lack of policing, combined with the power to control popular opinion for uses such as election manipulation, slander as a form of blackmail, stock manipulation for insider trading, and shielding corporate wrongdoing, makes it clear that this is a problem worth solving. Furthermore, we believe that automating the process is crucial, as the task requires processing a massive amount of information while also being free of all biases, which is not possible for a human team. This paper explores different text properties that can indicate whether a newspaper article is likely to be fake or real. Our novel approach makes use of an ensemble learner created using weak learners. The weak learners are further trained on selected features to make them moderate learners. Our study shows that training individual models on different sets of features extracted using genetic algorithms performs better than training models on all features. These models become moderate learners and surpass the weak learners in performance. Further, when we ensemble these moderate learners, we achieve results superior to those of conventional ensemble learners. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
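Genetic-algorithm feature selection usually encodes a feature subset as a bit mask. The Python sketch below assumes a simple generational GA (truncation selection, two elites, one-point crossover, bit-flip mutation); the toy fitness function and all parameter values are hypothetical stand-ins for whatever the authors used to score a subset.

```python
import random


def ga_feature_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search over bit masks (1 = keep feature) with a simple generational GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[:2]                  # elitism: the best masks survive unchanged
        parents = ranked[: pop_size // 2]   # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


# Hypothetical stand-in fitness: reward features 0, 3 and 5, penalize every other kept feature.
# In practice this would be something like a weak learner's validation accuracy on the mask.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    kept = [i for i, bit in enumerate(mask) if bit]
    return sum(i in INFORMATIVE for i in kept) - 0.1 * sum(i not in INFORMATIVE for i in kept)


best = ga_feature_select(8, toy_fitness)
print(best)
```

Running the search once per learner (for example with different seeds or fitness functions) would give each learner its own feature subset, in the spirit of the approach above.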
  • Item
    Detection of Cardiac Arrhythmia Using Machine Learning Approaches
    (Institute of Electrical and Electronics Engineers Inc., 2022) Chittoria, J.; Kamath S., S.; Mayya, V.
    Arrhythmia is a cardiovascular disease that alters the heart rate, resulting in too fast, too slow, or irregular rhythms. It is life-threatening if left untreated. Traditionally, arrhythmia is diagnosed by a trained doctor, who uses an electrocardiogram to analyze irregular heartbeats. However, these methods are vulnerable to inadvertent misdiagnosis, especially during the early stages of the disease. In this paper, an approach for cardiac arrhythmia detection is presented in which the subjects or instances are first categorized as diseased or normal and then further graded as normal (non-diseased) or as distinct subtypes of cardiac arrhythmia. The dataset was obtained from the UCI Machine Learning Data Repository, and machine learning methods such as XGBoost, CatBoost, SVM, and Random Forest were experimented with. Additionally, the mutual information-based feature selection approach minimal redundancy maximum relevance (mRMR) is proposed to improve classification accuracy. Standard evaluation metrics such as accuracy, F1-score, precision, and recall are used to compare the obtained results. The experimental results demonstrate that an accuracy of 81.48% was achieved for multi-class classification, while binary classification achieved up to 84% accuracy. © 2022 IEEE.
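The two-stage grading described above (diseased versus normal first, then subtype) and the per-class metrics can be sketched in Python as follows. The heart-rate rules are hypothetical stand-in classifiers, not the XGBoost, CatBoost, SVM, or Random Forest models from the paper, and the class names are illustrative.

```python
def two_stage_predict(x, is_diseased, grade_subtype):
    """Stage 1 separates normal from diseased; stage 2 grades diseased cases into a subtype."""
    return grade_subtype(x) if is_diseased(x) else "normal"


def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical stand-in classifiers that look only at heart rate (beats per minute).
def is_diseased(x):
    return x["heart_rate"] > 100 or x["heart_rate"] < 60


def grade_subtype(x):
    return "sinus_tachycardia" if x["heart_rate"] > 100 else "sinus_bradycardia"


records = [{"heart_rate": 72}, {"heart_rate": 120}, {"heart_rate": 50}]
print([two_stage_predict(r, is_diseased, grade_subtype) for r in records])
# ['normal', 'sinus_tachycardia', 'sinus_bradycardia']
```

Keeping the stages as separate callables means the binary and multi-class models can be trained and evaluated independently, matching the separate binary and multi-class accuracies reported above.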
  • Item
    Exploring the Impact of External Factors on Ride-Hailing Demand: A Predictive Modelling Approach
    (The Society for the Study of Artificial Intelligence and Simulation of Behaviour, 2023) Sriram, A.; Ananthanarayana, V.S.
    This paper presents a comprehensive study of Uber usage in different markets, focusing on the impact of demographic factors, public transit proximity, weather, and extreme events on the demand for Uber ride-hailing services. This study applies Explainable AI techniques for feature selection across multiple data sources to model the effect of external factors on Uber ride usage. Furthermore, factors such as weather and local events are used for ride usage forecasting, using spatiotemporal aspects and extreme event analysis. The results of this study showed that factors such as demographics and proximity to public transit play a role in shaping Uber usage patterns. Extreme events, such as weather conditions and local events, were also found to have a significant impact on the demand for Uber services. This study provides valuable insights for Uber, similar ride-hailing services, and policymakers for optimal resource allocation, and lays the foundation for further research on the relationship between transportation services and various contextual factors. © AISB Convention 2023. All rights reserved.
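The abstract does not name the Explainable AI technique used for feature selection. As one model-agnostic stand-in, the Python sketch below computes permutation importance: shuffle one feature column and measure how much the prediction error grows. The demand model, feature names, and data are hypothetical.

```python
import random


def mae(model, X, y):
    """Mean absolute error of `model` on rows X with targets y."""
    return sum(abs(model(row) - t) for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MAE when one feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    base = mae(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mae(model, X_perm, y) - base)
        scores.append(sum(increases) / n_repeats)
    return scores


# Hypothetical demand model over [rain, transit_distance_km, local_event]; it ignores transit distance.
def demand_model(row):
    rain, _transit_km, event = row
    return 20 + 8 * rain + 15 * event


X = [[0, 0.5, 0], [1, 1.2, 0], [0, 3.0, 1], [1, 0.8, 1], [0, 2.2, 0], [1, 0.3, 0]]
y = [demand_model(r) for r in X]
print(permutation_importance(demand_model, X, y))
```

Features whose importance stays near zero (here, transit distance, which the toy model never reads) are candidates for removal.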
  • Item
    Fault diagnosis of bearings through vibration signal using Bayes classifiers
    (Inderscience Publishers, 2014) Kumar, H.; Ranjit Kumar, T.A.; Amarnath, M.; Sugumaran, V.
    Bearings are an integral part of industrial machinery and are subject to wear and tear. Breakdown of such crucial components incurs heavy losses. This study concerns fault diagnosis of bearings through a machine learning approach, using vibration signals of bearings in good and simulated faulty conditions. The vibration data were acquired from the bearings using an accelerometer under different operating conditions. The vibration signals of a bearing contain dynamic information about its operating condition. Descriptive statistical features were extracted from the vibration signals, and the important ones were selected using a decision tree (dimensionality reduction). The decision tree was built using the J48 algorithm. The selected features were then used for classification with Bayes classifiers, namely Naïve Bayes and Bayes net. The paper also discusses the effect of various parameters on classification accuracy. © 2014 Inderscience Enterprises Ltd.
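The first step of this pipeline, turning a signal segment into descriptive statistics, might look like the Python sketch below. The exact feature list is an assumption, since the abstract does not enumerate it; skewness and kurtosis use population central moments, and the J48 decision-tree selection step is not shown.

```python
import math
import statistics


def descriptive_features(signal):
    """Descriptive statistics of one signal segment (an assumed feature set)."""
    n = len(signal)
    mean = statistics.fmean(signal)
    m2 = sum((x - mean) ** 2 for x in signal) / n  # central moments (population form)
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    std = statistics.stdev(signal)                  # sample standard deviation
    return {
        "mean": mean,
        "median": statistics.median(signal),
        "std": std,
        "variance": statistics.variance(signal),
        "standard_error": std / math.sqrt(n),
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3 if m2 else 0.0,  # excess kurtosis
        "minimum": min(signal),
        "maximum": max(signal),
        "range": max(signal) - min(signal),
        "sum": sum(signal),
    }


print(round(descriptive_features([1, 2, 3, 4, 5])["kurtosis"], 6))  # -1.3
```

In a fault-diagnosis setting each recorded signal would be cut into fixed-length segments, and every segment would yield one feature vector labeled with the bearing condition.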
  • Item
    Fault diagnosis of bearings through sound signal using statistical features and bayes classifier
    (Krishtel eMaging Solutions Pvt. Ltd, 2016) Kumar, H.; Sugumaran, V.; Amarnath, M.
    The bearing is one of the important rotary elements used in almost all machinery. This study concerns fault diagnosis of bearings through a machine learning approach, using acoustic (sound) signals of bearings in good and simulated faulty conditions. The acoustic data were acquired from the near-field area of the bearings using a microphone under different operating conditions. The acoustic signals of a bearing contain dynamic information about its operating condition. Abundant literature has reported the suitability of vibration signals for fault diagnosis applications; however, few studies have used sound signals for such applications. Also, transducers used to measure sound are less costly than those used for vibration measurement. Hence, using sound signals for bearing fault diagnosis was found to be beneficial. Descriptive statistical features were extracted from the sound signals, and the important ones were selected using a decision tree (dimensionality reduction). The selected features were then used for classification with a Bayes classifier. The paper also discusses the effect of various parameters on classification accuracy. © KRISHTEL eMAGING SOLUTIONS PVT. LTD.
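The classification step can be sketched with a Gaussian Naive Bayes classifier written from scratch in Python. Assumptions: each selected feature is modeled as a per-class normal distribution, a small variance floor avoids division by zero, and the two-feature training data are hypothetical. The Bayes net classifier mentioned in the vibration-signal entry is not covered.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per class: prior, plus mean and variance of every feature (features assumed independent)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + var_floor for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        lp = math.log(prior)
        for x, m, v in zip(row, means, variances):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return lp
    return max(model, key=log_posterior)


# Hypothetical selected features per segment: [standard deviation, kurtosis].
X = [[0.10, 3.0], [0.12, 3.1], [0.11, 2.9], [0.50, 6.0], [0.55, 6.5], [0.60, 5.8]]
y = ["good", "good", "good", "faulty", "faulty", "faulty"]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.12, 3.0]))  # good
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied.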