Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Search Results

Now showing 1 - 3 of 3
  • Item
    Information gain score computation for N-grams using multiprocessing model
    (Institute of Electrical and Electronics Engineers Inc., 2017) Shiva Darshan, S.L.S.; M.a, M.A.A.; Jaidhar, C.D.
    Currently, the Internet faces a serious threat from malware, whose propagation can wreak havoc on computers and network security solutions. Several existing anti-malware solutions detect known malware accurately. However, they fail to recognize unseen malware, since most of them rely on signature-based techniques, which are easily evaded using obfuscation or polymorphism. Therefore, there is an immediate need for new techniques that can detect and classify new malware. In this context, heuristic analysis is promising, since it is capable of detecting unknown malware and new variants of existing malware. N-Gram extraction is one such heuristic method commonly used in malware detection. Previous works have shown that shorter N-Grams are easier to extract. To identify and remove noisy N-Grams, this work uses a popular Feature Selection Technique (FST), namely Information Gain (IG), which computes a score for each N-Gram (feature) in the dataset. N-Grams with the highest IG scores are considered the best features, while the remaining N-Grams are discarded. The IG-FST (Information Gain-Feature Selection Technique) is computationally demanding and takes a long time to generate IG scores for larger N-Gram datasets when processing is performed sequentially. To address this issue, the present work proposes a multiprocessing model that computes IG scores rapidly for larger N-Gram datasets. The proposed model has been designed, implemented, and compared with the sequential mode of IG score computation. The experimental results demonstrate that the proposed multiprocessing model is 80% faster than the sequential model of IG score computation. © 2017 IEEE.
  • Item
    Comparative Analysis of Intrusion Detection System using ML and DL Techniques
    (Springer Science and Business Media Deutschland GmbH, 2023) Sunil, C.K.; Reddy, S.; Kanber, S.G.; Vuddanti, V.R.; Patil, N.
    An intrusion detection system (IDS) protects the network from suspicious and harmful activities. It scans the network for harmful activity and potential breaches. Even with so many network intrusion APIs available, there are still problems in detecting intrusions. These problems can be handled by normalizing the whole dataset and ranking features on a benchmark dataset before training the classification models. This paper uses the NSL-KDD dataset to analyze various features and test the efficiency of various algorithms. For each value of k, each model is trained separately and the feature selection approach is evaluated with the algorithms. This work makes use of feature selection techniques such as Information Gain, SelectKBest, Pearson coefficient, and Random Forest. It also iterates over the number of features to pick the best values for training on the dataset. The selected features are then tested on different machine learning and deep learning approaches. This work makes use of a stacked ensemble learning technique for classification. The stacked ensemble learner contains models that make uncorrelated errors, thereby making the overall model more robust. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
  • Item
    Navigating Data Imbalances in Credit Risk Management: A One-Sided Selection Approach
    (Institute of Electrical and Electronics Engineers Inc., 2024) Bennehalli, S.J.; Vakkund, S.; Anusha Hegde, H.; Bhowmik, B.
    Credit scoring plays a vital role in mitigating the information asymmetry that is pervasive on peer-to-peer (P2P) lending platforms. A considerable challenge stems from the disparity in loan repayment outcomes: a significant minority of loan applicants default on their loans, while the majority fulfill their repayment obligations. The presence of imbalance in the dataset has the potential to introduce bias into the predictive model, which could lower its performance. To address this issue, data balancing techniques are often employed to enhance the performance of credit scoring models by generating datasets that are more balanced. This work constructs a robust credit scoring model capable of precisely assessing the creditworthiness of individuals seeking P2P lending. Four distinct classifiers are employed: Logistic Regression, Random Forest, LightGBM, and Support Vector Machine (SVM). In doing so, the model effectively mitigates the distortions that can result from unbalanced data distributions. This work achieves data balance with the One-Sided Selection methodology, along with Information Gain and Pearson correlation, which mainly determine the features to include. The proposed model thus works on both balanced and unbalanced datasets. Experimental results show that standard metrics such as accuracy, precision, recall, and F1-score reach up to 90.41%, 89.51%, 90.40%, and 89.96%, respectively. © 2024 IEEE.
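
All three entries above rely on Information Gain as a feature selection score, and the first describes computing it with multiple processes. The following is a minimal sketch of that idea, not the authors' implementation: it assumes binary features (1 if an N-gram is present in a sample, 0 otherwise), and the function names (`entropy`, `info_gain`, `score_features`), the worker count, and the toy data are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from multiprocessing import Pool


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_gain(task):
    """IG of one binary feature: H(Y) - sum over v of P(X=v) * H(Y | X=v)."""
    presence, labels = task
    total = len(labels)
    if total == 0:
        return 0.0
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(presence, labels) if x == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain


def score_features(columns, labels, workers=2):
    """Score each feature column in a pool of worker processes."""
    tasks = [(col, labels) for col in columns]
    with Pool(processes=workers) as pool:
        return pool.map(info_gain, tasks)


if __name__ == "__main__":
    # Toy data (made up): label 1 = malicious, 0 = benign.
    labels = [1, 1, 0, 0]
    columns = [
        [1, 1, 0, 0],  # feature that separates the two classes perfectly
        [1, 0, 1, 0],  # feature that carries no information about the class
    ]
    print(score_features(columns, labels))
```

Because each feature's score depends only on its own column and the labels, the per-feature computations are independent and can be distributed across processes without coordination. Any actual speedup depends on dataset size, the number of cores, and the cost of sending columns to the workers; this sketch makes no claim about reproducing the figures reported in the abstract.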