Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
4 results
Search Results
Item Comparative study of Principal Component Analysis based Intrusion Detection approach using machine learning algorithms (Institute of Electrical and Electronics Engineers Inc., 2015) Chabathula, K.J.; Jaidhar, C.D.; M.a, M.A.
This paper highlights the prominence of the various machine learning techniques adopted so far for identifying different network attacks and suggests a preferable Intrusion Detection System (IDS) that works within the available system resources while optimizing speed and accuracy. With the booming number of intruders and hackers in today's vast and sophisticated computerized world, it is continually challenging to identify unknown attacks in a timely manner with no false positives and no false negatives. Principal Component Analysis (PCA) curtails the amount of data to be compared by reducing its dimensions prior to classification, which reduces detection time. In this paper, PCA is adopted to reduce a higher-dimensional dataset to a lower-dimensional one. This is accomplished by converting network packet header fields into a vector and then applying PCA to the high-dimensional dataset. The reduced-dimension dataset is tested with Support Vector Machines (SVM), K-Nearest Neighbors (KNN), the J48 tree algorithm, the Random Forest tree classification algorithm, the AdaBoost algorithm, the Nearest Neighbors Generalized Exemplars algorithm, the Naive Bayes probabilistic classifier, and the Voting Feature Intervals classification algorithm. The obtained results demonstrate detection accuracy and computational efficiency with minimal false alarms and lower system resource utilization. Experimental results are compared with respect to detection rate and detection time, and tree classification algorithms were found to achieve superior results over the other algorithms. The whole experiment was conducted using the KDD99 data set.
© 2015 IEEE.
Item Evaluation of Machine Learning Frameworks on Bank Marketing and Higgs Datasets (Institute of Electrical and Electronics Engineers Inc., 2015) Bhuvan, B.M.; Jain, S.; Rao, V.D.; Patil, N.; Raghavendra, G.S.
Big data is an emerging field in which datasets of various sizes are being analyzed for potential applications. In parallel, many frameworks are being introduced through which these datasets can be fed into machine learning algorithms. Though some experiments have been done to compare different machine learning algorithms on different data, these experiments have not been carried out across different platforms. Our research aims to compare two selected machine learning algorithms on data sets of different sizes deployed on different platforms: Weka, Scikit-Learn and Apache Spark. They are evaluated based on training time, accuracy and root mean squared error. This comparison helps to decide which platform is best suited for applying the selected computationally expensive machine learning algorithms to data of a particular size. Experiments suggested that Scikit-Learn would be optimal for data that can fit into memory. When working with huge data, Apache Spark would be optimal, as it performs parallel computations by distributing the data over a cluster. Hence this study concludes that the Spark platform, which has growing support for parallel implementations of machine learning algorithms, could be optimal for analyzing big data. © 2015 IEEE.
Item Anomaly Detection in Electric Powertrain System Software Behaviour (Institute of Electrical and Electronics Engineers Inc., 2023) Vyas, A.; Ghorpade, V.; Kamble, S.; Johnson, P.S.; Kamath, A.; Rawat, K.
Software-in-loop (SIL) testing is a method for early testing of a car's control software in a virtual environment. System-level testing is carried out on a regular basis, and it is important to see whether the system is behaving as expected or not.
For unexpected behaviors that test engineers may not easily notice, modern techniques such as machine learning can give an advantage. This paper presents an application of machine learning algorithms that helps identify abnormal patterns in time series data generated from electric powertrain system testing done in a SIL environment for a Mercedes-Benz electric car. The output of SIL testing is time series data: a collection of observations that are ordered chronologically and can be used to analyze trends, patterns, and changes over time. Anomaly detection in time series data is a process in machine learning that identifies data points, events, and observations that deviate from a dataset's normal behavior. By monitoring the expected and unexpected behavior of the electric powertrain system, anomaly detection can be a valuable tool for identifying potential issues. This study aims to devise an efficient process for anomaly detection in SIL. To arrive at this process, various anomaly detection techniques are compared on their ability to detect a defined anomaly in time series data. Data pre-processing methods applied before training the model are also discussed. Finally, we identify a best-fit method for the defined anomaly; a model was trained with this method and used further in the application. © 2023 IEEE.
Item Feature Elimination and Comparative Assessment of Machine Learning Algorithms for Flood Susceptibility Mapping in Kerala, India (Institute of Electrical and Electronics Engineers Inc., 2023) Kundapura, S.; Aditya, B.; Apoorva, K.V.
Floods are a catastrophic phenomenon with far-reaching consequences for infrastructure, the economy, and human lives, profoundly impacting regions globally. This study assesses flood susceptibility in four districts of Kerala: Ernakulam, Idukki, Kottayam, and Alappuzha.
For the 2018 storm that caused flooding by Cyclone Ockhi, a flood map for the area was produced using Sentinel-1 satellite data in the Google Earth Engine environment. The resulting map served as the foundation for further analysis. Based on the literature review, 16 potential flood causative factors were identified and incorporated into spatial maps in the Geographic Information System (GIS) environment. Analysis of the flood dataset was performed using Machine Learning (ML) algorithms, namely Random Forest (RF), Decision Tree (DT), Gradient Boosting Machine (GBM), and XGBoost (XGB). Grid search was employed to identify the optimal hyperparameters for each algorithm, ensuring improved performance. Recursive Feature Elimination (RFE) was subsequently applied to select the most influential variables, resulting in a refined dataset. Feature importance scores were obtained for the chosen factors and used to create the flood susceptibility map with the four ML models in a GIS environment. Evaluation metrics such as F1 score, accuracy, precision, recall, and ROC-AUC score were computed for each model, providing insights into the effectiveness of each algorithm in predicting flood occurrence. The resulting flood susceptibility map for the best-performing ML model visually represents the varying levels of flood risk in the study area. © 2023 IEEE.
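The workflow named in the abstract above (grid search for hyperparameters, then Recursive Feature Elimination, then evaluation metrics) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic stand-ins generated with `make_classification` (16 features to mirror the 16 causative factors), only Random Forest is shown, and the hyperparameter grid and the choice of 8 retained features are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the flood dataset (assumption): 16 candidate
# factors, binary label (flooded / not flooded).
X, y = make_classification(n_samples=400, n_features=16, n_informative=6,
                           n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Step 1: grid search over a small, hypothetical hyperparameter grid.
param_grid = {"n_estimators": [25, 50], "max_depth": [4, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="f1")
search.fit(X_train, y_train)

# Step 2: RFE with the tuned settings; keeping 8 features is an
# arbitrary choice for this sketch.
selector = RFE(RandomForestClassifier(random_state=0, **search.best_params_),
               n_features_to_select=8)
selector.fit(X_train, y_train)
selected = selector.support_                          # mask over the 16 features
importances = selector.estimator_.feature_importances_  # scores of kept features

# Step 3: evaluate on held-out data with the metrics listed in the abstract.
y_pred = selector.predict(X_test)
y_prob = selector.predict_proba(X_test)[:, 1]
metrics = {
    "f1": f1_score(y_test, y_pred),
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print(search.best_params_)
print(metrics)
```

In a setup like the one described, the same three steps would be repeated for each of the four algorithms, and the importance scores of the retained factors would feed the susceptibility mapping in GIS.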
