Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Application and Evaluation of Random Forest Classifier Technique for Fault Detection in Bioreactor Operation
(Taylor and Francis Ltd., 2017) Shrivastava, R.; Mahalingam, H.; Dutta, N.N.
Bioreactors and associated bioprocesses are quite complex and nonlinear in nature. A small change in initial conditions can greatly alter the output product quality, and the system is often difficult to model mathematically. In this work, the fault detection problem is studied in the context of bioreactors, mainly a reactor from the penicillin production process. It is very important to identify faults in a live process to avoid product quality deterioration. We focus on process history-based methods to identify faults in a bioreactor. We introduce random forest (RF), a powerful machine learning algorithm, to identify several types of faults in a bioreactor. The algorithm is simple, easy to use, shows very good generalization ability without compromising much on classification accuracy, and also gives variable importance as part of its output. We compared its performance with two popular methods, namely support vector machines (SVM) and artificial neural networks (ANN), and found that its overall performance is superior in terms of classification accuracy and generalization ability. © 2017 Taylor & Francis Group, LLC.
Carotid wall segmentation in longitudinal ultrasound images using structured random forest
(Elsevier Ltd, 2018) Yamanakkanavar, Y.; Asha, C.S.; Teja A, H.S.; Narasimhadhan, A.V.
Edge detection is a primary image processing technique used for object detection, data extraction, and image segmentation. Recently, edge-based segmentation using structured classifiers has been receiving increasing attention. The intima media thickness (IMT) of the common carotid artery is mainly used as a primitive indicator for the development of cardiovascular disease. For efficient measurement of the IMT, we propose a fast edge-detection technique based on a structured random forest classifier. The accuracy of IMT measurement is degraded owing to the speckle noise found in carotid ultrasound images. To address this issue, we propose the use of a state-of-the-art denoising method to reduce the speckle noise, followed by an enhancement technique to increase the contrast. Furthermore, we present a novel approach for automatic region of interest extraction in which a pre-trained structured random forest classifier is applied for quantifying the IMT. The proposed method yields an IMT (mean ± standard deviation) of 0.66 ± 0.14 mm, which is closer to the ground-truth value of 0.67 ± 0.15 mm than the values obtained by state-of-the-art techniques. © 2018 Elsevier Ltd
Detection of phishing websites using an efficient feature-based machine learning framework
(Springer London, 2019) Rao, R.S.; Pais, A.R.
Phishing is a cyber-attack that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page to retrieve personal information. Many anti-phishing solutions, such as blacklist or whitelist, heuristic, and visual similarity-based methods, have been proposed to date, but online users are still getting trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel classification model based on heuristic features that are extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model has been evaluated using eight different machine learning algorithms, of which the Random Forest (RF) algorithm performed the best with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for phishing website detection. Principal component analysis Random Forest (PCA-RF) performed the best out of all oblique Random Forests (oRFs) with an accuracy of 99.55%. We have also tested our model with and without third-party-based features to determine the effectiveness of third-party services in the classification of suspicious websites. We also compared our results with the baseline models (CANTINA and CANTINA+). Our proposed technique outperformed these methods and also detected zero-day phishing attacks. © 2018, The Natural Computing Applications Forum.
CatchPhish: detection of phishing websites by inspecting URLs
(Springer, 2020) Rao, R.S.; Vaishnavi, T.; Pais, A.R.
There exist many anti-phishing techniques which use source code-based features and third-party services to detect phishing sites. These techniques have some limitations, one of which is that they fail to handle drive-by downloads. They also use third-party services for the detection of phishing URLs, which delays the classification process. Hence, in this paper, we propose a lightweight application, CatchPhish, which predicts URL legitimacy without visiting the website. The proposed technique uses the hostname, full URL, Term Frequency-Inverse Document Frequency (TF-IDF) features, and phish-hinted words from the suspicious URL for classification using the Random Forest classifier. The proposed model with only TF-IDF features achieved an accuracy of 93.25% on our dataset. An experiment with TF-IDF and hand-crafted features achieved a significant accuracy of 94.26% on our dataset and accuracies of 98.25% and 97.49% on benchmark datasets, which is much better than the existing baseline models. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
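As a rough illustration of the TF-IDF features mentioned in this abstract, the sketch below weights URL tokens by term frequency times inverse document frequency. It is not the CatchPhish implementation: the tokenizer, the unsmoothed `log(N / df)` weighting, the function names `tokenize` and `tfidf`, and the example URLs are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (hypothetical tokenizer)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def tfidf(urls):
    """Return one {token: weight} dict per URL using tf * log(N / df)."""
    docs = [Counter(tokenize(u)) for u in urls]
    n_docs = len(docs)

    # Document frequency: in how many URLs each token appears.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append(
            {tok: (cnt / total) * math.log(n_docs / df[tok]) for tok, cnt in doc.items()}
        )
    return vectors


# Made-up URLs, used only to exercise the function.
urls = [
    "http://example.com/login",
    "http://example.com/about",
    "http://secure-login.example.net/verify/account",
]
vectors = tfidf(urls)
```

Tokens that occur in every URL (such as `http`) receive a weight of zero, while rarer tokens are weighted more heavily. Vectors of this kind could then be passed to a classifier such as a random forest.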
A framework for automated bone age assessment from digital hand radiographs
(Springer, 2020) Simu, S.; Lal, S.
Bone age assessment (BAA) is a technique that helps in predicting the age of a person whose age is unavailable and can also be used to detect growth disorders, if any. An automated bone age assessment (ABAA) system depends heavily on the efficiency of its feature extraction stage and the accuracy of the subsequent classification stage. This paper presents the implementation and analysis of feature extraction methods such as Bag of Features (BoF), Histogram of Oriented Gradients (HOG), and Texture Feature Analysis (TFA) on segmented phalangeal region of interest (PROI) images and segmented radius-ulna region of interest (RUROI) images. Artificial Neural Networks (ANN) and Random Forest classifiers are used for classification. The experimental results obtained with the BoF method for feature extraction along with Random Forest for classification outperform preceding techniques available in the literature. A mean error (ME) of 0.58 years and an RMSE of 0.77 years were achieved for PROI images, and a mean error of 0.53 years and an RMSE of 0.72 years were achieved for RUROI images. Additionally, the results show that prior knowledge of the person's gender gives better results. The dataset contains radiographs of the left hand for an age range of 0-18 years. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
Classification of aspirated and unaspirated sounds in speech using excitation and signal level information
(Academic Press, 2020) Ramteke, P.B.; Supanekar, S.; Koolagudi, S.G.
In this work, consonant aspiration and unaspiration phenomena are studied. It is known that the pronunciation of aspirated and unaspirated consonants is characterized by the 'puff of air' released at the place of constriction in the vocal tract, also known as the burst. Here, properties of the vowel immediately after the burst are studied for characterization of the burst. The excitation source signal, estimated from speech as the low-pass-filtered linear prediction residual signal, is used for the task. Signal characteristics such as the glottal pulse; duration of the open, closed, and return phases; slope of the open and return phases; duration of the burst; ratio of the highest and lowest frame-wise signal energies; and voice onset point are explored as features to characterize aspiration and unaspiration. Three datasets, namely TIMIT, IIIT Hyderabad Marathi, and IIIT Hyderabad Hindi (IIIT-H Indic Speech Databases), are used to verify the proposed approach. Random forest, support vector machine, and deep feed-forward neural networks (DFFNNs) are used as classifiers to test the effectiveness of the features used for the task. Optimal features are selected for the classification using correlation-based feature selection (CFS). From the results, it is observed that the proposed features are efficient in classifying aspirated and unaspirated consonants. Performance of the proposed features in recognition of aspirated and unaspirated phonemes is also evaluated, with the IIIT Hyderabad Marathi dataset considered for the analysis. It is observed that recognition of aspirated and unaspirated sounds using the proposed features improves in comparison with the MFCC-based phoneme recognition system. © 2020 Elsevier Ltd
Application of word embedding and machine learning in detecting phishing websites
(Springer, 2022) Rao, R.S.; Umarekar, A.; Pais, A.R.
Phishing is an attack whose aim is to gain personal information such as passwords and credit card details from online users by deceiving them through fake websites, emails, or any legitimate internet service. There exist many techniques to detect phishing sites, such as third-party-based techniques, source code-based methods, and URL-based methods, but users are still getting trapped into revealing their sensitive information. In this paper, we propose a new technique which detects phishing sites with word embeddings using plain text and domain-specific text extracted from the source code. We applied various word embeddings for the evaluation of our model using ensemble and multimodal approaches. From the experimental evaluation, we observed that the multimodal approach with domain-specific text achieved a significant accuracy of 99.34% with a TPR of 99.59%, FPR of 0.93%, and MCC of 98.68%. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
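For readers unfamiliar with the metrics quoted in this abstract, the sketch below computes accuracy, true positive rate (TPR), false positive rate (FPR), and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)                        # recall on the positive (phishing) class
    fpr = fp / (fp + tn)                        # legitimate sites wrongly flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr, "mcc": mcc}


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which makes it a useful complement to accuracy when the classes are imbalanced.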
Stacking Deep learning and Machine learning models for short-term energy consumption forecasting
(Elsevier Ltd, 2022) Sujan Reddy, A.; Akashdeep, S.; Harshvardhan, R.; Kamath S., S.
Accurate prediction of electricity consumption is essential for providing actionable insights to decision-makers for managing volume and potential trends in future energy consumption for efficient resource management. A single model might not be sufficient to solve the challenges that result from linear and non-linear problems that occur in electricity consumption prediction. Moreover, these models cannot be applied in practice because they are either not interpretable or poorly generalized. In this paper, a stacking ensemble model for short-term electricity consumption is proposed. We experimented with machine learning and deep models like Random Forests, Long Short Term Memory, Deep Neural Networks, and Evolutionary Trees as our base models. Based on the experimental observations, two different ensemble models are proposed, where the predictions of the base models are combined using Gradient Boosting and Extreme Gradient Boosting (XGB). The proposed ensemble models were tested on a standard dataset that contains around 500,000 electricity consumption values, measured at periodic intervals, over the span of 9 years. Experimental validation revealed that the proposed ensemble model built on XGB reduces the training time of the second layer of the ensemble by a factor of close to 10 compared to the state-of-the-art, and also is more accurate. An average reduction of approximately 39% was observed in the Root mean square error. © 2022 Elsevier Ltd
Future transition in climate extremes over Western Ghats of India based on CMIP6 models
(Springer Science and Business Media Deutschland GmbH, 2023) Shetty, S.; Umesh, P.; Shetty, A.
The effect of climate change on the tropical river catchments in the Western Ghats of India is studied using the Coupled Model Intercomparison Project-6 data (CMIP-6). Multi-model ensembles of rainfall and temperature are constructed using the Random Forest ensemble technique for bias-corrected GCMs in the near future (2014–2050) and far future (2051–2100) horizons. For the two catchments each in the southern, central, and northern Ghats, the trend in minimum and maximum temperatures, precipitation, and other indices are calculated. By 2100, dry sub-humid and humid catchments will see a higher increase in mean annual temperature than per-humid central catchments. In future decades, the warm days and nights increase by 45–50% and 40–70%, respectively, with twofold warming in the winter season. Under a climate change scenario, annual rainfall increases in Vamanapuram, Ulhas, and Purna, while Chaliyar, Netravati, and Aghanashini catchments experience a decrease in rainfall in the far future with an increase in pre-monsoon rainfall. The southern catchments are anticipated to have contrasting variations in the rainfall extremes; northern catchments face a substantial increase in very wet to extremely wet days and medium to heavy rainfall. In all catchments (excluding Vamanapuram), cumulative wet days increase with a decrease in cumulative dry days. After the mid-twenty-first century, humid to per-humid catchments encompass an increase in cool nights, whereas it disappears in dry sub-humid catchments of the Ghat. Interestingly, warming tendencies begin to slow down after 2050. This investigation can assist in comprehending the regional climate extremes in the Western Ghats to formulate better climate risk planning and adaptation strategies. © 2023, The Author(s), under exclusive licence to Springer Nature Switzerland AG.
Performance Prediction Model Development for Solar Box Cooker Using Computational and Machine Learning Techniques
(American Society of Mechanical Engineers (ASME), 2023) Anilkumar, B.C.; Maniyeri, R.; Anish, S.
The development of prediction models for solar thermal systems has been a research interest for many years. The present study focuses on developing a prediction model for solar box cookers (SBCs) through computational and machine learning (ML) approaches. The prime objective is to forecast cooking load temperatures of SBCs through ML techniques such as random forest (RF), k-nearest neighbor (k-NN), linear regression (LR), and decision tree (DT). ML is a commonly used form of artificial intelligence, and it continues to be popular and attractive as it finds new applications every day. A numerical model based on thermal balance is used to generate the dataset for the ML algorithms, considering different locations across the world. Experiments on the SBC in Indian weather conditions were conducted from January through March 2022 to validate the numerical model. The temperatures for different components obtained through numerical modeling agree with experimental values with less than 7% maximum error. Although all the developed models can predict the temperature of the cooking load, the RF model outperformed the other models. The root-mean-square error (RMSE), determination coefficient (R²), mean absolute error (MAE), and mean square error (MSE) for the RF model are 2.14 °C, 0.992, 1.45 °C, and 4.58 °C², respectively. These metrics indicate that the RF model can accurately predict the thermal parameters of SBCs. This study will inspire researchers to explore the possibilities of ML prediction models for solar thermal conversion applications. © 2023 by ASME.
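The four error metrics reported in this abstract are related by simple formulas, and the sketch below computes them from scratch under their standard definitions. The function name `regression_metrics` and the temperature values are hypothetical and do not come from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n        # squared units, e.g. °C²
    rmse = math.sqrt(mse)                       # same units as the target, e.g. °C
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                  # undefined if y_true is constant

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical cooking-load temperatures in °C (illustrative values only).
observed = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 69.0, 78.0, 91.0]
scores = regression_metrics(observed, predicted)
```

Because RMSE is the square root of MSE, the reported RMSE of 2.14 and MSE of 4.58 are mutually consistent (2.14² ≈ 4.58), and the MSE carries squared temperature units.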
