Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty


Search Results

Now showing 1 - 10 of 11
  • Item
    Hate Speech and Offensive Content Identification in Hindi and Marathi Language Tweets using Ensemble Techniques
    (CEUR-WS, 2021) Rajalakshmi, R.; Mattins, F.; Srivarshan, S.; Reddy, L.P.; Anand Kumar, M.
    Hate speech is any form of speech in which speakers attempt to ridicule, humiliate, or instill hatred toward others based on characteristics such as religion, skin colour, race, or sexual orientation. In recent years, social networking sites have become a major source of hate speech. If unaddressed, it can cause anxiety and despair in the affected individuals or groups, so social networks use an assortment of algorithms to identify it. Detecting hate speech in English text has been one of the most active research topics in recent years, with many studies published. In regional and indigenous languages, however, hate speech detection is a recent area with comparatively little research, largely because sufficiently large training datasets and domain resources are lacking. The HASOC [1] 2021 Hate Speech Detection Task addresses part of this gap by providing a dataset of tweets in English, Hindi [2], and Marathi [3]. The main task comprised two subtasks; one of them was to classify hate speech and offensive texts in the Hindi and Marathi tweet datasets as Hate Speech (HATE), Offensive (OFFN), or Profane (PRF). This work compares the performance of different models on both subtasks and identifies the best-performing model. The Random Forest classifier performs best on the first subtask, with macro F1 scores of 75.19% and 73.12% on the Marathi and Hindi tweet datasets, respectively. XGBoost performs best on the second subtask, with a macro F1 score of 46.5%. Overall, any of these models can achieve satisfactory results for hate speech detection in regional languages. 
This work has been submitted to the FIRE2021 shared task, Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC-2021) by team DLRG. © 2021 Copyright for this paper by its authors.
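The HASOC results above are reported as macro F1 scores. As an illustrative sketch (not the authors' evaluation code), macro F1 computes an F1 score for each class and takes their unweighted mean, so every class counts equally regardless of how many tweets it contains. The label names follow the abstract; the toy labels and predictions are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but wrongly
            fn[truth] += 1  # missed the true class
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), taken as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not data from the HASOC dataset
y_true = ["HATE", "OFFN", "PRF", "OFFN"]
y_pred = ["HATE", "PRF", "PRF", "OFFN"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Because the mean is unweighted, a weak score on a rare class pulls the macro average down even when most tweets are classified correctly.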
  • Item
    Bayesian optimization and gradient boosting to detect phishing websites
    (Institute of Electrical and Electronics Engineers Inc., 2021) Pavan, R.; Nara, M.; Gopinath, S.; Patil, N.
    We propose an Extreme Gradient Boosting (XGBoost) framework for classification and regression problems on small datasets sampled from a discrete distribution, i.e., data with discrete or quantized attributes. Starting from a prior belief for the specific use case, the model's hyperparameters are iteratively refined using Bayesian optimization. We apply this framework to detecting fraudulent (phishing) websites and empirically evaluate the methodology on the publicly available, benchmarked UCI Phishing dataset, where it outperforms existing methods reported in the literature. © 2021 IEEE.
  • Item
    Intelligent Modeling for Shear Strength of RC Exterior Beam-Column Joint Subjected to Seismic Loading
    (Springer Science and Business Media Deutschland GmbH, 2023) Swapnil, B.; Palanisamy, T.
    RC beam-column joints are subjected to high shear demand and bond slip during an earthquake. Accurate prediction of joint shear strength is necessary to avoid brittle shear failure in design and retrofitting. In this study, the shear strength of RC exterior beam-column joints is predicted using an intelligent modeling approach based on the eXtreme Gradient Boosting (XGBoost) regressor, an ensemble learning technique that combines several weak learners into a strong predictive model. Parameters affecting joint shear strength are identified by examining existing models, and a large database is compiled from experimental results reported in diverse publications on exterior beam-column joints. Eleven of these parameters, describing material properties, geometric configuration, and bond resistance, are chosen as inputs, with joint shear strength as the output. The model is then trained, tested, and validated on this database, and its performance is evaluated with regression metrics such as MSE, RMSE, and R². Comparison with the existing empirical equation, code provisions, and even an individual ML algorithm shows that the model outperforms all of them in both accuracy and computation time. Sensitivity analysis using the predictive power score (PPS) showed that the most important parameter for estimating the shear strength of RC exterior beam-column joints is the percentage of beam longitudinal reinforcement. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
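The abstract describes XGBoost as an ensemble that combines several weak learners into a strong model. A minimal sketch of that idea, assuming plain gradient boosting with squared-error loss and one-split regression stumps on a single input: each round fits a stump to the current residuals (the negative gradient of the squared loss) and adds a shrunken copy of it to the ensemble. XGBoost itself adds regularization, second-order gradient information, and deeper trees, none of which is shown here. The toy data are hypothetical and unrelated to the joint-shear database.

```python
def fit_stump(xs, residuals):
    """Fit the one-threshold split on 1-D inputs that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)  # never empty: threshold is one of the xs
        right_mean = sum(right) / len(right) if right else left_mean
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost regression stumps under squared loss; returns a prediction function."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2 the negative gradient is the residual y - F
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Hypothetical toy data: a step function; shrinkage makes the fit converge over many rounds
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(round(model(1), 2), round(model(6), 2))  # 1.01 4.99
```

The learning rate deliberately keeps each weak learner's contribution small; the ensemble reaches an accurate fit only by accumulating many such corrections.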
  • Item
    A Stacked Model Approach for Machine Learning-Based Traffic Prediction
    (Springer Science and Business Media Deutschland GmbH, 2024) Divakarla, U.; Chandrasekaran, K.
    The application of sensing, analysis, control, and communication technologies to ground transportation is referred to as an intelligent transportation system (ITS). Such systems aim to enhance safety, mobility, and efficiency. As ITSs are developed and deployed, the accuracy of traffic flow prediction is improving, and the efficacy of traveler information systems, public transportation, and advanced traffic control is said to depend on these systems. The expanding use of data in transportation management shows that practical implementation is essential for managing and reducing traffic congestion effectively. Machine learning (ML) makes it possible to build predictive models that incorporate diverse data from numerous sources. Predicting traffic movement, reducing congestion, and identifying routes that minimize travel time or energy all require traffic prediction, i.e., forecasting traffic volume and density. Traffic estimation and prediction systems can reduce travel times and improve traffic conditions by enabling more efficient use of available capacity. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
  • Item
    AI based Solar Power Forecasting
    (Institute of Electrical and Electronics Engineers Inc., 2024) Jain, N.; Naik, D.
    Maintaining a balance between generation and load is crucial for economic scheduling in smart grids. Because solar power is intermittent and depends on climatic conditions, solar energy forecasting is gaining importance, and this study applies advanced machine learning and deep learning models to predict it accurately. Specifically, we use XGBoost and Long Short-Term Memory (LSTM) models to analyze data from two solar installations in India over a 34-day period. Our approach enhances the efficiency and reliability of solar energy utilization in smart grids. Evaluated over a 3-day test period, the LSTM model achieved an RMSE of approximately 2870 kW, a 22% improvement over a baseline model with an RMSE of 3699 kW. These results highlight the potential of machine learning and deep learning to improve solar power forecasting accuracy and thereby support more effective energy management in smart grids. © 2024 IEEE.
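As a quick check of the reported figure, the relative RMSE improvement of the LSTM over the baseline works out as follows, using only the two RMSE values stated in the abstract:

```latex
\text{relative improvement}
  = \frac{\mathrm{RMSE}_{\text{baseline}} - \mathrm{RMSE}_{\text{LSTM}}}{\mathrm{RMSE}_{\text{baseline}}}
  = \frac{3699 - 2870}{3699}
  = \frac{829}{3699}
  \approx 0.224
```

That is, roughly 22%, in line with the abstract; since the LSTM RMSE is given only approximately, the third significant digit should not be over-interpreted.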
  • Item
    Malware Classification Using XGBoost and Genetic Algorithm for Hyperparameter Tuning
    (Institute of Electrical and Electronics Engineers Inc., 2024) Divakarla, U.; Chandrasekaran, K.; Harish, S.V.; Kanal, P.G.; Shalini, C.
    As technology advances, more and more human activities are moving into the virtual world. Because so much of our data is stored on computers and networks, the frequency of cyberattacks has risen sharply. Identifying and classifying malware is essential for understanding its many types, their danger levels, defense strategies against them, and the ways they can infect computers and other devices. In this research, we propose a malware classification model based on XGBoost that uses a Genetic Algorithm for hyperparameter tuning. The system was trained and tested on two malware datasets, MaleVis and Malimg, and achieved high accuracy. © 2024 IEEE.
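The abstract does not describe the genetic algorithm's details. The sketch below shows one common form of GA-based hyperparameter search (truncation selection, uniform crossover, random-reset mutation, and elitism). The parameter names mirror common XGBoost hyperparameters, but the search space, its ranges, and the fitness function are hypothetical. In practice the fitness would be something like the validation accuracy of an XGBoost classifier trained with the candidate settings; here a synthetic function stands in so the example runs without any ML library.

```python
import random

# Hypothetical search space; the paper's actual hyperparameters and ranges are not stated
SEARCH_SPACE = {
    "max_depth": [3, 4, 5, 6, 8, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    "n_estimators": [100, 200, 400, 800],
}


def random_individual(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal probability
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.1):
    # Random-reset mutation: occasionally replace a gene with a fresh random value
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in individual.items()}


def genetic_search(fitness, pop_size=20, generations=15, elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection: keep the fitter half
        children = ranked[:elite]          # elitism: carry the best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)


def toy_fitness(params):
    # Hypothetical stand-in for validation accuracy; peaks at depth 6, rate 0.1, 400 trees
    return -(abs(params["max_depth"] - 6)
             + 10 * abs(params["learning_rate"] - 0.1)
             + abs(params["n_estimators"] - 400) / 100)


best = genetic_search(toy_fitness)
print(best)
```

Because each real fitness evaluation means training a model, caching scores for hyperparameter combinations that have already been evaluated is a cheap improvement.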
  • Item
    Non-Invasive Detection of Anemia Using Deep Learning on Conjunctival Images
    (Institute of Electrical and Electronics Engineers Inc., 2025) Kedar, D.S.; Pandey, G.; Koolagudi, S.G.
    Anemia, characterized by low levels of red blood cells or hemoglobin, affects millions of people worldwide and has a significant impact on public health. Traditional diagnostic methods, while effective, are invasive, costly, and often inaccessible in resource-constrained settings. This paper proposes a non-invasive approach to anemia detection that analyzes conjunctival images with deep learning. The methodology involves capturing high-resolution conjunctival images, pre-processing them, and using a customized Convolutional Neural Network (CNN) for feature extraction and classification. Fine-tuned with a batch size of 16, the customized CNN achieves an accuracy of 96%, precision of 95%, recall of 96%, and ROC-AUC of 0.99, outperforming the other models evaluated, namely Random Forest, XGBoost, SVM, ResNet50, and MobileNetV2. This work highlights the potential of non-invasive diagnostic tools to improve the accessibility and efficiency of healthcare, particularly for underserved populations. The findings support integrating deep learning into healthcare as a transformative approach to global challenges such as anemia. © 2025 IEEE.
  • Item
    Robust Solar Irradiance Prediction: A Hybrid Approach Using XGBoost for Feature Extraction and WaveNet for Forecasting
    (Springer Science and Business Media Deutschland GmbH, 2025) Chiranjeevi, M.; Moger, T.; Jena, D.
    Accurately forecasting solar irradiance is crucial for maximizing solar energy utilization. In practice, however, complex irradiance patterns and frequently missing data make precise prediction difficult and increase the uncertainty and instability of forecasts. This paper addresses the prediction of solar power output, particularly when equipment failures lead to inaccurate or missing data. To overcome these issues, preprocessing techniques are applied to improve data quality before forecasting. XGBoost is used for feature extraction, so that the model identifies and leverages the most relevant features, and a WaveNet model is used for solar irradiance prediction, taking advantage of its computational efficiency and sensitivity to small fluctuations in the data. This integrated approach aims to improve the accuracy of solar irradiance prediction even when the data contain irregularities. The results suggest that the proposed model outperforms other benchmark models, achieving an R² score of 0.9733. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
  • Item
    Machine Learning Solutions for Predicting Bankruptcy in Indian Firms
    (Springer Science and Business Media Deutschland GmbH, 2025) Chaithra; Sharma, P.; Mohan, R.
    The growing need to identify companies at risk of bankruptcy has prompted more research into bankruptcy prediction, which helps stakeholders judge the worthiness of an investment. The Indian stock market offers investment opportunities but also involves risk, so for long-term investment it is critical to choose fundamentally sound companies. To address this need, we developed a machine learning model that distinguishes healthy from distressed firms in the Indian context. We created a dataset of 118 bankrupt and 310 healthy firms, with three labels: bankrupt, healthy, and financial distress. Adding the financial distress category improves our ability to identify firms that are more likely to declare bankruptcy. Because previous research in the Indian context was limited by scarce data, our study aimed to include more data instances for training. The dataset includes widely recognized financial ratios together with macroeconomic data, reflecting the interconnectedness of broader economic trends and a company's financial health. We applied several machine learning algorithms: Support Vector Machine (SVM), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Categorical Boosting (CatBoost), Gradient Boosting (GB), and K-Nearest Neighbors (KNN). XGBoost and LGBM achieved the highest classification accuracy and also performed well on real-world data, demonstrating their potential to support investors' decision-making. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
  • Item
    Tether Force Estimation Airborne Kite Using Machine Learning Methods
    (Multidisciplinary Digital Publishing Institute (MDPI), 2025) Gupta, A.; Kashyap, Y.; Kosmopoulos, P.
    This paper explores the potential of Airborne Wind Energy Systems to transform wind energy generation and demonstrates advances over current methods. Through a series of controlled field experiments and the application of classical machine learning techniques, we achieved significant improvements in tether force estimation. Our XGBoost model, for example, notably reduced the error in predicting the tether force that can be extracted at a particular location, achieving a root mean square error (RMSE) of 52.3 N, a mean absolute error (MAE) of 32.1 N, and a coefficient of determination (R²), which measures the proportion of variance explained by the model, of 0.93. These findings validate the effectiveness of the proposed methods and illustrate their potential to optimize the deployment of Airborne Wind Energy Systems, thereby maximizing energy output and contributing to a sustainable, low-carbon energy future. By analyzing key input features such as wind speed and kite dynamics, our model predicts optimal locations for Airborne Wind Energy System installation, offering a promising alternative to traditional wind turbines. © 2025 by the authors.
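For reference, the three regression metrics quoted in this and several earlier abstracts (RMSE, MAE, and R²) can be computed as follows. This is a minimal sketch using the standard definitions; the tether-force values in the example are hypothetical, not measurements from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) using the standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                     # share of variance explained
    return rmse, mae, r2


# Hypothetical tether-force values in newtons, not data from the paper
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
rmse, mae, r2 = regression_metrics(measured, predicted)
print(rmse, mae, round(r2, 3))  # 10.0 10.0 0.992
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE; a wide gap between the two, as between 52.3 N and 32.1 N, indicates that error magnitudes vary considerably across samples.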