Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
9 results
Search Results
Item An improved K-means algorithm using modified cosine distance measure for document clustering using Mahout with Hadoop (Institute of Electrical and Electronics Engineers Inc., 2015) Sahu, L.; Mohan, R.
In this paper, we propose a novel K-means algorithm with a modified Cosine Distance Measure for clustering large datasets such as the latest Wikipedia articles and the Reuters dataset. We customize the Cosine Distance Measure used to compute similarity between objects in order to improve cluster quality. Our method calculates the similarity between objects with the Cosine Distance Measure and then brings the distance closer by squaring it if it lies between 0 and 0.5; otherwise it increases the distance. This minimizes the intra-cluster distance and maximizes the inter-cluster distance. We measure cluster quality in terms of inter-cluster and intra-cluster distances, good feature weighting such as TF-IDF, cluster size, and the top terms of the clusters. We compare K-means with the Cosine and modified Cosine Distance Measures using performance metrics such as inter-cluster and intra-cluster distances, cluster size, and execution time. Our experimental results show a 0.016% reduction in intra-cluster distance and a 0.012% increase in inter-cluster distance, along with a 1.5% reduction in cluster size and a 4% reduction in sequence file size, which results in good cluster quality. © 2014 IEEE.

Item Feature engineering on forest cover type data with ensemble of decision trees (Institute of Electrical and Electronics Engineers Inc., 2015) Pruthvi, H.R.; Nisha, K.K.; Chandana, T.L.; Navami, K.; Mohan, R.
The paper aims to determine the forest cover type of a dataset containing cartographic attributes evaluated over four wilderness areas of the Roosevelt National Forest in northern Colorado. The cover type data is provided by the US Forest Service inventory, while a Geographic Information System (GIS) was used to derive cartographic attributes such as elevation, slope, and soil type.
The dataset was analyzed and preprocessed, and feature engineering techniques were applied to derive relevant, non-redundant features. A comparative study of various decision tree algorithms, namely CART, C4.5, and C5.0, was performed on the dataset. With the new dataset built by applying feature engineering techniques, Random Forest and C5.0 improved accuracy by 9% compared to the raw dataset. © 2015 IEEE.

Item The effect of software aging on power usage (Institute of Electrical and Electronics Engineers Inc., 2015) Mohan, R.; Guddeti, G.
This paper tries to establish the relation between power usage and software aging. Software aging is the performance degradation of long-running software due to shrinking physical memory, an increase in swap read and write rates, and an increase in CPU utilization. Experimental results demonstrate that CPU utilization increases over time when the workload remains constant. Linear regression analysis is used to establish this trend. © 2015 IEEE.

Item Analysis of free physical memory in server virtualized system (Institute of Electrical and Electronics Engineers Inc., 2015) Mohan, R.; Guddeti, G.
Performance degradation is a part of any long-running software system. It is due to memory leakage, unreleased file descriptors, round-off errors, and disk and memory fragmentation. Memory leakage has been found to be the primary cause of software performance degradation. To predict software performance degradation, analysis of resource usage is essential. Here the free physical memory of a server virtualised system is analysed using time series analysis.
© 2015 IEEE.

Item Performance analysis of process driven and event driven web servers (Institute of Electrical and Electronics Engineers Inc., 2015) Prakash, P.; Mohan, R.; Kamath, M.
Nowadays virtualization plays a vital role in cloud technology. Users deploy websites on virtual private servers, called instances, which are cost-effective and scalable. With the explosive growth of the World Wide Web, users are able to access things with any device at any time and in any place. Increasing access puts load and stress on the servers. Some metrics are useful to gauge the performance of various architectures. In this analysis we consider three metrics: responsiveness, scalability, and efficiency. We define response time, memory usage, and error rate as the key factors for responsiveness, scalability, and efficiency, respectively. In this paper we investigate the performance of two web servers in different scenarios: Apache, a process-based web server, and Nginx, which uses an asymmetric multi-process event-driven (AMPED) architecture. Our research shows that the Nginx server outperforms the Apache server in terms of responsiveness and scalability, while Apache ensures efficiency. The results are useful for those who want to host a website on a virtual private server (VPS). © 2015 IEEE.

Item Machine Learning Solutions for Predicting Bankruptcy in Indian Firms (Springer Science and Business Media Deutschland GmbH, 2025) Chaithra; Sharma, P.; Mohan, R.
The growing demand to identify potentially bankrupt companies has prompted more research into bankruptcy prediction, assisting stakeholders in determining the worthiness of an investment. The Indian stock market offers investment opportunities, but it also involves risk. As a result, it is critical to choose fundamentally sound companies for long-term investment. To address this need, we created a machine learning-based model for identifying healthy and distressed firms in the Indian scenario.
We created a dataset consisting of 118 bankrupt and 310 healthy firms. The dataset contains three labels: bankrupt, healthy, and financial distress. The addition of the financial distress category improves our ability to recognize and identify firms that are more likely to declare bankruptcy. Recognizing the shortcomings of limited data in the Indian scenario in previous research, our study aimed to include more data instances for training. The dataset included widely recognized financial ratios and macroeconomic data that recognize the interconnectedness of broader economic trends with the company’s financial health. Advanced machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Categorical Boosting (CatBoost), Gradient Boost (GB), and K-Nearest Neighbors (KNN), were applied. XGBoost and LGBM demonstrated the highest level of classification accuracy and also performed well on real-world data, demonstrating their potential use in supporting investors with decision-making processes. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Item Acetaminophen micropollutant: Historical and current occurrences, toxicity, removal strategies and transformation pathways in different environments (Elsevier Ltd, 2019) Vo, H.N.; Le, G.K.; Nguyen, T.M.; Bui, X.-T.; Nguyen, K.H.; Rene, E.R.; Vo, T.D.H.; Cao Ngoc, N.-D.; Mohan, R.
Acetaminophen (ACT) is commonly used as an over-the-counter painkiller, and nowadays it is increasingly present in the natural water environment. Although its concentrations are usually at the ppt to ppm levels, ACT can transform into various intermediates depending on the environmental conditions. The complexity of the ACT degradation products and intermediates poses a major challenge for monitoring, detection, and proposing adequate treatment technologies.
The main objectives of this review study were to assess (1) the occurrences and toxicities, (2) the removal technologies, and (3) the transformation pathways and intermediates of ACT in four environmental compartments, namely wastewater, surface water, groundwater, and soil/sediments. Based on the review, it was observed that the ACT concentrations in wastewater can reach up to several hundreds of ppb. Amongst the different countries, China and the USA showed the highest ACT concentration in wastewater (≥300 μg/L), with a very high detection frequency (81–100%). Concerning surface water, the ACT concentrations were found to be at the ppt level. Some regions in France, Spain, Germany, Korea, the USA, and the UK comply with the recommended ACT concentration for drinking water (71 ng/L). Notably, ACT can transform and degrade into various metabolites such as aromatic derivatives or organic acids. Some of them (e.g., hydroquinone and benzoquinone) are toxic to humans and other life forms. Thus, in water and wastewater treatment plants, tertiary treatment systems such as advanced oxidation, membrane separation, and hybrid processes should be used to remove the toxic metabolites of ACT. © 2019 Elsevier Ltd

Item Can News Sentiment Improve Deep Learning Models for Nifty 50 Index Forecasting? (Institute of Electrical and Electronics Engineers Inc., 2025) Kotekar, C.S.; Mohan, R.; Kolukuluri, V.
A stock index, such as the Nifty 50, offers diversified exposure and reduces the risk of investing in individual companies. Index price movements are influenced by internal and external factors, including political, economic, and environmental developments, as well as historical trends. The relationship between news sentiment and Nifty 50 returns has not been thoroughly studied. This study examines whether financial news sentiment affects index movements and how sentiment can enhance the prediction of next-day returns.
Polarity and subjectivity are extracted from financial news using pre-trained transformer models. Deep learning models, including LSTM, GRU, SimpleRNN, and temporal Kolmogorov-Arnold network (TKAN), are trained on return sign, polarity, and subjectivity using a five-day rolling window to forecast the next-day index return sign. Experimental results demonstrate that the proposed approach outperforms baseline methods, achieving a 5.2% improvement in average accuracy. Incorporating polarity and historical return signs enhances performance across all models. By employing a focused feature set, domain-specific sentiment analysis, and a streamlined architecture, the model achieves superior predictive accuracy. Causal analysis and Shapley Additive Explanations (SHAP) reveal that polarity exhibits a causal effect on returns, while subjectivity does not. The study has practical significance, offering day traders and short-term investors timely, data-driven insights to manage risk and make informed investment choices. © 2013 IEEE.

Item Fault Tree Analysis: A Review on Analysis, Simulation Tools and Reliability Dataset for Safety-critical Systems (World Researchers Associations, 2025) Madhusmita, D.; Mohan, R.; Guddeti, G.R.M.
Risk analysis is a crucial and prominent method for analyzing the dependability attributes of safety-critical systems. It comprises a wide variety of state-of-the-art techniques. Of these, this study focuses only on the Fault Tree Analysis (FTA) technique. Besides the evaluation techniques, we also survey simulation tools and reliability datasets. © 2025, World Researchers Associations. All rights reserved.
