Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

  • Item
    An improved K-means algorithm using modified cosine distance measure for document clustering using Mahout with Hadoop
    (Institute of Electrical and Electronics Engineers Inc., 2015) Sahu, L.; Mohan, R.
    In this paper, we propose a novel K-means algorithm with a modified Cosine Distance Measure for clustering large datasets such as the latest Wikipedia articles and the Reuters dataset. We customize the Cosine Distance Measure used to compute similarity between objects in order to improve cluster quality. Our method calculates the similarity between objects with the Cosine Distance Measure and then brings the distance closer by squaring it if it lies between 0 and 0.5; otherwise it increases the distance. This minimizes the Intra-cluster distance and maximizes the Inter-cluster distance. We measure cluster quality in terms of Inter- and Intra-cluster distances, good feature weighting such as TF-IDF, cluster size, and the top terms of the clusters. We compare the K-means algorithm with the Cosine and modified Cosine Distance measures using performance metrics such as Inter-cluster and Intra-cluster distances, cluster size, and execution time. Our experimental results show a 0.016% reduction in Intra-cluster distance, a 0.012% increase in Inter-cluster distance, a 1.5% reduction in cluster size, and a 4% reduction in sequence file size, which results in good cluster quality. © 2014 IEEE.
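The distance adjustment described in this abstract can be illustrated with a minimal Python sketch. The abstract states only that distances between 0 and 0.5 are squared and that larger distances are increased; it does not say how. The square root used for the "increase" branch, the function names, and the `threshold` parameter are assumptions made for illustration, not the authors' Mahout implementation. The sketch also assumes non-negative feature weights (such as TF-IDF), so the cosine distance lies in [0, 1].

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors.

    Assumes non-negative weights, so the result lies in [0, 1].
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen here: an all-zero vector is maximally distant.
        return 1.0
    # Clamp to guard against floating-point rounding just outside [0, 1].
    return min(1.0, max(0.0, 1.0 - dot / (norm_a * norm_b)))


def modified_cosine_distance(a, b, threshold=0.5):
    """Shrink small distances by squaring; enlarge large ones.

    The squaring rule comes from the abstract. The square root below is
    an ASSUMPTION: the abstract does not say how the distance is increased.
    """
    d = cosine_distance(a, b)
    if d <= threshold:
        return d * d          # pulls close documents closer together
    return math.sqrt(d)       # hypothetical "increase" rule for d in (0.5, 1]
```

For example, `modified_cosine_distance([1, 3], [3, 1])` squares a plain cosine distance of about 0.4 to about 0.16, while a pair with a plain distance above 0.5 is pushed further apart.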
  • Item
    Feature engineering on forest cover type data with ensemble of decision trees
    (Institute of Electrical and Electronics Engineers Inc., 2015) Pruthvi, H.R.; Nisha, K.K.; Chandana, T.L.; Navami, K.; Mohan, R.
    The paper aims to determine the forest cover type from a dataset containing cartographic attributes evaluated over four wilderness areas of the Roosevelt National Forest in Northern Colorado. The cover type data is provided by the US Forest Service inventory, while a Geographic Information System (GIS) was used to derive cartographic attributes such as elevation, slope, and soil type. The dataset was analyzed and preprocessed, and feature engineering techniques were applied to derive relevant and non-redundant features. A comparative study of various decision tree algorithms, namely CART, C4.5, and C5.0, was performed on the dataset. With the new dataset built by applying feature engineering techniques, Random Forest and C5.0 improved the accuracy by 9% compared to the raw dataset. © 2015 IEEE.
  • Item
    The effect of software aging on power usage
    (Institute of Electrical and Electronics Engineers Inc., 2015) Mohan, R.; Guddeti, G.
    This paper tries to establish the relation between power usage and software aging. Software aging is the performance degradation of long-running software due to shrinkage of physical memory, an increase in the swap read and write rate, and an increase in CPU utilization. Experimental results demonstrate that CPU utilization increases over time when the workload remains constant. Linear regression analysis is used to establish this trend. © 2015 IEEE.
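As a rough illustration of how a linear regression can expose such a trend, the sketch below fits an ordinary least-squares line to utilization samples taken over time. The function name `linear_trend` and the sample values in the usage note are hypothetical; the abstract does not describe the authors' data or tooling.

```python
def linear_trend(times, values):
    """Fit values = slope * time + intercept by ordinary least squares.

    Returns (slope, intercept). A positive slope under a constant
    workload would indicate utilization rising over time.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Sum of squared deviations of the time axis.
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all time stamps are identical")
    # Sum of cross-deviations between time and value.
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept
```

With made-up samples such as `linear_trend([0, 1, 2, 3], [10, 12, 14, 16])`, the fitted slope is 2 utilization points per time unit with an intercept of 10.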
  • Item
    Analysis of free physical memory in server virtualized system
    (Institute of Electrical and Electronics Engineers Inc., 2015) Mohan, R.; Guddeti, G.
    Performance degradation is a part of any long-running software system. It is due to memory leakage, unreleased file descriptors, round-off errors, and disk and memory fragmentation. Memory leakage has been found to be the primary cause of software performance degradation. In order to predict software performance degradation, analysis of resource usage is essential. Here the free physical memory of a server virtualized system is analysed using time series analysis. © 2015 IEEE.
  • Item
    Performance analysis of process driven and event driven web servers
    (Institute of Electrical and Electronics Engineers Inc., 2015) Prakash, P.; Mohan, R.; Kamath, M.
    Nowadays virtualization plays a vital role in cloud technology. Users deploy websites on virtual private servers, called instances, which are cost-effective and scalable. With the explosive growth of the World Wide Web, users are able to access things with any device, at any time, and in any place. Increased access puts load and stress on the servers. Some metrics are useful to gauge the performance of various architectures. In this analysis we consider three metrics: responsiveness, scalability, and efficiency. We define response time, memory usage, and error rate as the key factors for responsiveness, scalability, and efficiency, respectively. In this paper we investigate the performance of two web servers in different scenarios: Apache, a process-based web server, and Nginx, which uses an asymmetric multi-process event-driven (AMPED) architecture. Our research shows that the Nginx server outperforms the Apache server in terms of responsiveness and scalability, while Apache ensures efficiency. The results are useful for those who want to host a website on a virtual private server (VPS). © 2015 IEEE.
  • Item
    Machine Learning Solutions for Predicting Bankruptcy in Indian Firms
    (Springer Science and Business Media Deutschland GmbH, 2025) Chaithra; Sharma, P.; Mohan, R.
    The growing demand to identify potential bankrupt companies has prompted more research into bankruptcy prediction, assisting stakeholders in determining the worthiness of an investment. The Indian stock market offers investment opportunities, but it also involves risk. As a result, it is critical to invest in fundamentally sound companies for long-term investment. To address this need, we created a machine learning-based model for identifying a healthy and distressed firm in the Indian scenario. We created a dataset consisting of 118 bankrupt and 310 healthy firms. The dataset contains three labels: bankrupt, healthy, and financial distress. The addition of the financial distress category improves our ability to recognize and identify firms that are more likely to declare bankruptcy. Recognizing the shortcomings of limited data in the Indian scenario in previous research, our study aimed to include more data instances for training. The dataset included widely recognized financial ratios and macroeconomic data that recognize the interconnectedness of broader economic trends with the company’s financial health. Advanced machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Categorical Boosting (CatBoost), Gradient Boost (GB), and K-Nearest Neighbors (KNN) were applied. The XGBoost and LGBM demonstrated the highest level of classification accuracy and also performed well on real-world data, demonstrating their potential use in supporting investors with decision-making processes. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.