Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

  • Item
    A novel data structure for efficient representation of large data sets in data mining
    (2006) Pai, R.M.; Ananthanarayana, V.S.
    An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel data structure called the Prefix-Postfix structure (PP-structure), an abstraction of the data that can be built by scanning the database only once. We prove that this structure is compact, complete and incremental, and is therefore suitable for representing dynamic databases. Further, we propose a clustering algorithm using this structure. The proposed algorithm is tested on different real-world datasets, and it is shown to be both space- and time-efficient for large datasets without sacrificing accuracy. We compare our algorithm with other algorithms and show its effectiveness. © 2006 IEEE.
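The abstract does not describe the internal layout of the PP-structure. As a hedged illustration of the general idea it relies on (an abstraction built in one pass that can be updated incrementally), here is a minimal sketch of a generic prefix tree (trie) with per-node counts. The class and function names are hypothetical, and this is not the authors' PP-structure.

```python
class PrefixNode:
    """A node in a generic prefix tree (trie).

    'count' records how many inserted transactions pass through this node.
    """

    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


class PrefixTree:
    """Hypothetical single-scan prefix tree; not the authors' PP-structure."""

    def __init__(self):
        self.root = PrefixNode()

    def insert(self, transaction):
        # Sorting gives every transaction a canonical item order, so
        # transactions that begin with the same items share one path.
        node = self.root
        node.count += 1
        for item in sorted(transaction):
            node = node.children.setdefault(item, PrefixNode())
            node.count += 1

    def prefix_count(self, items):
        # Number of transactions whose sorted item list *begins* with the
        # sorted 'items'; this is not the full support of the itemset.
        node = self.root
        for item in sorted(items):
            node = node.children.get(item)
            if node is None:
                return 0
        return node.count

    def node_count(self):
        # Nodes actually stored, excluding the root.
        stack, total = [self.root], 0
        while stack:
            node = stack.pop()
            total += 1
            stack.extend(node.children.values())
        return total - 1


def build_single_scan(transactions):
    tree = PrefixTree()
    for transaction in transactions:  # each record is read exactly once
        tree.insert(transaction)
    return tree


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
tree = build_single_scan(db)
print(tree.prefix_count({"a", "b"}))  # 2
print(tree.node_count())              # 4 stored nodes versus 7 raw items
tree.insert({"a", "b", "e"})          # incremental update, no rebuild
print(tree.prefix_count({"a", "b"}))  # 3
```

New records are absorbed by a single `insert` call, which is what makes a structure of this kind suitable for dynamic databases; the real PP-structure may organise nodes quite differently.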
  • Item
    Prefix-Suffix trees: A novel scheme for compact representation of large datasets
    (Springer Verlag, 2007) Pai, R.M.; Ananthanarayana, V.S.
    An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel scheme called Prefix-Suffix trees for compact storage of patterns in data mining; it forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction takes less space and hence provides compact storage of patterns. Further, we propose a clustering algorithm based on this storage and show experimentally that it reduces space and time requirements. This is established on large datasets of handwritten numerals, namely the OCR, MNIST and USPS datasets. The proposed algorithm is compared with other similar algorithms, establishing the efficacy of our scheme. © Springer-Verlag Berlin Heidelberg 2007.
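Why sharing common prefixes and suffixes saves space can be seen with a simple count. The sketch below is hypothetical and does not reproduce the Prefix-Suffix tree from the paper, whose layout the abstract does not give; it only counts how many symbols must be stored when patterns (here, assumed to be strings of binary feature values) share common leading parts via a trie, or common trailing parts via a trie over the reversed patterns.

```python
def trie_nodes(sequences):
    """Number of nodes (excluding the root) in a trie over the sequences."""
    root = {}
    nodes = 0
    for seq in sequences:
        node = root
        for symbol in seq:
            if symbol not in node:
                node[symbol] = {}  # a new symbol must be stored
                nodes += 1
            node = node[symbol]    # otherwise reuse the shared path
    return nodes


def storage_report(patterns):
    """(raw symbols, symbols with shared prefixes, symbols with shared suffixes)."""
    raw = sum(len(p) for p in patterns)
    prefix_shared = trie_nodes(patterns)
    suffix_shared = trie_nodes(p[::-1] for p in patterns)
    return raw, prefix_shared, suffix_shared


# Hypothetical binarised patterns; real data would be digit images.
print(storage_report(["0011", "1011", "0111"]))  # (12, 11, 7)
```

In this toy input the patterns share little at the front but a lot at the end, which is the kind of redundancy a scheme exploiting both prefixes and suffixes could capture.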
  • Item
    A Privacy Preserved Data Mining Approach Based on k-Partite Graph Theory
    (Elsevier, 2015) Bhat, T.P.; Karthik, C.; Chandrasekaran, K.
    Traditional approaches to data mining may perform well at extracting the information needed to build a classification rule for further categorisation in supervised classification learning problems. However, most of these approaches fail to hide the identity of the subject to whom the data pertains, which can cause a serious privacy breach. This paper addresses the issue with a graph-theoretical approach based on k-partitioning of graphs, which paves the way for a complex decision tree classifier organised in a prioritised hierarchy. Experimental results and an analytical treatment justifying the correctness of the approach are also included. © 2015 The Authors.
  • Item
    Intelligent Data Mining for Collaborative Information Seeking
    (Springer Nature, 2020) Kumar, A.; Chandrasekaran, K.; Shukla, A.; Usha, D.
    The World Wide Web (WWW) contains many kinds of information: social, educational, historical, sports, news, financial, weather, technology, politics and more. Many people spend time on the internet seeking information. Information on the web is available in different formats, such as text, images or video, and can be accessed through different interfaces. Finding information across so many websites on the World Wide Web is a cumbersome process; therefore, in this paper, we present a new method that produces information based on user input using appropriate keywords. The data is retrieved from the internet using a data mining approach, without the need for rules or training on pages. The main focus is to extract data about a person, such as educational qualifications, gender, contact information, contributions to their work and social nature. The query submitted to the platform should carry meaningful keywords that best describe the person; otherwise, data about a different person might be fetched. © 2020, Springer Nature Switzerland AG.
  • Item
    Analysis and Prediction of Fantasy Cricket Contest Winners Using Machine Learning Techniques
    (Springer Science and Business Media Deutschland GmbH, 2021) Karthik, K.; S. Krishnan, G.S.; Shetty, S.; Bankapur, S.; Kolkar, R.; Ashwin, T.S.; Vanahalli, M.K.
    Cricket is one of the best-known sports in the world. Growing interest in cricket in recent years has led to new formats such as T20 and T10, alongside the Test and One Day formats. The popularity of all these formats has now extended to online fantasy cricket league games. Dream11 is the most popular such app, among many similar ones. Creating a dream team of 11 players from the playing 11 of both teams involves skill, ideas and luck. Predicting a winner among all participating contestants from previous historical data is a challenging task. In this paper, we used a feed-forward deep neural network (DNN) classifier to predict the winning contestants for the top three positions in a fantasy league cricket contest. The performance of the DNN approach was compared against that of state-of-the-art machine learning approaches, namely k-nearest neighbours (KNN), logistic regression (LR), Naive Bayes (NB), random forest (RF) and support vector machines (SVM), in predicting fantasy cricket contest winners. Among the methods used, DNN showed the best results for all three positions, demonstrating its consistency in predicting the winners, and outperformed the state-of-the-art machine learning classifiers by 13%, 8% and 9% for the first, second and third winning positions, respectively. © 2021, The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    Predictive analytics and data mining in healthcare
    (Institute of Electrical and Electronics Engineers Inc., 2021) Arjun, A.; Srinath, A.; Chandavarkar, B.R.
    There has been enormous growth in health information technology (HIT) in recent years. Machine-driven operations that run without human intervention, such as detecting certain diseases, scanning organs and finding tumours, have improved the quality of medical attention one can receive, and the underlying technology has come a long way. Health data is inherently complex, with exceptions in almost every case. Data mining is the technique of converting raw data into a meaningful format. Analysis and prediction on such data, although computationally and algorithmically complex, is an emerging technology and a step towards more proactive and preventive automated treatment options. There are various data mining techniques, such as classification, clustering, association, regression, prediction and pattern recognition [1]. Even the effectiveness of certain medicines can be assessed using machine learning techniques, a life-saving and cost-effective approach. In this paper, we use machine learning as a tool for predictive analysis of chronic kidney disease, based on a chronic kidney disease dataset from the UCI ML repository. We apply machine learning algorithms, specifically decision trees, to build a classifier that predicts whether a person has the disease. The paper also shows that machine learning algorithms need to be tailored to the specific nature of medical data. © 2021 IEEE.
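Decision-tree learners commonly choose each split by information gain, the drop in label entropy after partitioning on an attribute. The abstract does not state which split criterion or tree algorithm was used, so the following is only a minimal sketch of that common criterion, with two hypothetical categorical attributes and made-up toy records rather than actual rows of the kidney disease dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy from splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels, features):
    """Attribute a greedy tree builder would split on first."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy records; attribute names and values are illustrative only.
rows = [
    {"hypertension": "yes", "appetite": "poor"},
    {"hypertension": "yes", "appetite": "good"},
    {"hypertension": "no", "appetite": "good"},
    {"hypertension": "no", "appetite": "poor"},
]
labels = ["ckd", "ckd", "notckd", "notckd"]
print(round(information_gain(rows, labels, "hypertension"), 3))  # 1.0
print(best_split(rows, labels, ["hypertension", "appetite"]))    # hypertension
```

A full tree builder would apply `best_split` recursively to each partition until the labels are pure or a depth limit is reached.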
  • Item
    Using Stacking Ensemble Method for Rental Bike Prediction
    (Springer Science and Business Media Deutschland GmbH, 2025) Akashdeep, S.; Mahalinga, A.N.; Harshvardhan, R.; Chinnahalli KomariGowda, S.; Patil, N.
    Rental bike platforms that improve mobility comfort are on the rise in major cities worldwide. One of the essential requirements for these rental bike systems is that bikes are available to end users at the specified time, reducing waiting time. Increased waiting time indicates that movement has been halted, implying that more efficiency can be gained. As a result, the city’s main priority is ensuring a steady supply of bicycles. To achieve this, it is crucial to forecast the number of bikes needed each hour. This work examines alternative models for forecasting the hourly bike count needed to maintain a steady supply of bikes. Weather data (temperature, humidity, wind speed, dew point), the number of bikes hired every hour, and time information are all used to train the models. Filtering can also be used to exclude non-predictive parameters and to rank features by how well they predict outcomes. The regression models were trained using repeated cross-validation, and their effectiveness was then assessed on a test set. The Gradient Boosting Machine model achieved the best R² value, 0.96. The most significant predictors and their relationships are also determined. Keywords: bike-sharing demand, data mining, predictive analytics, public bikes, regression. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
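Stacking combines base models by training a meta-learner on their predictions, and R² (1 − SS_res/SS_tot) scores the result. The abstract does not list the base models or the meta-learner, so this is a minimal sketch assuming two base regressors have already produced hold-out predictions and that a no-intercept least-squares model serves as the meta-learner. All function names and numbers are hypothetical.

```python
def fit_meta_learner(pred_a, pred_b, y):
    """Least-squares weights (w_a, w_b) for y ~ w_a*pred_a + w_b*pred_b.

    Solves the 2x2 normal equations directly; no intercept term.
    """
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * t for a, t in zip(pred_a, y))
    sby = sum(b * t for b, t in zip(pred_b, y))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("base-model predictions are collinear")
    return (say * sbb - sab * sby) / det, (saa * sby - sab * say) / det


def stacked_predict(weights, pred_a, pred_b):
    """Combine the two base predictions with the learned weights."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical hold-out predictions of hourly counts from two base regressors.
pred_a = [1, 2, 3, 4]
pred_b = [2, 1, 4, 3]
y = [1.5, 1.5, 3.5, 3.5]
weights = fit_meta_learner(pred_a, pred_b, y)
print(weights)                                                # (0.5, 0.5)
print(r2_score(y, stacked_predict(weights, pred_a, pred_b)))  # 1.0
```

In a full stacking setup the meta-learner is trained on out-of-fold predictions from cross-validation, so that it does not simply reward a base model that has overfit the training data.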
  • Item
    Fault diagnosis of helical gear box using decision tree through vibration signals
    (RAMS Consultants, 2013) Sugumaran, V.; Jain, D.; Amarnath, M.; Kumar, H.
    This paper uses vibration signals acquired from gears in good and simulated faulty conditions for fault diagnosis through a machine learning approach. Descriptive statistical features were extracted from the vibration signals, and the most important ones were selected using a decision tree (dimensionality reduction). The selected features were then used for classification with the J48 decision tree algorithm. The paper also discusses the effect of various parameters on classification accuracy. © RAMS Consultants.
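The abstract does not list which descriptive statistics were extracted. A minimal sketch, assuming a typical set used in vibration-based condition monitoring (mean, standard deviation, variance, RMS, peak, crest factor, range, skewness, kurtosis) computed from one record; the feature selection and J48 classification steps (J48 is Weka's implementation of C4.5) are not shown.

```python
import math


def descriptive_features(signal):
    """Common descriptive statistics of one vibration record (assumed set)."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n  # population variance
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("constant signal: skewness and kurtosis undefined")
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,  # spiky faults raise this ratio
        "range": max(signal) - min(signal),
        "skewness": sum((x - mean) ** 3 for x in signal) / (n * std ** 3),
        "kurtosis": sum((x - mean) ** 4 for x in signal) / (n * std ** 4),
    }


features = descriptive_features([1.0, -1.0, 1.0, -1.0])
print(features["rms"], features["kurtosis"])  # 1.0 1.0
```

Each record then becomes one row of such features with a condition label (good or faulty), which is the tabular form a decision tree learner expects.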
  • Item
    Web sessions clustering using hybrid sequence alignment measure (HSAM)
    (Springer-Verlag Wien, 2013) Poornalatha, G.; Raghavendra, S.R.
    Web usage mining inspects the navigation patterns in web access logs and extracts previously unknown and useful information. This can inform strategies for various web-oriented applications, such as website restructuring, recommender systems and web page prediction. The current work clusters user sessions of unequal lengths to discover access patterns, proposing a distance measure for grouping user sessions. The proposed hybrid distance measure uses access path information to find the distance between any two sessions without altering the order in which web pages are visited. R² is used to decide the number of clusters to construct. The Jaccard index and the Davies–Bouldin validity index are used to assess the clustering. The results obtained with these two standard statistical measures are encouraging and indicate the quality of the clusters created. © 2012, Springer-Verlag.
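The abstract does not define the hybrid sequence alignment measure (HSAM). As a hedged illustration of an order-preserving distance between sessions of unequal length, here is a minimal sketch using plain normalised edit (Levenshtein) distance over page sequences, contrasted with an order-insensitive Jaccard distance on page sets. This is a substitute technique, not the authors' HSAM.

```python
def alignment_distance(s1, s2):
    """Edit distance between two page sequences divided by the longer length.

    Order matters; the result ranges from 0 (identical) to 1.
    """
    m, n = len(s1), len(s2)
    if max(m, n) == 0:
        return 0.0
    prev = list(range(n + 1))  # distances from the empty prefix of s1
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete s1[i-1]
                         cur[j - 1] + 1,      # insert s2[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[n] / max(m, n)


def jaccard_distance(s1, s2):
    """Order-insensitive distance between the sets of visited pages."""
    a, b = set(s1), set(s2)
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


# Hypothetical sessions of unequal length.
print(alignment_distance(["home", "cart", "pay"], ["home", "pay"]))  # 1/3: one deletion over three pages
# Same pages, opposite order: the set view cannot tell these sessions apart.
print(alignment_distance(["A", "B"], ["B", "A"]), jaccard_distance(["A", "B"], ["B", "A"]))  # 1.0 0.0
```

A pairwise matrix of such distances can then feed any distance-based clustering method.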
  • Item
    Fault diagnosis of deep groove ball bearing through discrete wavelet features using support vector machine
    (COMADEM International, 2014) Vernekar, K.; Kumar, H.; Gangadharan, K.V.
    Bearings are the most important and most frequently used components in most rotating machinery. In industry, the breakdown of such crucial components causes heavy losses, so preventing their failure is essential. This paper presents online fault detection for a bearing used in an internal combustion engine, using a machine learning approach on vibration signals. Vibration signals are acquired from the bearing in a healthy condition as well as in several simulated fault conditions. Discrete Wavelet Transform (DWT) features were extracted from the vibration signals using a MATLAB program. A decision tree (the J48 algorithm) is used to select the most important of the extracted DWT features. A support vector machine is used as the classifier and achieves a classification accuracy of 98.67%. The paper demonstrates the advantage of the machine learning technique for fault diagnosis over the conventional vibration analysis approach.
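The abstract does not say which wavelet family or which statistics of the DWT coefficients were used. A minimal sketch, assuming the Haar wavelet and sub-band energy as the feature, written in plain Python rather than MATLAB; the J48 feature selection and SVM classification steps are omitted.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT; a trailing odd sample is dropped."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def dwt_energy_features(signal, levels):
    """Detail-band energies for levels 1..levels, then the final approximation energy."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            raise ValueError("signal too short for the requested number of levels")
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


# A fast alternating signal puts its energy in the first detail band.
print(dwt_energy_features([1.0, -1.0, 1.0, -1.0], levels=1))  # approximately [4.0, 0.0]
```

Because the Haar transform is orthonormal, the band energies sum to the signal energy for power-of-two lengths, so the feature vector describes how that energy is distributed across frequency bands, which is where bearing faults tend to show up.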