Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
16 results
Search Results
A novel data structure for efficient representation of large data sets in data mining (2006)
Pai, R.M.; Ananthanarayana, V.S.
An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps in reducing the time and space requirements of the overall decision making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel data structure called Prefix-Postfix structure (PP-structure), which is an abstraction of the data that can be built by scanning the database only once. We prove that this structure is compact, complete and incremental and is therefore suitable to represent dynamic databases. Further, we propose a clustering algorithm using this structure. The proposed algorithm is tested on different real world datasets, and it is shown that the algorithm is both space efficient and time efficient for large datasets without sacrificing accuracy. We compare our algorithm with other algorithms and show the effectiveness of our algorithm. © 2006 IEEE.

Prefix-Suffix trees: A novel scheme for compact representation of large datasets (Springer Verlag, 2007)
Pai, R.M.; Ananthanarayana, V.S.
An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps in reducing the time and space requirements of the overall decision making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper we propose a novel scheme called Prefix-Suffix trees for compact storage of patterns in data mining, which forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction takes less space and hence forms a compact storage of patterns. Further, we propose a clustering algorithm based on this storage and show experimentally that this type of storage reduces the space and time requirements.
This has been established by considering large data sets of handwritten numerals, namely the OCR data, the MNIST data and the USPS data. The proposed algorithm is compared with other similar algorithms and the efficacy of our scheme is thus established. © Springer-Verlag Berlin Heidelberg 2007.

Medical image segmentation using improved mountain clustering technique version-2 (2010)
Verma, N.K.; Roy, A.; Vasikarla, S.
This paper proposes Improved Mountain Clustering version-2 (IMC-2) based medical image segmentation. The proposed technique is a more powerful approach for medical-image-based diagnosis of diseases such as brain tumor, tooth decay, lung cancer and tuberculosis. The IMC-2 based medical image segmentation approach has been applied to various categories of images, including MRI images, dental X-rays and chest X-rays, and compared with some widely used segmentation techniques such as K-means, FCM and EM as well as with IMC-1. The performance of all these segmentation approaches is compared on a widely accepted validation measure, the Global Silhouette Index. Also, the segments obtained from the above mentioned segmentation approaches have been visually evaluated. © 2010 IEEE.

Alignment based similarity distance measure for better web sessions clustering (Elsevier B.V., 2011)
Poornalatha, G.; Raghavendra, P.S.
The evolution of the internet along with the popularity of the web has drawn great attention among researchers to web usage mining. Given the exponential growth in the amount of data available on the web, which may not give the required information immediately, web usage mining extracts useful information from the huge amount of data available in the web logs that contain information regarding web pages accessed. Due to this huge amount of data, it is better to handle small groups of data at a time, instead of dealing with the entire data together.
In order to cluster the data, a similarity measure is essential to obtain the distance between any two user sessions. The objective of this paper is to propose a technique to measure the similarity between any two user sessions based on a sequence alignment technique that uses the dynamic programming method. © 2011 Published by Elsevier Ltd.

Web page prediction by clustering and integrated distance measure (2012)
Poornalatha, G.; Raghavendra, S.R.
The tremendous progress of the internet and the World Wide Web in the recent era has emphasized the requirement for reducing the latency at the client or the user end. In general, caching and prefetching techniques are used to reduce the delay experienced by the user while waiting to get the web page from the remote web server. The present paper attempts to solve the problem of predicting the next page to be accessed by the user based on the mining of web server logs that maintain the information of users who access the web site. The predicted next page may be prefetched by the browser, which in turn reduces the latency for the user. Thus analyzing the user's past behavior to predict the future web pages to be navigated by the user is of great importance. The proposed model yields good prediction accuracy compared to existing methods such as the Markov model, association rules and ANN. © 2012 IEEE.

Clustering using levy flight cuckoo search (Springer Verlag, 2013)
Senthilnath, J.; Das, V.; Omkar, S.N.; Mani, V.
In this paper, a comparative study is carried out using three nature-inspired algorithms, namely Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Cuckoo Search (CS), on the clustering problem. Cuckoo search is used with levy flight. The heavy-tail property of levy flight is exploited here. These algorithms are used on three standard benchmark datasets and one real-time multi-spectral satellite dataset. The results are tabulated and analysed using various techniques.
Finally we conclude that under the given set of parameters, cuckoo search works efficiently for the majority of the datasets and levy flight plays an important role. © 2013 Springer.

Automatic generation of web service composition templates using WSDL descriptions (Springer Verlag, 2015)
Kamath S., S.; Alse, S.; Prasad, P.; Chennagiri, A.R.
Due to the extensive use and increase in the number of published web services, clustering and automatic tagging of web services to facilitate efficient discovery of web services is crucial. Discovering composite services has gained importance as there is a need for integrating web services to meet complex service requirements. In this regard, we propose a system for clustering services based on features extracted from their WSDL documents for generating service tags and then the cluster tags. Also, based on the service requirements specified by the requester, our system can identify and generate potential composite service templates. These are basically the subgraphs of the service dependency graph generated by considering only relevant services determined by matching cluster tags and service tags with the request tokens. It was seen that the search domain for service composition was significantly reduced by clustering and tagging and the system obtained meaningful and encouraging results. © Springer India 2015.

Genetic algorithm based wrapper feature selection on hybrid prediction model for analysis of high dimensional data (Institute of Electrical and Electronics Engineers Inc., 2015)
Rayasam, R.C.; Kannan, R.; Patil, N.
Data mining concepts have been extensively used for disease prediction in the medical field. Many Hybrid Prediction Models (HPM) have been proposed and implemented in this area; however, there is always a need for increasing accuracy and efficiency.
The existing methods take into account all the features to build the classifier model, thus reducing the accuracy and increasing the overall processing time. This paper proposes a Genetic Algorithm based Wrapper feature selection Hybrid Prediction Model (GWHPM). This model initially uses the k-means clustering technique to remove the outliers from the dataset. Further, an optimal set of features is obtained by using Genetic Algorithm based Wrapper feature selection. Finally, it is used to build classifier models such as Decision Tree, Naive Bayes, k nearest neighbor and Support Vector Machine. A comparative study of GWHPM is carried out and it is observed that the proposed model performed better than the existing methods. © 2014 IEEE.

An iterative MapReduce framework for sports-based tweet clustering (Association for Computing Machinery, 2015)
Saxena, G.; Santurkar, S.
In recent years, social media has evolved into a vital source for real-time information. Sports is one of the most popular topics on social media and attracts the attention of users all over the world. However, a large amount of data is generated on a daily basis, making it difficult for the fans to follow the topics of their interest. Clustering of these posts can resolve this issue by retrieving unambiguous and distinct topics. MapReduce is a programming paradigm that is very effective in designing distributed applications that can be deployed on the cloud. Clustering algorithms are generally iterative in nature. The performance gain offered by MapReduce cannot be completely realized by these algorithms due to the inherent architectural bottlenecks associated with iterative tasks. Twister is a MapReduce-based framework designed to minimize these bottlenecks. In this paper, we propose a distributed framework that gathers sports-related tweets and clusters them into distinct topics using the DBSCAN algorithm customized for Twister.
The accuracy of the framework was analysed using the precision-recall scoring mechanism to determine the set of DBSCAN and framework parameters that result in the best set of clusters. The performance of our framework is evaluated based on our clustering results and simulations using the MRSim simulator. We expect that this framework could be used as a model for performing topic detection over generic tweets. We have used the domain of sports to establish the proof of this concept. © 2015 ACM.

A New Glowworm Swarm Optimization Based Clustering Algorithm for Multimedia Documents (Institute of Electrical and Electronics Engineers Inc., 2016)
Pushpalatha, K.; Ananthanarayana, V.S.
Due to the explosion of multimedia data, the demand for sophisticated multimedia knowledge discovery systems has increased. The multimodal nature of multimedia data is a big barrier for knowledge extraction. The representation of multimodal data in a unimodal space will be more advantageous for any mining task. We initially represent the multimodal multimedia documents in a unimodal space by converting the multimedia objects into signal objects. The dynamic nature of the glowworms motivated us to propose the Glowworm Swarm Optimization based Multimedia Document Clustering (GSOMDC) algorithm to group the multimedia documents into topics. The better purity and entropy values indicate that the GSOMDC algorithm successfully clusters the multimedia documents into topics. The goodness of the clustering is evaluated by performing cluster based retrieval of multimedia documents with better precision values. © 2015 IEEE.
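
Several of the abstracts above report cluster quality through purity and entropy without stating the formulas. As a reader's aid, the sketch below computes both measures under their commonly used definitions; it is an assumption that the listed papers use these exact forms. The function names `purity` and `entropy` and the toy data are hypothetical and do not come from any of the papers.

```python
from collections import Counter
from math import log2


def _group_by_cluster(clusters, labels):
    """Collect the true labels of the items assigned to each cluster."""
    grouped = {}
    for cluster_id, label in zip(clusters, labels):
        grouped.setdefault(cluster_id, []).append(label)
    return grouped


def purity(clusters, labels):
    """Fraction of items that carry the majority label of their cluster (higher is better)."""
    grouped = _group_by_cluster(clusters, labels)
    majority_total = sum(max(Counter(members).values()) for members in grouped.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the label entropy (in bits) of each cluster (lower is better)."""
    grouped = _group_by_cluster(clusters, labels)
    total = len(labels)
    weighted = 0.0
    for members in grouped.values():
        n = len(members)
        # Shannon entropy of the label distribution inside this cluster
        h = -sum((k / n) * log2(k / n) for k in Counter(members).values())
        weighted += (n / total) * h
    return weighted


# Hypothetical toy data: six documents, two clusters, two true topics.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "b"]
print(purity(toy_clusters, toy_labels))
print(entropy(toy_clusters, toy_labels))
```

Under these definitions a perfect clustering has purity 1 and entropy 0, so "better" values mean higher purity and lower entropy.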
