Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
Item JSON Document Clustering Based on Structural Similarity and Semantic Fusion (Springer Science and Business Media Deutschland GmbH, 2023)
Uma Priya, D.; Santhi Thilagam, P.S.
The emerging drift toward real-time applications generates massive amounts of JSON data exponentially over the web. Dealing with the heterogeneous structures of JSON document collections is challenging for efficient data management and knowledge discovery. Clustering JSON documents has become a significant issue in organizing large data collections. Existing research has focused on clustering JSON documents using structural or semantic similarity measures. However, differently annotated JSON structures are also related by the context of the JSON attributes. As a result, existing research work is unable to identify the context hidden in the schemas, emphasizing the importance of leveraging the syntactic, semantic, and contextual properties of heterogeneous JSON schemas. To address the specific research gap, this work proposes JSON Similarity (JSim), a novel approach for clustering JSON documents by combining the structural and semantic similarity scores of JSON schemas. In order to capture more semantics, the semantic fusion method is proposed, which correlates schemas using semantic as well as contextual similarity measures. The JSON documents are clustered based on the weighted similarity matrix. The results and findings show that the proposed approach outperforms the current approaches significantly. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item Optimization of countour based template matching using GPGPU based hexagonal framework (Institute of Electrical and Electronics Engineers Inc., 2003)
Bhagya, M.; Tripathi, S.; Santhi Thilagam, P.S.
This paper presents a technique to optimize contour based template matching by using General Purpose computation on Graphics Processing Units (GPGPU).
Contour based template matching requires edge detection and searching for the presence of a template in an entire image, a real-time implementation of which is not trivial. Using the proposed solution, we could achieve an implementation fast enough to process a standard video (640 × 480) in real time with sufficient accuracy. © 2014 IEEE.

Item EfficientTreeMiner: Mining frequent induced substructures from XML documents without candidate generation (2006)
Santhi Thilagam, P.S.; Ananthanarayana, V.S.
Tree structures are used extensively in domains such as XML databases, computational biology, pattern recognition, computer networks, web mining, and multi-relational data mining. In this paper, we present EfficientTreeMiner, a computationally efficient algorithm that discovers all frequently occurring induced subtrees in a database of labeled rooted unordered trees. The proposed algorithm mines frequent subtrees without generating any candidate subtrees. Efficiency is achieved by compressing the large database into a condensed data structure, namely a prefix string representation, which reduces space complexity, and by adopting a Frequent Immediate Descendents method that avoids the costly generation of candidate sets. Experimental results show that our algorithm has lower time complexity than existing approaches and is also scalable for mining both long and short frequent subtrees. © 2006 IEEE.

Item Semantic partition based association rule mining across multiple databases using abstraction (2007)
Santhi Thilagam, P.S.; Ananthanarayana, V.S.
Association rule mining (ARM) is both computationally and I/O intensive. A majority of the ARM algorithms reported in the literature are efficient in handling high dimensional data but are single-database based. Many enterprises maintain several databases independently to serve different purposes. There could be an implicit association among various parts of such data.
In this paper, we investigate a mechanism to generate Association Rules (ARs) between the sets of values which are subsets of domains of attributes occurring in relations present in different databases. In our approach, the relevant databases, relations and attributes are identified using knowledge, multiple navigation paths are generated using a data dictionary, and a structure is constructed which semantically partitions the resultant relation using these navigation paths. We propose an efficient algorithm which uses this structure to generate ARs. © 2007 IEEE.

Item An abstraction based communication efficient distributed association rule mining (2008)
Santhi Thilagam, P.S.; Ananthanarayana, V.S.
Association rule mining is one of the most researched areas because of its applicability in various fields. We propose a novel data structure called the Sequence Pattern Count (SPC) tree, which stores the database compactly and completely and requires only one scan of the database for its construction. The completeness property of the SPC tree with respect to the database makes it more suitable for mining association rules in the context of changing data and changing supports without rebuilding the tree. A performance study shows that the SPC tree is efficient and scalable. We also propose a Doubly Logarithmic-depth Tree (DLT) algorithm, which uses the SPC tree to efficiently mine huge amounts of geographically distributed datasets in order to minimize the communication and computation costs. DLT requires only O(n) messages for support count exchange and takes only O(log log n) time for the exchange of messages, which increases its efficiency. © Springer-Verlag Berlin Heidelberg 2008.

Item An RDF approach for discovering the relevant semantic associations in a social network (2008)
A.k, T.; Santhi Thilagam, P.S.
A social network is a network of interactions between entities of social interest like people, organisations, hobbies and transactions.
Finding relevant associations between entities in a social network is of great value in many areas like friendship networks, biology and countering terrorism. Semantic web technology enables us to capture and process relationships among social entities as metadata. Analysing semantic social networks requires newer methods. In a social network, entities are connected by short chains of relationships. A query to find associations between two entities returns a large number of results. One of the major issues is to rank the associations as per user preference. The work presents an approach to rank two categories of semantic associations, viz. common associations and informative associations. Associations are modelled as property sequences in an RDF graph and they are ranked based on the preferred search mode. The following heuristics are used to rank associations: i) the information content due to the occurrence of a property with respect to all the properties in a description base; ii) the unpredictability of an association due to the participation of its properties in multiple domains; iii) the extent of match between user specified keywords and properties; and iv) the popularity of the nodes involved in a sequence. The results obtained suggest that these heuristics indeed help in obtaining relevant associations. To scale the results to large RDF graphs, a relevant subgraph is extracted from the input graph, on which ranking is applied. The approach is tested successfully on real RDF datasets and multigraphs. © 2008 IEEE.

Item DYNA-RANK: Efficient calculation and updation of pagerank (2008)
Kale, M.; Santhi Thilagam, P.S.
Deciding the ranking of web pages is very important on the web, as it is growing and changing very rapidly. The ranking of the results in a search engine for a query plays a crucial role for a huge database like the web, where one query can have millions of results. The browsing nature of the web will mostly depend on the ranking of the search results.
The existing approaches for calculating pagerank values are mostly centralized, and the ones which are distributed are not being used for practical purposes because of scalability reasons. The centralized approaches consider the total web as one graph and calculate the pagerank values of the total graph after a certain time period, which takes a long execution time that can run into days. In the same way, updating the graph also compels recalculation of the pagerank values of all the pages in the graph. This suggests possible applicability of the distributed algorithm to pagerank computations as a replacement for the centralized pagerank calculation algorithm. Considering the importance of "Ranking" in the searching context, our approach, DYNA-RANK, focuses upon efficiently calculating and updating Google's pagerank vector using a "peer to peer" system. The changes in the web structure will be handled incrementally amongst the peers. DYNA-RANK produces the relative pagerank on each peer. DYNA-RANK is proven to take less computation time and fewer iterations compared to the centralized approach. © 2008 IEEE.

Item InterTARM: FP-tree based framework for mining inter-transaction association rules from stock market data (2008)
Chhinkaniwala, H.; Santhi Thilagam, P.S.
Mining association rules from transactions that occurred in different time series is a difficult task because of high computational complexity, very large database size and multidimensional attributes. Traditional techniques, such as fundamental and technical analysis, can provide investors with tools for predicting stock prices. However, these techniques cannot discover all the possible relations between stocks and thus there is a need for a different approach that will provide a deeper kind of analysis. We propose a framework called InterTARM on real datasets.
Our approach employs effective preprocessing, pruning techniques and an available condensed data structure to efficiently discover inter-transaction association rules. © 2008 IEEE.

Item A linear hash based indexing scheme for location dependent data broadcast (IEEE Computer Society, 2009)
Sriprajna, K.J.; Santhi Thilagam, P.S.
Location based data dissemination in a mobile wireless environment essentially requires some kind of indexing so that mobile clients can energy-efficiently access required data. The existing Location based indexing scheme (LBIS) for Location Dependent Data (LDD) follows a tree based indexing mechanism and builds two levels of indices. But LBIS spends a considerable amount of time in fetching indices, leading to an increase in tuning time. On the other hand, LBIS records too much index information, making the broadcast cycle short. In this paper we propose a hash based indexing scheme for LDD. The proposed scheme has a very compact structure because it binds the hashing parameters along with data and thereby eliminates the index overhead. This method also keeps the mobile unit active for a maximum of three time units, thus decreasing the tuning time drastically. Even when the size of the broadcast data is increased, tuning time remains fairly stable. Through analysis and experiments, the effectiveness of the proposed method is shown. © 2009 IEEE.

Item Towards evaluating resilience of SIP server under low rate DoS attack (2011)
Kumar, A.; Santhi Thilagam, P.S.; Pais, A.R.; Sharma, V.; Sadalkar, K.
The low rate Denial-of-Service (DoS) attack has recently emerged as the greatest threat to enterprise VoIP systems. Such attacks are difficult to detect and capable of discovering vulnerabilities in protocols with low rate traffic, and they noticeably affect the performance of Session Initiation Protocol (SIP) communication. In this paper, we analyze in depth the resilience of a SIP server against certain low rate DoS attacks.
For this purpose we define performance metrics of the SIP server under attack and non-attack scenarios. The performance degradation under attacks gives a measure of the resilience of the SIP server. In order to generate normal SIP traffic and the attacks, we defined our own XML scenarios and implemented them using a popular open source tool known as SIPp. The system under evaluation was an open source SIP server. © 2011 Springer-Verlag.
