Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets
(Elsevier Inc. usjcs@elsevier.com, 2019) Vanahalli, M.K.; Patil, N.
Mining colossal itemsets from high dimensional datasets has gained focus in recent times. Conventional algorithms spend most of their time mining small and mid-sized itemsets, which do not carry valuable and complete information for decision making. Mining Frequent Colossal Closed Itemsets (FCCI) from a high dimensional dataset plays a highly significant role in decision making for many applications, especially in the field of bioinformatics. The existing preprocessing techniques for mining FCCI from a high dimensional dataset fail to prune the complete set of irrelevant features and irrelevant rows. Besides, the state-of-the-art algorithms for this task are sequential and computationally expensive. The proposed work presents an Effective Improved Parallel Preprocessing (EIPP) technique to prune the complete set of irrelevant features and irrelevant rows from a high dimensional dataset, and a novel efficient Parallel Frequent Colossal Closed Itemset Mining (PFCCIM) algorithm. Further, the PFCCIM algorithm is integrated with a novel Rowset Cardinality Table (RCT), an efficient method to check the closeness of a rowset, and an efficient pruning strategy to cut down the mining search space. The proposed PFCCIM algorithm is the first parallel algorithm to mine FCCI from a high dimensional dataset. The performance study shows the improved effectiveness of the proposed EIPP technique over the existing preprocessing techniques and the improved efficiency of the proposed PFCCIM algorithm over the existing algorithms. © 2018 Elsevier Inc.
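As a rough illustration of what pruning irrelevant features and rows can mean, the sketch below repeatedly drops items whose support falls below a minimum support threshold and rows that hold fewer items than a minimum cardinality threshold. It is not the EIPP technique from the paper and it is sequential, not parallel; the function name `prune_dataset`, the parameters `min_sup` and `min_card`, and the two irrelevance criteria themselves are assumptions made for this example.

```python
def prune_dataset(rows, min_sup, min_card):
    """Iteratively prune infrequent items and too-short rows.

    rows     : dict mapping row id -> set of items (hypothetical layout)
    min_sup  : assumed minimum number of rows an item must appear in
    min_card : assumed minimum number of items a row must keep to be
               able to contain a colossal itemset
    """
    # Work on a copy so the caller's data is not modified.
    rows = {rid: set(items) for rid, items in rows.items()}
    changed = True
    while changed:
        changed = False
        # Count the support of every remaining item.
        support = {}
        for items in rows.values():
            for item in items:
                support[item] = support.get(item, 0) + 1
        # Drop items that are no longer frequent.
        infrequent = {item for item, s in support.items() if s < min_sup}
        if infrequent:
            changed = True
            for rid in rows:
                rows[rid] -= infrequent
        # Drop rows that became too short; this can lower item
        # supports again, hence the outer loop.
        short = [rid for rid, items in rows.items() if len(items) < min_card]
        if short:
            changed = True
            for rid in short:
                del rows[rid]
    return rows


example = {1: {"a", "b", "c"}, 2: {"a", "b", "c"}, 3: {"a", "d"}, 4: {"b", "c", "e"}}
pruned = prune_dataset(example, min_sup=2, min_card=2)
```

The loop runs to a fixed point because removing a row can make an item infrequent, and removing an item can make a row too short.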
An efficient dynamic switching algorithm for mining colossal closed itemsets from high dimensional datasets
(Elsevier B.V., 2019) Vanahalli, M.K.; Patil, N.
The abundant data across a variety of domains, including bioinformatics, has led to the formation of datasets with high dimensionality. Conventional algorithms spend most of their time mining a large number of small and mid-sized itemsets, which do not carry complete and valuable information for decision making. Recent research is focused on Frequent Colossal Closed Itemsets (FCCI), which play a significant role in decision making for many applications, especially in the field of bioinformatics. The state-of-the-art algorithms for mining FCCI from datasets consisting of a large number of rows and a large number of features are computationally expensive, as they are either pure row enumeration or pure feature enumeration based algorithms. Moreover, the existing preprocessing techniques fail to prune the complete set of irrelevant features and irrelevant rows. The proposed work presents an Effective Improvised Preprocessing (EIP) technique to prune the complete set of irrelevant features and irrelevant rows, and a novel efficient Dynamic Switching Frequent Colossal Closed Itemset Mining (DSFCCIM) algorithm. The proposed DSFCCIM algorithm efficiently switches between row and feature enumeration methods based on data characteristics during the mining process. Further, the DSFCCIM algorithm is integrated with a novel Rowset Cardinality Table, an Itemset Support Table, two efficient methods to check the closeness of rowset and itemset, and two efficient pruning strategies to cut down the search space. The proposed DSFCCIM algorithm is the first dynamic switching algorithm to mine FCCI from datasets consisting of a large number of rows and a large number of features. The performance study shows the improved effectiveness of the proposed EIP technique over the existing preprocessing techniques and the improved efficiency of the proposed DSFCCIM algorithm over the existing algorithms. © 2019 Elsevier B.V.
Distributed load balancing frequent colossal closed itemset mining algorithm for high dimensional dataset
(Academic Press Inc. apjcs@harcourt.com, 2020) Vanahalli, M.K.; Patil, N.
Extracting colossal closed itemsets from high dimensional biological datasets has received great attention in recent times. A massive set of short and average sized mined itemsets does not contain complete and valuable information for decision making. However, traditional itemset mining algorithms spend a large amount of time mining a massive set of short and average sized itemsets. The greater research interest in the field of bioinformatics and the abundant data across a variety of domains paved the way for the generation of high dimensional datasets. These datasets are characterized by an extensive number of features and a smaller number of rows. Colossal closed itemsets are very significant for numerous applications, including the field of bioinformatics, and are influential in decision making. Extracting a huge amount of information and knowledge from a high dimensional dataset is a nontrivial task. The existing colossal closed itemset mining algorithms for high dimensional datasets are sequential and computationally expensive. Distributed and parallel computing is a good strategy to overcome the inefficiency of the existing sequential algorithms. The Balanced Distributed Parallel Frequent Colossal Closed Itemset Mining (BDPFCCIM) algorithm is designed for high dimensional datasets. An efficient closeness checking method to check the closeness of the rowset and an efficient pruning strategy to snip the row enumeration mining search space are included in the proposed BDPFCCIM algorithm. The proposed BDPFCCIM algorithm is the first distributed load balancing algorithm to mine frequent colossal closed itemsets from high dimensional biological datasets. The experimental results demonstrate the efficient performance of the proposed BDPFCCIM algorithm in comparison with the state-of-the-art algorithms. © 2020 Elsevier Inc.
An efficient colossal closed itemset mining algorithm for a dataset with high dimensionality
(King Saud bin Abdulaziz University, 2022) Vanahalli, M.K.; Patil, N.
The greater research interest in the field of bioinformatics and the ample amount of data available across different domains paved the way for the generation of datasets with high dimensionality. In such a dataset the number of features is very high and the number of rows is small. Frequent Colossal Closed Itemsets (FCCI) are highly significant for diverse applications, including the field of bioinformatics, and are very prominent in the decision making process. The amount of information to be extracted from a dataset with high dimensionality is huge, and this extraction is a non-trivial task. The state-of-the-art algorithms do not prune all the inadmissible features and rows. The proposed work presents the pruning of all the inadmissible features and rows, an efficient pruning strategy to snip the row enumeration mining search space, and a closure method for checking the closeness of the rowset. An efficient row enumeration algorithm incorporating the rowset closure checking method and the pruning strategy is designed to efficiently mine the complete set of FCCI. The experimental results demonstrate the effectiveness of pruning all the inadmissible features and rows. © 2020 The Authors
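To make the idea of a rowset closure check concrete, the following minimal sketch uses the textbook definition: a rowset is closed when no row outside it contains every item shared by the rows inside it. This is only the standard definition written out naively, not the closure method or the pruning strategy proposed in the paper; the function names and the dictionary-of-sets data layout are assumptions for this example.

```python
def common_items(rows, rowset):
    """Items shared by every row in the rowset (rowset must be non-empty)."""
    if not rowset:
        raise ValueError("rowset must contain at least one row id")
    return set.intersection(*(set(rows[rid]) for rid in rowset))


def rowset_closure(rows, rowset):
    """All rows that contain every item common to the given rowset."""
    items = common_items(rows, rowset)
    return frozenset(rid for rid, row in rows.items() if items <= set(row))


def is_closed_rowset(rows, rowset):
    """A rowset is closed when it equals its own closure."""
    return rowset_closure(rows, rowset) == frozenset(rowset)


# Hypothetical toy dataset: row id -> set of items.
toy = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a", "b", "d"}, 4: {"c"}}
closure_12 = rowset_closure(toy, {1, 2})
```

In this toy dataset rows 1 and 2 share the items a and b, which row 3 also contains, so the rowset {1, 2} is not closed while {1, 2, 3} is. The check above scans every row on each call; it shows only what is being tested, not how to test it efficiently.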
