Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Item: A scalable cloud platform using matlab distributed computing server integrated with HDFS (IEEE Computer Society help@computer.org, 2012) Dutta, R.; Annappa, B.
The Hadoop Distributed File System (HDFS) is a large data storage system which exhibits several features of a good distributed file system. In this paper we integrate Matlab Distributed Computing Server (MDCS) with HDFS to build a scalable, efficient platform for scientific computations. We use an FTP server on top of HDFS for data transfer from the Matlab system to HDFS. The motivation for using HDFS for storage with MDCS is to provide an efficient, fault-tolerant file system and also to utilize the resources efficiently by making each system serve as both a data node for HDFS and a worker for MDCS. We test the storage efficiency of HDFS and compare it with a normal file system for data transfer operations through MDCS. © 2012 IEEE.

Item: Distributed mining of significant frequent colossal closed itemsets from long biological dataset (Springer Verlag service@springer.de, 2020) Vanahalli, M.K.; Patil, N.
Mining colossal itemsets has gained increasing attention in recent times. An extensive set of short and average-sized itemsets does not contain complete and valuable information for decision making. However, the traditional itemset mining algorithms spend a great deal of time mining these short and average-sized itemsets. Colossal itemsets are very significant for numerous applications, including the field of bioinformatics, and are influential in decision making. Bioinformatics has contributed a new kind of dataset known as the long biological dataset. These datasets are high-dimensional, characterized by a large number of features (attributes) and a small number of rows (samples). Extracting a large amount of information and knowledge from high-dimensional long biological datasets is a nontrivial task.
Existing algorithms for mining significant Frequent Colossal Closed Itemsets (FCCI) from long biological datasets are sequential and computationally expensive. Distributed computing is a good strategy for overcoming the inefficiency of the existing sequential algorithms. The paper proposes a distributed computing approach for mining FCCI. The row-enumerated mining search space is efficiently cut down by the pruning strategy built into the Distributed Row Enumerated Frequent Colossal Closed Itemset Mining (DREFCCIM) algorithm. The proposed DREFCCIM algorithm is the first distributed algorithm to mine FCCI from long biological datasets. The experimental results demonstrate the efficient performance of the DREFCCIM algorithm in comparison to current algorithms. © Springer Nature Switzerland AG 2020.
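
The row-enumeration idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DREFCCIM algorithm: it has no pruning and no distribution, and it simply enumerates every row subset, which is only feasible because long biological datasets have few rows. The function name, the parameters `min_support` and `min_length`, and the toy data are all assumptions made for illustration. It relies on the standard fact that the items shared by any group of rows form a closed itemset.

```python
from itertools import combinations


def closed_itemsets_by_rows(rows, min_support, min_length):
    """Exhaustive row enumeration (illustrative only, no pruning).

    rows: list of frozensets, one per sample (hypothetical input format).
    Returns {itemset: support} for closed itemsets that appear in at least
    min_support rows and contain at least min_length items.
    """
    found = {}
    # Any closed itemset with support >= min_support is the intersection
    # of some row subset of at least that size.
    for size in range(max(1, min_support), len(rows) + 1):
        for row_ids in combinations(range(len(rows)), size):
            common = frozenset.intersection(*(rows[i] for i in row_ids))
            if len(common) < min_length:
                continue  # too short to count as "colossal" here
            # Support is counted over all rows, not just the chosen subset.
            found[common] = sum(1 for row in rows if common <= row)
    return found


# Toy data: 4 samples (rows), features named by single letters.
rows = [frozenset("abcde"), frozenset("abcd"), frozenset("abcf"), frozenset("xy")]
found = closed_itemsets_by_rows(rows, min_support=2, min_length=3)
for itemset, support in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

The cost grows exponentially with the number of rows, which is the inefficiency that pruning and distributed execution, as described in the abstract, are meant to address.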
