Browsing by Author "Kanimozhi, K.V."

An enlarged map-reduce using 2logmean-PSO optimization for unstructured data
(SAGE Publications Inc., 2024) Kanimozhi, K.V.; Rajakumar, R.; Venkatesan, M.
Text clustering segments a large collection of textual documents into clusters. The size of the collection degrades clustering performance: textual documents contain sparse and uninformative features, which increase computation time and reduce the quality of the underlying clustering process. Feature selection addresses this by choosing a subset of informative text features, improving clustering performance and reducing computation time. The proposed model uses a 2logmean-particle swarm optimization algorithm for unstructured text clustering. In this technique, all text is first converted into ASCII values, and the documents are then clustered using particle swarm optimization. The results show that the clustering accuracy of the proposed method is higher than that of the existing K-means algorithm. The method is further evaluated with respect to scalability, reduced computation time, and substantial dimensionality reduction. © The Author(s) 2019.
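The two steps named in the abstract (ASCII conversion, then particle swarm clustering) can be sketched roughly as follows. The abstract does not define the 2logmean variant, so this is a plain, textbook particle swarm optimizer and not the authors' algorithm. The mean-code-point feature, the function names, and the parameter values (`w`, `c1`, `c2`, swarm size) are all assumptions made for illustration.

```python
import random

def ascii_feature(text):
    """Assumed encoding: mean character code of the text."""
    codes = [ord(ch) for ch in text]
    return sum(codes) / len(codes) if codes else 0.0

def cost(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)

def pso_cluster(points, k=2, particles=10, iters=50, seed=0):
    """Plain PSO over k one-dimensional centroids (not the 2logmean variant)."""
    rng = random.Random(seed)
    lo, hi = min(points), max(points)
    pos = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(particles)]
    vel = [[0.0] * k for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p, points) for p in pos]
    g = min(range(particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    history = [g_cost]                  # global best cost per iteration
    w, c1, c2 = 0.7, 1.5, 1.5           # assumed inertia and attraction weights
    for _ in range(iters):
        for i in range(particles):
            for d in range(k):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i], points)
            if c < best_cost[i]:        # update the particle's personal best
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:          # update the swarm's global best
                    g_pos, g_cost = pos[i][:], c
        history.append(g_cost)
    return g_pos, history

texts = ["AAAA", "AABA", "zzzz", "zzyz"]   # made-up toy documents
points = [ascii_feature(t) for t in texts]
centroids, history = pso_cluster(points, k=2)
```

Each document is assigned to its nearest returned centroid. A single scalar per document is a deliberately crude feature; it only keeps the sketch short.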
Maximum frequent item set based clustering algorithm for big text data
(Blue Eyes Intelligence Engineering and Sciences Publication, 2019) Kanimozhi, K.V.; Rajakumar, K.S.; Venkatesan, M.
The rapid growth of the internet and the continuous expansion of World Wide Web resources such as digital libraries and online news have produced a massive amount of unstructured electronic text documents on the web. Although many traditional techniques are available for extracting knowledge from large collections of text documents, improving the precision of web search retrieval and efficiently finding the most appropriate documents in huge text collections remains a major challenge. Clustering techniques help search engines retrieve documents. The proposed system addresses these problems with a bivariate n-gram frequent item clustering algorithm based on the concept of the maximum frequent set, which preserves the sequence and meaning of sentences in order to reduce the high dimensionality, and uses frequent item sets to find similarity. Documents are then clustered based on maximum document occurrence. The method obtains better-quality clusters than existing methodologies and improves efficiency. Experiments on a sample Newsgroup dataset compare the existing K-Means algorithm with FICMDO (frequent item clustering method based on maximum document occurrence) and show that the f-measure is higher for the proposed algorithm, indicating more effective clusters. It is therefore a faster and more efficient big data method that improves performance compared with vector space model approaches such as the K-Means algorithm. © BEIESP.
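One way to picture clustering by frequent word pairs and maximum document occurrence is the minimal sketch below. It is an interpretation of the abstract, not the published FICMDO procedure. The function names, the `min_support` threshold, the tie-breaking rule, and the toy documents are assumptions.

```python
from collections import Counter, defaultdict

def bigrams(text):
    """Set of adjacent word pairs, so word order within a pair is kept."""
    words = text.lower().split()
    return {(a, b) for a, b in zip(words, words[1:])}

def cluster_by_frequent_bigrams(docs, min_support=2):
    doc_items = [bigrams(d) for d in docs]
    # Document occurrence: number of documents containing each bigram.
    support = Counter(item for items in doc_items for item in items)
    frequent = {item: n for item, n in support.items() if n >= min_support}
    clusters = defaultdict(list)
    for idx, items in enumerate(doc_items):
        candidates = [i for i in items if i in frequent]
        if not candidates:
            clusters[None].append(idx)   # no frequent bigram: left unclustered
            continue
        # Assumed rule: pick the bigram with the highest document occurrence,
        # breaking ties alphabetically so the result is deterministic.
        best = max(candidates, key=lambda i: (frequent[i], i))
        clusters[best].append(idx)
    return dict(clusters)

docs = [
    "stock market falls sharply",
    "stock market rallies today",
    "space shuttle launch delayed",
    "space shuttle lands safely",
    "local bakery wins award",
]
result = cluster_by_frequent_bigrams(docs)
```

Here each cluster is labeled by the bigram that defines it, which also gives a readable topic label for the group.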
Text document analysis using map-reduce framework
(2018) Kanimozhi, K.V.; Prabhavathy, P.; Venkatesan, M.
Due to advances in the Internet and increasing globalization, electronic forms of information are growing rapidly. Extracting useful hidden information from these documents is a current challenge. Hence, an efficient and automated clustering algorithm that is effective in identifying topics plays a major role in information retrieval. In this paper, a large unstructured text document corpus is analyzed using the proposed map-reduce algorithm. The results show the advantage of the proposed method, which detects clusters of document features in less computation time and offers a solution for increasing the precision of retrieval in information extraction. © 2018, Springer Nature Singapore Pte Ltd.
Text document analysis using map-reduce framework
(Springer Verlag, 2018) Kanimozhi, K.V.; Prabhavathy, P.; Venkatesan, M.
Due to advances in the Internet and increasing globalization, electronic forms of information are growing rapidly. Extracting useful hidden information from these documents is a current challenge. Hence, an efficient and automated clustering algorithm that is effective in identifying topics plays a major role in information retrieval. In this paper, a large unstructured text document corpus is analyzed using the proposed map-reduce algorithm. The results show the advantage of the proposed method, which detects clusters of document features in less computation time and offers a solution for increasing the precision of retrieval in information extraction. © 2018, Springer Nature Singapore Pte Ltd.
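The abstract does not describe the map and reduce functions themselves. As a generic illustration of the map-reduce pattern on a text corpus, the sketch below builds a term-to-documents index in plain Python, with no cluster framework involved. All function names and the sample corpus are assumptions, and this is not the authors' algorithm.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (term, doc_id) once per distinct term in the document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(term, doc_ids):
    """Collapse all values for one term into a sorted list of document ids."""
    return term, sorted(doc_ids)

def run_job(corpus):
    pairs = [p for doc_id, text in corpus.items()
             for p in map_phase(doc_id, text)]
    return dict(reduce_phase(t, ids) for t, ids in shuffle(pairs).items())

corpus = {1: "big data clustering", 2: "text clustering", 3: "big text corpus"}
index = run_job(corpus)
```

Documents that share terms in such an index are candidates for the same cluster. In a real deployment the map and reduce calls would run in parallel across machines.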
Weighted frequent pattern based agglomerative clustering for large unstructured text data
(Science and Engineering Research Support Society, 2020) Kanimozhi, K.V.; Rajakumar, K.S.; Venkatesan, M.
Processing large amounts of text with traditional clustering methods is a key challenge. Research communities have proposed various clustering approaches for analyzing unstructured data. Frequent-item-based clustering is one of the most widely used approaches in text analytics. An approach that mines Frequent Weighted Utility Itemsets (FWUI) and then clusters using the MC (Maximum Capturing) algorithm is one of the most effective methods for text clustering. However, the Maximum Capturing clustering algorithm, which is based on the similarity matrix, produces many irrelevant clusters. In this work, Weighted Frequent Pattern based Agglomerative Clustering (WFUP_AC) is proposed for clustering large text data. First, the Term Frequency (TF) is calculated for each term in the documents to create a weight matrix for all documents. The weights of terms in documents are based on the Inverse Document Frequency. The WFUP algorithm is then applied to mine Weighted Frequent Utility Patterns (WFUP) from a number matrix and the weights of terms in documents. Next, based on the frequent utility itemsets, a similarity matrix is built over the documents, where each entry corresponds to the frequent itemsets common to two documents. A distance matrix is then calculated from the similarity matrix. Finally, Hierarchical Agglomerative Clustering with complete linkage is applied to the distance matrix, and the dendrogram is cut as needed. The proposed method was evaluated on two text document data sets, Newsgroup and Reuters, at sizes of 100, 300, 500 and 1000 documents. The experimental results show that WFUP_AC improves text clustering accuracy compared to MC clustering using FIs (Frequent Itemsets) and FWUIs. © 2020 SERSC.
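The last three steps of the pipeline (similarity matrix, distance matrix, complete-linkage agglomerative clustering) can be sketched as below. This assumes the frequent itemsets of each document have already been mined. The abstract does not say how similarity is turned into distance, so the scaling used here is an assumption, as are the function names, the count-based similarity, and the toy itemsets.

```python
from itertools import combinations

def similarity_matrix(doc_itemsets):
    """sim[i][j] = number of frequent itemsets shared by documents i and j."""
    n = len(doc_itemsets)
    return [[len(doc_itemsets[i] & doc_itemsets[j]) for j in range(n)]
            for i in range(n)]

def distance_matrix(sim):
    """Assumed conversion: 1 - similarity / largest off-diagonal similarity."""
    n = len(sim)
    top = max((sim[i][j] for i in range(n) for j in range(n) if i != j),
              default=0) or 1
    return [[0.0 if i == j else 1.0 - sim[i][j] / top for j in range(n)]
            for i in range(n)]

def complete_linkage(dist, k):
    """Merge the closest pair of clusters until k remain (k >= 1).
    Cluster distance is the largest pairwise distance between members."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        def gap(pair):
            a, b = pair
            return max(dist[i][j] for i in clusters[a] for j in clusters[b])
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]          # safe: combinations guarantees a < b
    return sorted(clusters)

fs = frozenset
doc_itemsets = [                 # hypothetical mined itemsets per document
    {fs({"stock"}), fs({"market"}), fs({"stock", "market"})},
    {fs({"stock"}), fs({"market"}), fs({"stock", "market"}), fs({"trade"})},
    {fs({"space"}), fs({"shuttle"})},
    {fs({"space"}), fs({"shuttle"}), fs({"trade"})},
]
sim = similarity_matrix(doc_itemsets)
clusters = complete_linkage(distance_matrix(sim), k=2)
```

Stopping at `k` clusters plays the role of cutting the dendrogram at a chosen level. Complete linkage keeps documents 1 and 3 apart even though they share one itemset, because their clusters' farthest members have nothing in common.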