|Document classification with a weighted frequency pattern tree algorithm
|Proceedings of 2016 International Conference on Data Mining and Advanced Computing, SAPIENCE 2016, 2016, pp. 29-34
|Document classification can be defined as the task of automatically assigning electronic documents to their annotated classes based on their contents. It is an important problem in data mining. Because of the exponential growth of documents on the Internet and the emerging need to organize them, developing an efficient method for automatically classifying web documents is of great importance and has received increasing attention in recent years. However, existing approaches to text classification treat documents primarily as a bag of words: all the information about a document is derived from the presence of individual words, not from the order or context in which those words appear in a sentence. In this paper we investigate adopting the FP-tree, a data structure used in frequent itemset mining, to represent training documents in text classification while preserving sentence information. We compare our method with other conventional document classification algorithms on several corpora. The experimental results indicate that the proposed algorithm performs much better than the conventional algorithms, especially on corpora whose classification categories are primarily disjoint. © 2016 IEEE.
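The abstract does not spell out the algorithm, so the following Python sketch is only a rough illustration of the general idea, not the authors' weighted FP-tree method. It builds one FP-tree per class and treats each training sentence as a transaction, so words that co-occur in a sentence share a path in the tree, which a document-level bag of words does not capture. Everything else is an assumption made for illustration: splitting sentences on periods, the `min_support` threshold, the hypothetical `path_score` prefix-walk rule, and the per-class normalization in `classify`. Note also that the standard FP-tree insertion order used here (descending support, ties broken alphabetically) keeps sentence-level co-occurrence but not the original word order.

```python
from collections import Counter, defaultdict


class FPNode:
    """One FP-tree node: a word, how many sentences pass through it, and its children."""

    def __init__(self, word, parent):
        self.word = word
        self.count = 0
        self.parent = parent
        self.children = {}


class FPTree:
    """FP-tree built from sentence 'transactions' (each sentence = a set of words)."""

    def __init__(self, sentences, min_support=1):
        # First pass: count how many sentences each word occurs in (its support).
        self.support = Counter(w for s in sentences for w in set(s))
        self.min_support = min_support
        self.n = len(sentences)
        self.root = FPNode(None, None)
        # Header table: word -> all nodes holding that word (used by FP-growth mining;
        # the scoring sketch below does not need it).
        self.header = defaultdict(list)
        # Second pass: insert every sentence with its words in support order.
        for s in sentences:
            self.insert(self.order(s))

    def order(self, sentence):
        # Drop infrequent (or unseen) words; sort by descending support, then alphabetically.
        words = {w for w in sentence if self.support[w] >= self.min_support}
        return sorted(words, key=lambda w: (-self.support[w], w))

    def insert(self, items):
        node = self.root
        for w in items:
            child = node.children.get(w)
            if child is None:
                child = FPNode(w, node)
                node.children[w] = child
                self.header[w].append(child)
            child.count += 1
            node = child

    def path_score(self, sentence):
        # Hypothetical scoring rule: follow the tree along the sentence's ordered words
        # and sum the counts of the matched prefix; stop at the first mismatch.
        node, score = self.root, 0
        for w in self.order(sentence):
            node = node.children.get(w)
            if node is None:
                break
            score += node.count
        return score


def tokenize(text):
    # Naive assumption: sentences end with '.', words are whitespace-separated.
    return [s.lower().split() for s in text.split(".") if s.strip()]


def train(docs_by_class, min_support=1):
    # One FP-tree per class, built from all sentences of that class's documents.
    return {label: FPTree([s for d in docs for s in tokenize(d)], min_support)
            for label, docs in docs_by_class.items()}


def classify(trees, doc):
    # Sum sentence scores per class, normalized by the class's training sentence count.
    sentences = tokenize(doc)
    scores = {label: sum(t.path_score(s) for s in sentences) / max(t.n, 1)
              for label, t in trees.items()}
    return max(scores, key=scores.get)


docs = {
    "sports": ["The team won the match. The coach praised the team.",
               "Fans cheered the team"],
    "finance": ["The bank raised interest rates. Markets fell on the news.",
                "The bank reported profits"],
}
trees = train(docs)
print(classify(trees, "The team played well."))  # sports
print(classify(trees, "The bank cut rates."))    # finance
```

In this toy corpus, "the" and "team" occur together in every sports sentence, so they form a single shared path with count 3, and a test sentence that contains both words earns a higher score from the sports tree than from the finance tree, where "team" never appears.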
|Appears in Collections:
|2. Conference Papers