Document classification with a weighted frequency pattern tree algorithm

Date

2016

Authors

Dsouza, F.H.
Ananthanarayana, V.S.

Abstract

Document classification can be defined as the task of automatically categorizing collections of electronic documents into their annotated classes, based on their contents. It is an important problem in data mining. Because of the exponential growth of documents on the Internet and the emerging need to organize them, developing an efficient document classification method to process web documents automatically is of great importance and has received ever-increasing attention in recent years. However, existing approaches to text classification treat documents primarily as a bag of words: all information about a document is gathered from the presence of individual words, not from the order or context in which those words appear in a sentence. In this paper we investigate the possibility of adopting the FP-tree, a data structure used in itemset mining, to represent training documents in text classification while preserving sentence information. We compare our method with other conventional document classification algorithms on several corpora. The experimental results indicate that the proposed algorithm yields much better performance than the conventional algorithms, especially on corpora with primarily disjoint classification categories. © 2016 IEEE.
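
To make the central idea concrete, the following is a minimal sketch of how sentences could be stored in an FP-tree, assuming each sentence is treated as one transaction and each distinct word as one item. It is not the authors' algorithm: the abstract does not describe the weighting scheme or the classification step, so the sketch uses plain sentence counts, and the names `FPNode`, `build_fp_tree`, `paths`, and `min_support` are hypothetical.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: a word and the number of sentences through it."""

    def __init__(self, word, parent=None):
        self.word = word
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(sentences, min_support=1):
    """Build an FP-tree from tokenized sentences (hypothetical helper).

    Each sentence is one transaction. Plain counts are used here;
    the paper's weighting scheme is not given in the abstract.
    """
    # Pass 1: count the number of sentences in which each word occurs.
    support = Counter()
    for tokens in sentences:
        support.update(set(tokens))

    root = FPNode(None)
    # Pass 2: insert each sentence with its frequent words in a fixed global order.
    for tokens in sentences:
        frequent = [w for w in set(tokens) if support[w] >= min_support]
        # Descending support, ties broken alphabetically so the tree is deterministic.
        frequent.sort(key=lambda w: (-support[w], w))
        node = root
        for word in frequent:
            child = node.children.get(word)
            if child is None:
                child = FPNode(word, parent=node)
                node.children[word] = child
            child.count += 1
            node = child
    return root


def paths(node, prefix=()):
    """Yield (word path, count) for every node below `node`."""
    for word, child in sorted(node.children.items()):
        path = prefix + (word,)
        yield path, child.count
        yield from paths(child, path)


# Toy training sentences for a single class (illustrative data only).
sentences = [
    ["data", "mining", "tree"],
    ["data", "mining"],
    ["data", "text"],
]
tree = build_fp_tree(sentences)
for path, count in paths(tree):
    print(" > ".join(path), count)
```

Words that co-occur within a sentence share a path in the tree, which is the sentence-level information a bag-of-words representation discards. The standard FP-tree orders the items of every transaction by global frequency, so the original word order inside a sentence is not kept in this sketch.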

Citation

Proceedings of 2016 International Conference on Data Mining and Advanced Computing, SAPIENCE 2016, 2016, pp. 29-34
