An improved K-means algorithm using modified cosine distance measure for document clustering using Mahout with Hadoop

dc.contributor.author: Sahu, L.
dc.contributor.author: Mohan, B.R.
dc.date.accessioned: 2020-03-30T09:58:48Z
dc.date.available: 2020-03-30T09:58:48Z
dc.date.issued: 2015
dc.description.abstract: In this paper, we propose a novel K-means algorithm with a modified Cosine Distance Measure for clustering large datasets such as the latest Wikipedia articles and the Reuters dataset. We customize the Cosine Distance Measure used to compute similarity between objects in order to improve cluster quality. Our method calculates the distance between objects with the Cosine Distance Measure and then brings similar objects closer by squaring the distance if it lies between 0 and 0.5, and increases the distance otherwise. This minimizes the intra-cluster distance and maximizes the inter-cluster distance. We measure cluster quality in terms of inter-cluster and intra-cluster distances, good feature weighting such as TF-IDF, cluster size, and the top terms of the clusters. We compare the K-means algorithm under the Cosine and modified Cosine Distance Measures using performance metrics such as inter-cluster and intra-cluster distances, cluster size, and execution time. Our experimental results show a 0.016% reduction in intra-cluster distance, a 0.012% increase in inter-cluster distance, a 1.5% reduction in cluster size, and a 4% reduction in sequence file size, which results in good cluster quality. © 2014 IEEE. [en_US]
dc.identifier.citation: 9th International Conference on Industrial and Information Systems, ICIIS 2014, 2015, Vol., , pp.- [en_US]
dc.identifier.uri: https://idr.nitk.ac.in/jspui/handle/123456789/7307
dc.title: An improved K-means algorithm using modified cosine distance measure for document clustering using Mahout with Hadoop [en_US]
dc.type: Book chapter [en_US]
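
The distance rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Mahout implementation. The abstract says only that distances between 0 and 0.5 are squared and that larger distances are increased; the square root used for the second case, the inclusive 0.5 boundary, the handling of all-zero vectors, and the function names are all assumptions made here for illustration. The sketch also assumes non-negative term weights (as with TF-IDF), so that the cosine distance lies in [0, 1].

```python
import math


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) for two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def modified_cosine_distance(a, b, threshold=0.5):
    """Shrink small cosine distances and enlarge large ones.

    Assumes non-negative vector components, so the distance is in [0, 1].
    """
    d = cosine_distance(a, b)
    # Clamp floating-point rounding noise into [0, 1].
    d = min(max(d, 0.0), 1.0)
    if d <= threshold:
        # From the abstract: square the distance when it lies in [0, 0.5].
        return d * d
    # Assumption: the abstract does not say how the distance is increased;
    # sqrt(d) >= d on [0, 1], so it is used here as a placeholder.
    return math.sqrt(d)


if __name__ == "__main__":
    near = ([1.0, 1.0], [1.0, 0.0])   # plain cosine distance below 0.5
    far = ([1.0, 0.0], [1.0, 3.0])    # plain cosine distance above 0.5
    for a, b in (near, far):
        print(cosine_distance(a, b), modified_cosine_distance(a, b))
```

Under these assumptions, pairs that are already close end up closer and pairs that are far apart end up farther, which matches the stated aim of lowering intra-cluster distance while raising inter-cluster distance.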
