Please use this identifier to cite or link to this item: https://idr.nitk.ac.in/jspui/handle/123456789/7322
Title: An iterative MapReduce framework for sports-based tweet clustering
Authors: Saxena, G.
Santurkar, S.
Issue Date: 2015
Citation: ACM International Conference Proceeding Series, 2015, Vol. 25-27-September-2015, pp. 9-14
Abstract: In recent years, social media has evolved into a vital source for real-time information. Sports is one of the most popular topics on social media and attracts the attention of users all over the world. However, a large amount of data is generated on a daily basis, making it difficult for fans to follow the topics of their interest. Clustering these posts can resolve this issue by retrieving unambiguous and distinct topics. MapReduce is a programming paradigm that is very effective in designing distributed applications that can be deployed on the cloud. Clustering algorithms are generally iterative in nature. The performance gain offered by MapReduce cannot be completely realized by these algorithms due to the inherent architectural bottlenecks associated with iterative tasks. Twister is a MapReduce-based framework designed to minimize these bottlenecks. In this paper, we propose a distributed framework that gathers sports-related tweets and clusters them into distinct topics using the DBSCAN algorithm customized for Twister. The accuracy of the framework was analysed using the precision-recall scoring mechanism to determine the set of DBSCAN and framework parameters that result in the best set of clusters. The performance of our framework is evaluated based on our clustering results and simulations using the MRSim simulator. We expect that this framework could be used as a model for performing topic detection over generic tweets. We have used the domain of sports to establish the proof of this concept. © 2015 ACM.
URI: http://idr.nitk.ac.in/jspui/handle/123456789/7322
Appears in Collections: 2. Conference Papers

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.