Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Now showing 1 - 9 of 9
  • Item
    Evaluation of Machine Learning Frameworks on Bank Marketing and Higgs Datasets
    (Institute of Electrical and Electronics Engineers Inc., 2015) Bhuvan, B.M.; Jain, S.; Rao, V.D.; Patil, N.; Raghavendra, G.S.
    Big data is an emerging field in which datasets of various sizes are analyzed for potential applications. In parallel, many frameworks are being introduced into which these datasets can be fed for machine learning. Though some experiments have compared different machine learning algorithms on different data, these comparisons have not been carried out across different platforms. Our research compares two selected machine learning algorithms on datasets of different sizes deployed on platforms such as Weka, Scikit-Learn and Apache Spark. They are evaluated on training time, accuracy and root mean squared error. This comparison helps decide which platform is best suited for applying computationally expensive machine learning algorithms to data of a particular size. Experiments suggested that Scikit-Learn would be optimal on data that fits into memory, while for huge data Apache Spark would be optimal, as it performs parallel computations by distributing the data over a cluster. Hence this study concludes that the Spark platform, which has growing support for parallel implementations of machine learning algorithms, could be optimal for analyzing big data. © 2015 IEEE.
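The single-platform side of the comparison this abstract describes can be sketched as follows. This is a minimal illustration, not the paper's experiment: a synthetic dataset stands in for the Bank Marketing and HIGGS data, and a gradient-boosting classifier stands in for the unnamed "selected" algorithms, but the three reported metrics (training time, accuracy, RMSE) are measured as described.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Bank Marketing / HIGGS datasets.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(random_state=42)

# Metric 1: training time.
start = time.perf_counter()
model.fit(X_tr, y_tr)
training_time = time.perf_counter() - start

# Metrics 2 and 3: accuracy and root mean squared error on held-out data.
pred = model.predict(X_te)
accuracy = accuracy_score(y_te, pred)
rmse = np.sqrt(mean_squared_error(y_te, pred))
```

Running the same loop against Spark MLlib or Weka on the same data, as the paper does, is what makes the per-platform comparison meaningful.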
  • Item
    GoDB: From Batch Processing to Distributed Querying over Property Graphs
    (Institute of Electrical and Electronics Engineers Inc., 2016) Jamadagni, N.; Simmhan, Y.
    Property Graphs with rich attributes over vertices and edges are becoming common. Querying and mining such linked Big Data is important for knowledge discovery. Distributed graph platforms like Pregel focus on batch execution on commodity clusters. But exploratory analytics requires platforms that are both responsive and scalable. We propose Graph-oriented Database (GoDB), a distributed graph database that supports declarative queries over large property graphs. GoDB builds upon our GoFFish subgraph-centric batch processing platform, leveraging its scalability while using execution heuristics to offer responsiveness. The GoDB declarative query model supports vertex, edge, path and reachability queries, and this is translated to a distributed execution plan on GoFFish. We also propose a novel cost model to choose a query plan that minimizes the execution latency. We evaluate GoDB deployed on the Azure IaaS Cloud, over real-world property graphs and for a diverse workload of 500 queries. These show that the cost model selects the optimal execution plan at least 80% of the time, and helps GoDB weakly scale with the graph size. A comparative study with Titan, a leading open-source graph database, shows that we complete all queries, each in ≤ 1.6 seconds, while Titan cannot complete up to 42% of some query workloads. © 2016 IEEE.
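Two of the query classes the abstract names, vertex queries and reachability queries, can be illustrated on a toy in-memory property graph. This is only a sketch of the query semantics on invented data; GoDB's distributed, subgraph-centric execution on GoFFish is not modeled here.

```python
from collections import deque

# Toy property graph: vertices carry attribute dicts, directed edges carry
# attribute dicts as well (the defining feature of a property graph).
vertices = {
    1: {"label": "person", "name": "alice"},
    2: {"label": "person", "name": "bob"},
    3: {"label": "city", "name": "bangalore"},
}
edges = {
    1: [(2, {"type": "knows"})],
    2: [(3, {"type": "lives_in"})],
    3: [],
}

def vertex_query(prop, value):
    """Vertex query: ids of all vertices whose property matches the value."""
    return [v for v, attrs in vertices.items() if attrs.get(prop) == value]

def reachable(src, dst):
    """Reachability query: BFS over directed edges from src toward dst."""
    seen, frontier = {src}, deque([src])
    while frontier:
        v = frontier.popleft()
        if v == dst:
            return True
        for w, _attrs in edges[v]:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return False
```

In GoDB such queries are declarative and planned against a cost model; the BFS here just shows what a reachability query must answer.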
  • Item
    Real time big data analytics in smart city applications
    (Institute of Electrical and Electronics Engineers Inc., 2018) Manjunatha, S.; Annappa, B.
    The technological revolution of the recent past has enabled the concept of the Smart City for urban development. The Smart City concept is conceived with the objectives of providing better services to citizens and improving the quality of life. Information and Communication Technology (ICT) and the Internet of Things (IoT) have made smart city applications much simpler and more effective. Big data technologies play a major role in smart city applications. This paper gives an overview of the role of big data in building smart city applications and proposes a framework for real-time big data analytics. Real-time big data analytics helps in making better decisions and more accurate predictions at the right time to offer better services to citizens. Here, we discuss some of the important solutions and services for the smart city where real-time big data analytics helps in improving the quality of services in smart city applications. © 2018 IEEE.
  • Item
    Optimal Band Selection Using Generalized Covering-Based Rough Sets on Hyperspectral Remote Sensing Big Data
    (Springer Verlag, 2019) Kelam, H.; Venkatesan, M.
    Hyperspectral remote sensing has been gaining attention over the past few decades. Due to the diverse and high-dimensional nature of remote sensing data, it is called remote sensing Big Data. Hyperspectral images have high dimensionality owing to the number of spectral bands and to pixels having a continuous spectrum. These images provide more detail than other images, but they still suffer from the 'curse of dimensionality'. Band selection is the conventional method to reduce the dimensionality and remove redundant bands, and many methods have been developed over the years to find the optimal set of bands. Generalized covering-based rough sets extend rough sets by replacing their indiscernibility relations with coverings; this method has recently been used for attribute reduction in pattern recognition and data mining. In this paper, we discuss the implementation of covering-based rough sets for optimal band selection of hyperspectral images and compare the results with existing methods such as PCA, SVD and rough sets. © 2019, Springer Nature Singapore Pte Ltd.
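Of the baselines the paper compares against, the PCA route to band selection is easy to sketch: rank bands by the magnitude of their loading on the first principal component and keep the top k. This is only the PCA baseline on synthetic data, not the covering-based rough set method the paper proposes.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a hyperspectral cube, flattened to (pixels, bands).
rng = np.random.default_rng(0)
pixels = rng.normal(size=(1000, 50))  # 1000 pixels, 50 spectral bands

# PCA baseline: each band's importance is taken as the absolute weight it
# contributes to the first principal component; keep the k highest.
pca = PCA(n_components=1).fit(pixels)
loadings = np.abs(pca.components_[0])

k = 10
selected_bands = np.argsort(loadings)[::-1][:k]
```

A rough-set reduct would instead drop bands whose removal leaves the classification indiscernibility unchanged; the PCA ranking above has no such guarantee.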
  • Item
    Extracting Emotion and Sentiment Quotient of Viral Information Over Twitter
    (Springer Science and Business Media Deutschland GmbH, 2022) Kumar, P.; Reji, R.E.; Singh, V.
    On social media platforms, viral or trending information is consumed for many decision-making tasks, as it harnesses the information flux. Millions of real-time users consume the data co-located with this virality; the sentiment and co-located emotions it carries can therefore be utilized for analysis and decision support. Traditionally, sentiment tools offer limited insights and cannot extract emotional impact, so in these settings estimating an emotion quotient becomes a multifaceted task. The proposed novel algorithm aims to (i) extract the sentiment and co-located emotion quotient of viral information and (ii) provide utilities for comprehensive comparison of co-occurring viral information and sentiment analysis over Twitter data. The emotion and micro-sentiment reveal several valuable insights about a viral topic and assist in decision support. A use-case analysis over real-time extracted data yields significant insights, as the generated sentiments and emotional effects reveal correlations caused by viral/trending information. The algorithm also delivers an efficient, robust, and adaptable solution for sentiment analysis. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
  • Item
    Effective Resource Utilization in Hadoop Using Ganglia
    (Institute of Electrical and Electronics Engineers Inc., 2024) Srungarapati, B.; Pamarthi, M.; Vakada, V.; Hegde, A.; Bhowmik, B.
    The exponential growth of big data has led to the widespread adoption of Hadoop clusters for storing and processing large volumes of data. Efficient management of resources within these clusters is crucial for achieving optimal performance and cost efficiency. This research paper explores the use of Hadoop and Ganglia for monitoring and optimizing resource utilization in Hadoop clusters. The study demonstrates that leveraging Hadoop and Ganglia is an effective strategy for improving cluster performance and resource efficiency. Results show significant enhancements in cluster performance and resource utilization, highlighting the importance of proactive resource management in Hadoop environments. © 2024 IEEE.
  • Item
    Leveraging Hybrid Modeling for Enhanced Runtime Prediction in Big Data Jobs
    (Institute of Electrical and Electronics Engineers Inc., 2024) Singh, R.; Zadokar, V.N.; Kumar, S.; Doddamani, S.S.; Bhowmik, B.
    In an era of rapid data expansion, big data has significantly transformed various industries, redefining the processes of data processing, analysis, and utilization. The widespread adoption of digital technologies has driven this surge in big data, leading to an unprecedented accumulation of information from sources such as social media, sensors, and transactions. As big data evolves, it presents significant challenges and unique opportunities, necessitating innovative solutions to leverage its potential fully. One critical challenge in big data environments is accurately predicting job runtimes, which is essential for optimizing resource utilization and enhancing overall system performance. Current approaches, including analytical models and machine learning algorithms, often struggle to manage the complexities of unstructured data while maintaining interpretability. This paper proposes a novel hybrid modeling approach that integrates the strengths of both techniques to improve job runtime predictions. The hybrid architecture combines an analytical model, which captures the intricate characteristics of jobs and execution environments, with a machine learning model trained to detect patterns and relationships in historical data. As demonstrated on real-world big datasets, the hybrid model achieves greater accuracy by merging these capabilities. Utilizing the flexible capabilities of PySpark and incorporating advanced feature engineering techniques, the model dynamically adapts to various dataset sizes and complexities, ensuring robust performance across different scenarios. © 2024 IEEE.
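One common way to combine an analytical cost model with a learned component, consistent with the architecture the abstract describes, is to let the machine learning model predict the residual the analytical model cannot capture. The sketch below uses an invented linear cost model and synthetic job logs; the paper's actual models, features, and PySpark pipeline are not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Synthetic job log: input size (GB), executor count -> observed runtime (s).
size_gb = rng.uniform(1, 100, 500)
executors = rng.integers(2, 32, 500).astype(float)
runtime = 5.0 + 2.0 * size_gb / executors + rng.normal(0.0, 1.5, 500)

def analytical_estimate(size, execs):
    # Hypothetical cost model: fixed startup cost plus work split across executors.
    return 5.0 + 2.0 * size / execs

# The ML component learns the residual between observed runtimes and the
# analytical estimate, i.e. whatever the closed-form model misses.
X = np.column_stack([size_gb, executors])
residuals = runtime - analytical_estimate(size_gb, executors)
ml = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, residuals)

def hybrid_predict(size, execs):
    """Hybrid prediction = analytical base estimate + learned correction."""
    base = analytical_estimate(size, execs)
    correction = ml.predict(np.array([[size, execs]]))[0]
    return base + correction

pred = hybrid_predict(50.0, 8.0)
```

The residual formulation keeps the analytical model interpretable while the learned term absorbs unmodeled effects, which is the trade-off the abstract highlights.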
  • Item
    Optimizing Feature Selection in Big Data: A Hybrid Spark and Fuzzy Approach
    (Institute of Electrical and Electronics Engineers Inc., 2024) Hada, A.S.; Sahoo, G.S.; Vamsi, C.K.; Hegde, A.; Bhowmik, B.
    The exponential growth of big data presents both immense opportunities and significant challenges. While vast datasets hold the key to unlocking groundbreaking insights, efficiently extracting value requires sophisticated feature selection techniques. Traditional methods often struggle with the sheer volume and complexity of big data. This paper addresses this challenge by proposing a novel hybrid feature selection algorithm that leverages Apache PySpark's distributed computing power. Combining a robust feature selection technique with a novel weighting scheme, our method outperforms existing hypercuboid and fuzzy rough set methods. The hybrid approach achieves superior accuracy of 72.1% with a reduced feature set, demonstrating its effectiveness in identifying salient features for big data analysis. © 2024 IEEE.
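The abstract's "weighting scheme" over a base feature selection technique can be illustrated generically: score each feature under two relevance criteria, normalize, blend with weights, and keep the top k. The criteria and the 50/50 blend below are placeholders, not the paper's (unspecified) scheme, and plain scikit-learn stands in for the PySpark implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=30,
                           n_informative=8, random_state=0)

# Two per-feature relevance scores, normalized to [0, 1].
mi = mutual_info_classif(X, y, random_state=0)
f_stat, _pvals = f_classif(X, y)
blended = 0.5 * mi / mi.max() + 0.5 * f_stat / f_stat.max()

# Reduced feature set: the k features with the highest blended score.
k = 8
selected = np.argsort(blended)[::-1][:k]
```

In a distributed setting the per-feature scores would be computed as Spark aggregations over partitions; the ranking and cut-off step is unchanged.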
  • Item
    Outlier Detection in Streaming Data Using Deep Learning Models
    (Institute of Electrical and Electronics Engineers Inc., 2024) Dudipala, S.; Gangavarapu, S.; Girish, K.K.; Bhowmik, B.
    In the realm of the Internet of Things (IoT), devices continuously generate a vast and relentless stream of data, providing a real-time representation of the digital landscape. The continuous and high-velocity nature of this streaming data poses significant challenges for real-time analysis. Accurate outlier detection within this data is essential, as such anomalies may indicate critical issues, attacks, or errors. Nevertheless, the dynamic and rapidly evolving characteristics of streaming data render traditional outlier detection methods inadequate. This paper investigates the application of Artificial Neural Networks (ANNs), specifically a Multi-Layer Perceptron (MLP), for outlier detection in streaming IoT data. The selection of the MLP from a range of Deep Neural Networks (DNNs) is based on its optimal balance between computational efficiency and model complexity. The model's efficacy is confirmed through rigorous experimentation, demonstrating strong performance across diverse scenarios and data classes. The MLP achieved an accuracy of 99.4%, underscoring its ability to detect even minor deviations from expected patterns. This high level of accuracy establishes the MLP as a robust tool for outlier detection in dynamic IoT environments. © 2024 IEEE.
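Treating outlier detection as supervised classification with an MLP, as the abstract describes, can be sketched as follows. The synthetic "sensor" data with injected outliers and the small network architecture are illustrative assumptions; the paper's IoT dataset and tuned model are not reproduced here.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)

# Synthetic sensor readings: normal points clustered near 0, outliers shifted
# far away (label 1), mimicking anomalies injected into a stream.
normal = rng.normal(0.0, 1.0, size=(950, 4))
outliers = rng.normal(6.0, 1.0, size=(50, 4))
X = np.vstack([normal, outliers])
y = np.array([0] * 950 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# A small MLP: two hidden layers balance capacity against inference cost,
# which matters when scoring a high-velocity stream point by point.
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
mlp.fit(X_tr, y_tr)
accuracy = accuracy_score(y_te, mlp.predict(X_te))
```

On a live stream the trained model would score each arriving window of readings; periodic retraining handles the concept drift the abstract attributes to streaming data.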