Performance prediction of data streams on high-performance architecture

dc.contributor.author: Gautam, B.
dc.contributor.author: Annappa, A.
dc.date.accessioned: 2026-02-05T09:29:24Z
dc.date.issued: 2019
dc.description.abstract: Sensor streams worldwide are growing continuously in both velocity and volume. To keep pace, large-scale stream data processing systems are moving from homogeneous to rack-scale architectures, which raises serious concerns for workload optimization, scheduling, and resource management algorithms. Our proposed framework provides an architecture-independent performance prediction model to enable a resource-adaptive distributed stream data processing platform. It comprises seven pre-defined domains of dynamic data stream metrics, together with a self-driven model that tries to fit these metrics using a ridge-regularized regression algorithm. Another significant contribution is a fully automated performance prediction model for distributed stream processing systems, derived from a state-of-the-art distributed data management system, which uses Gaussian process regression and clusters metrics with the help of a dimensionality reduction algorithm. We implemented the framework on Apache Heron and evaluated it with a proposed benchmark suite comprising five domain-specific topologies. To assess the proposed methodologies, we deliberately injected tuple skew into the benchmark topologies to establish ground truth for predictions, and found that the accuracy of predicting data stream performance increased from 66.36% to as much as 80.62%, while the error fell from 37.14% to 16.06%. © 2019, The Author(s).
dc.identifier.citation: Human-centric Computing and Information Sciences, 2019, 9, 1, pp. -
dc.identifier.uri: https://doi.org/10.1186/s13673-018-0163-4
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/24279
dc.publisher: Springer Berlin Heidelberg
dc.subject: Benchmarking
dc.subject: Clustering algorithms
dc.subject: Computer architecture
dc.subject: Dimensionality reduction
dc.subject: Distributed parameter control systems
dc.subject: Forecasting
dc.subject: Information management
dc.subject: Reduction
dc.subject: Regression analysis
dc.subject: Scheduling
dc.subject: Topology
dc.subject: Apache Heron
dc.subject: Benchmark suites
dc.subject: Clustering
dc.subject: High performance computing
dc.subject: Performance behavior
dc.subject: Performance prediction
dc.subject: Regression
dc.subject: Data streams
dc.title: Performance prediction of data streams on high-performance architecture
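
Illustrative note: the abstract mentions fitting stream metrics with ridge-regularized regression. The sketch below shows only that general technique and is not the authors' implementation. The two metric columns (an emit rate and a skew ratio), the latency targets, and every function name are hypothetical and chosen purely for illustration. The intercept is left unpenalized by centering the data first, which is one common convention.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(
    rows: Sequence[Sequence[float]], targets: Sequence[float], alpha: float
) -> Tuple[List[float], float]:
    """Fit weights w and intercept b minimizing squared error + alpha * ||w||^2."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    y_mean = sum(targets) / n
    # Center features and targets so the intercept is not penalized.
    xc = [[r[j] - means[j] for j in range(d)] for r in rows]
    yc = [y - y_mean for y in targets]
    # Regularized normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
    gram = [
        [sum(x[i] * x[j] for x in xc) + (alpha if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    rhs = [sum(x[i] * y for x, y in zip(xc, yc)) for i in range(d)]
    w = solve_linear(gram, rhs)
    b = y_mean - sum(wj * mj for wj, mj in zip(w, means))
    return w, b


def predict(w: Sequence[float], b: float, row: Sequence[float]) -> float:
    """Predicted target for one row of metrics."""
    return b + sum(wj * xj for wj, xj in zip(w, row))


# Hypothetical data: columns are (emit rate, skew ratio); targets are latencies
# generated as 2 * emit_rate + 3 * skew_ratio + 1, with no noise.
ROWS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [3.0, 0.0]]
TARGETS = [3.0, 4.0, 6.0, 8.0, 9.0, 7.0]

weights, intercept = fit_ridge(ROWS, TARGETS, alpha=1.0)
print(weights, intercept, predict(weights, intercept, [2.0, 2.0]))
```

With `alpha=0.0` the fit reduces to ordinary least squares; larger values of `alpha` shrink the weights toward zero, trading some bias for lower variance when metrics are noisy or correlated.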