Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
9 results
Search Results
Item
Parallel OpenMP and CUDA Implementations of the N-Body Problem
(Springer Verlag, 2019) Gangavarapu, T.; Pal, H.; Prakash, P.; Hegde, S.; Geetha, V.
The N-body problem, in the field of astrophysics, predicts the movements of the planets and their gravitational interactions. This paper aims at developing efficient and high-performance implementations of two versions of the N-body problem. Adaptive tree structures are widely used in N-body simulations. Building and storing the tree and the need for work-load balancing pose significant challenges in high-performance implementations. Our implementations use various cores in CPU and GPU via efficient work-load balancing with data and task parallelization. The contributions include OpenMP and Nvidia CUDA implementations to parallelize force computation and mass distribution, and achieve competitive performance in terms of speedup and running time, which is empirically justified and graphed. This research not only serves as an alternative to complex simulations but also aids other big data applications requiring work-load distribution and computationally expensive procedures. © 2019, Springer Nature Switzerland AG.

Item
VaFLE: Value flag length encoding for images in a multithreaded environment
(Springer, 2019) Kinnal, B.; Pasupulety, U.; Geetha, V.
The Run Length Encoding (RLE) algorithm substitutes long runs of identical symbols with the value of that symbol followed by the binary representation of the frequency of occurrences of that value. This lossless technique is effective for encoding images where many consecutive pixels have similar intensity values. One of the major problems of RLE for encoding runs of bits is that the encoded runs have their lengths represented as a fixed number of bits in order to simplify decoding.
The number of bits assigned is equal to the number required to encode the maximum length run, which results in the addition of padding bits on runs whose lengths do not require as many bits for representation as the maximum length run. Due to this, the encoded output sometimes exceeds the size of the original input, especially for input data in which the runs can have a wide range of sizes. In this paper, we propose VaFLE, a general-purpose lossless data compression algorithm, where the number of bits allocated for representing the length of a given run is a function of the length of the run itself. The total size of an encoded run is independent of the maximum run length of the input data. In order to exploit the inherent data parallelism of RLE, VaFLE was also implemented in a multithreaded OpenMP environment. Our algorithm guarantees compression rates up to 3X better than standard RLE. The parallelized algorithm attains a speedup as high as 5X in grayscale and 4X in color images compared to the RLE approach. © Springer Nature Singapore Pte Ltd 2019.

Item
Optimizing Performance of OpenMP Parallel Applications through Variable Classification
(Institute of Electrical and Electronics Engineers Inc., 2024) Kumar, S.; Talib, M.
OpenMP provides a versatile framework for parallel computing, allowing developers to transform sequential programs into parallel applications for shared-memory architectures efficiently. One of the central challenges in this transformation lies in accurately identifying appropriate parallel constructs and clauses, which are critical for maximizing performance and ensuring the correctness of the resulting parallel code. A particularly intricate aspect of this process is the classification of variables according to their data-sharing semantics, including first-private, private, last-private, shared, and reduction clauses.
Manual classification is labor-intensive and increasingly susceptible to errors as the program's scale and complexity grow. Although various tools have been developed to assist with variable classification, they often rely on extensive data-dependence analyses and rigid classification schemes, limiting their effectiveness when applied to large-scale programs with complex scoping requirements. This paper presents a novel, cost-effective approach to automate and enhance the accuracy of variable classification in OpenMP parallelization. By reducing the manual effort required and improving the precision of parallel construct insertion, this approach aims to significantly optimize the performance of parallel applications, thereby advancing the utility and accessibility of OpenMP for a wide range of computational tasks. © 2024 IEEE.

Item
Optimizing Split Algorithm Performance: A Heuristic Method for Enhanced Tensor Product Matrix Computations
(Institute of Electrical and Electronics Engineers Inc., 2024) Bhowmik, B.; Kumar, S.; Raju, S.R.; Prakash, A.; Mense, O.
Optimizing tensor product matrix computations is critical for enhancing computational efficiency in high-performance applications. Traditional algorithms, like the Split algorithm, often struggle due to the unique properties of each matrix involved. This paper presents a novel heuristic method that optimizes the selection of cutting points and matrix arrangement, significantly reducing redundant calculations and minimizing memory usage. The proposed approach adapts to the varying characteristics of tensor products, improving performance across different computational scenarios. Enhancing floating-point operation efficiency and CPU utilization delivers substantial speed and efficiency gains, particularly in large-scale tensor product matrix operations, offering a robust solution for complex computational tasks.
© 2024 IEEE.

Item
An Integrated MPI and OpenMP Approach for Plasma Dynamics Simulations
(Institute of Electrical and Electronics Engineers Inc., 2024) Prakash, Y.M.; Girish, K.K.; Verma, L.; Kumar, S.; Bhowmik, B.
Plasma dynamics is the behavior exhibited by two or more charged species with respect to electric or magnetic fields. In high-performance computing (HPC) applications, simulating it requires accurate parallel implementations, effective inter-process communication, and scalability with respect to workload. This paper points out the limitations of current approaches to plasma dynamics problems and discusses the use of MPI continuation tasks and their performance enhancement with OpenMP methods. Within the framework of the Vlasov-Poisson system, we develop the theory of MPI continuation and describe techniques for its optimal use, which allow communication to be combined efficiently with computation, a difficult task in most cases and especially in multidimensional simulations. The results offer better insight into how to increase the level of parallelism and reduce compute time, which in turn fosters the formulation of more effective high-performance strategies and a better understanding of parallelism in plasma simulations using the MPI standard. © 2024 IEEE.

Item
Optimizing Data Movement in Heterogeneous Computing: A LASSA-based Approach for Efficient Nucleation List Precomputation
(Institute of Electrical and Electronics Engineers Inc., 2025) Bhowmik, B.; Girish, K.K.; Pandey, H.; Prabhanjans, P.
In the rapidly evolving landscape of heterogeneous computing, the efficiency of data movement between CPUs and GPUs can make or break system performance. Despite advancements in parallel processing, existing methods for managing data transfers, particularly in GPU offloading scenarios, suffer from significant inefficiencies.
These inefficiencies are particularly evident in nucleation list precomputation for non-equilibrium solidification models, where redundant data movements and complex dynamic work-sharing in OpenMP lead to significant performance overhead. To tackle this issue, this paper proposes a novel solution that integrates the Location-Aware Heap Static Single Assignment (LASSA) algorithm into the compilation process. This approach identifies and eliminates redundant memory copy operations, optimizing data transfers and reducing overhead. The findings reveal a dramatic performance boost, with up to a 9.6-fold increase in efficiency. By addressing the specific challenges of nucleation list precomputation, this work provides valuable insights into optimizing data movement in heterogeneous computing environments, paving the way for enhanced performance in parallel programming models. © 2025 IEEE.

Item
Taskgraph Framework: A Competitive Alternative to the OpenMP Thread Model
(Institute of Electrical and Electronics Engineers Inc., 2025) Chavan, S.; Nile, P.; Kumar, S.; Bhowmik, B.
OpenMP is the predominant standard for shared memory systems in high-performance computing (HPC), offering a tasking paradigm for parallelism. However, existing OpenMP implementations, like GCC and LLVM, face computational limitations that hinder performance, especially for large-scale tasks. This paper presents the Taskgraph framework, a novel solution that overcomes the limitations of traditional task dependency graphs (TDGs). Unlike conventional TDGs, which require costly reconstruction for dynamic program structures, the Taskgraph framework uses a taskgraph clause with a list of variables, enabling real-time adaptation without complete reconstruction.
This approach significantly reduces overhead, making the Taskgraph model highly efficient for tasks with minimal dependencies, offering a competitive alternative to the OpenMP thread model, and enhancing efficiency and adaptability in dynamic HPC environments. © 2025 IEEE.

Item
Efficient Parallel Algorithm for Detecting Longest Flow Paths in Flow Direction Grids
(Institute of Electrical and Electronics Engineers Inc., 2025) Jayarukshi, K.; Agarwal, S.; Girish, K.K.; Goudar, S.; Bhowmik, B.
High-performance computing (HPC) has transformed the capacity to address complex computational tasks across various scientific fields by enabling the efficient processing of large datasets and intricate simulations. In hydrological modeling, a critical task is identifying the longest flow channel within a catchment, which is essential for understanding water flow patterns and managing resources. However, existing geographic information system (GIS) algorithms for flow path identification often suffer from inefficiencies and inaccuracies. To address these challenges, this paper introduces innovative parallel methods utilizing Open Multi-Processing (OpenMP), a widely used API that supports multi-platform shared-memory parallel programming. This approach optimizes the analysis of flow direction data, resulting in faster and more accurate identification of flow channels. The results demonstrate that the proposed method outperforms current approaches, offering substantial improvements in both performance and precision. These advancements have the potential to significantly enhance hydrological modeling practices and water resource management.
© 2025 IEEE.

Item
Exploring Hidden Behaviors in OpenMP Multi-threaded Applications for Anomaly Detection in HPC Environments
(Springer Science and Business Media Deutschland GmbH, 2025) Bhowmik, B.; Girish, K.K.; Mishra, P.; Mishra, R.
In high-performance computing (HPC), multi-threaded applications using OpenMP face complex challenges in identifying hidden performance issues, often due to resource conflicts, software inefficiencies, and hardware anomalies. These subtle issues can significantly degrade performance and reduce system reliability. This paper introduces an innovative approach designed to address these concealed issues in OpenMP multi-threaded applications. The proposed method integrates a Random Forest classifier with anthropomorphic diagnosis to effectively identify and diagnose performance-affecting problems. The approach has demonstrated a remarkable ability to detect 90% of performance-affecting issues that are often obscured within complex HPC environments. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
