Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Search Results

Now showing 1 - 8 of 8
  • Item
    GPU accelerated inexact matching for multiple patterns in DNA sequences
    (Institute of Electrical and Electronics Engineers Inc., 2014) Rastogi, P.; Guddeti, G.R.M.
    DNA sequencing technology generates millions of patterns on every run of the machine, which poses the challenge of matching these patterns to the reference genome effectively and with high execution speed. The main idea here is inexact matching of patterns with mismatches and gaps (insertions and deletions). In inexact matching, a pattern is matched against the DNA sequence with some allowed number of errors; here we allow 2 errors, where an error can be a mismatch or a gap. An existing algorithm, SOAP3, performs inexact matching on the GPU with mismatches only and does not consider gaps (insertions and deletions). The General Purpose Graphics Processing Unit (GPGPU) is an effective solution in terms of cost and speed, providing a high degree of parallelism. This paper presents a parallel CUDA implementation, based on the BWT, of inexact matching of multiple patterns against a reference genome. The algorithm incorporates a DFS (Depth First Search) strategy; to match multiple patterns, each GPGPU thread is given a different pattern, so millions of patterns can be matched using a single CUDA kernel. Since GPU memory is limited, memory management must be handled carefully, and multiple threads are synchronized to prevent illegal accesses to shared memory. GPU results are compared with CPU execution; experimental results show that the proposed methodology achieves an average speedup factor of seven over CPU execution. © 2014 IEEE.
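The two-error search described above can be sketched sequentially in Python. The sketch below is illustrative only (the paper's implementation uses BWT indexing and CUDA, not this brute-force scan, and all names are invented here): each genome offset is tested with a bounded edit distance that allows mismatches, insertions, and deletions.

```python
def edit_distance_at_most(p, t, k):
    """True if pattern p matches text t within k errors
    (mismatches, insertions, deletions) -- bounded edit distance."""
    if k < 0:
        return False
    if not p:
        return len(t) <= k
    if not t:
        return len(p) <= k
    cost = 0 if p[0] == t[0] else 1
    return (edit_distance_at_most(p[1:], t[1:], k - cost)  # match / mismatch
            or edit_distance_at_most(p[1:], t, k - 1)      # gap: pattern char unmatched
            or edit_distance_at_most(p, t[1:], k - 1))     # gap: text char unmatched

def find_inexact(pattern, genome, k=2):
    """Start offsets where pattern matches within k errors; the aligned
    window may shrink or grow by up to k characters because of gaps."""
    hits = set()
    m = len(pattern)
    for i in range(len(genome)):
        for w in range(max(0, m - k), m + k + 1):
            if i + w <= len(genome) and edit_distance_at_most(pattern, genome[i:i + w], k):
                hits.add(i)
                break
    return sorted(hits)
```

The paper's GPU version instead assigns one pattern per thread and prunes candidate alignments with a BWT-based DFS rather than scanning every offset.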
  • Item
    Comparative Analysis of TensorFlow and PyTorch for Image Classification Using CNN with Parallelism Strategies
    (Institute of Electrical and Electronics Engineers Inc., 2024) Nandan, A.D.M.
    With an emphasis on convolutional neural networks (CNNs), this research conducts a thorough analysis of the effectiveness and suitability of the TensorFlow and PyTorch frameworks for image classification tasks. The first objective compares runtime efficiency and computing capacity in detail, emphasising the effects of data parallelism techniques. The focus of the second objective is to assess the accuracy and robustness of the models, specifically using the CIFAR-10 and CIFAR-100 datasets. The study intends to give decision-makers in the deep learning and image classification fields insightful information on the real-world consequences of choosing TensorFlow or PyTorch. The comparative analysis helps researchers and practitioners make well-informed decisions by fostering a deeper awareness of the advantages and disadvantages of the two frameworks. © 2024 IEEE.
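The data parallelism the first objective examines can be illustrated framework-free. This toy Python sketch (all names invented; not from the paper) shards a batch across hypothetical workers, computes per-shard gradients for a one-parameter linear model, and averages them, which plays the role of the all-reduce step in TensorFlow's or PyTorch's distributed strategies.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for y ~ w*x on one data shard."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, xs, ys, n_workers=2, lr=0.01):
    """One synchronous data-parallel SGD step: shard the batch, compute
    per-worker gradients, then all-reduce (average) before updating."""
    shard = len(xs) // n_workers
    grads = [grad_mse(w, xs[i * shard:(i + 1) * shard],
                      ys[i * shard:(i + 1) * shard])
             for i in range(n_workers)]
    g = sum(grads) / n_workers          # all-reduce: average the gradients
    return w - lr * g
```

With equal shard sizes the averaged gradient equals the full-batch gradient, so the parallel step matches the sequential one exactly.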
  • Item
    Dynamic Checkpointing: Fault Tolerance in High-Performance Computing
    (Institute of Electrical and Electronics Engineers Inc., 2024) Bhowmik, B.; Verma, T.; Dineshbhai, N.D.; Reddy, M.R.V.; Girish, K.K.
    Parallel computing has become a cornerstone of modern computational systems, enabling the rapid processing of complex tasks by utilizing multiple processors simultaneously. However, the efficiency and reliability of these systems can be significantly compromised by inherent challenges such as hardware failures, communication delays, and uneven workload distribution. These issues not only slow down computations but also threaten the dependability of applications reliant on parallel processing. To address these challenges, researchers have developed strategies like dynamic checkpointing and load balancing, which are crucial for enhancing fault tolerance and optimizing performance. Dynamic checkpointing periodically saves the computational state, allowing for recovery from failures without significant data loss, while load balancing ensures that tasks are evenly distributed across processors, preventing bottlenecks and underutilization of resources. By integrating these mechanisms, this paper proposes a robust framework that improves the reliability and efficiency of parallel systems, particularly in high-performance computing environments where the ability to handle large-scale data processing with minimal downtime is critical. © 2024 IEEE.
  • Item
    Optimizing Performance of OpenMP Parallel Applications through Variable Classification
    (Institute of Electrical and Electronics Engineers Inc., 2024) Kumar, S.; Talib, M.
    OpenMP provides a versatile framework for parallel computing, allowing developers to transform sequential programs into parallel applications for shared-memory architectures efficiently. One of the central challenges in this transformation lies in accurately identifying appropriate parallel constructs and clauses, which are critical for maximizing performance and ensuring the correctness of the resulting parallel code. A particularly intricate aspect of this process is the classification of variables according to their data-sharing semantics, including first-private, private, last-private, shared, and reduction clauses. Manual classification is labor-intensive and significantly susceptible to errors as the program's scale and complexity grow. Although various tools have been developed to assist with variable classification, they often rely on extensive data-dependence analyses and rigid classification schemes, limiting their effectiveness when applied to large-scale programs with complex scoping requirements. This paper presents a novel, cost-effective approach to automate and enhance the accuracy of variable classification in OpenMP parallelization. By reducing the manual effort required and improving the precision of parallel construct insertion, this approach aims to significantly optimize the performance of parallel applications, thereby advancing the utility and accessibility of OpenMP for a wide range of computational tasks. © 2024 IEEE.
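The data-sharing semantics listed above can be illustrated with a toy heuristic classifier. This sketch assumes per-variable access facts (read-before-write, written, live-after-loop, reduction operator) are already known; it mirrors only the clause semantics, not the paper's actual analysis, and every name is illustrative.

```python
def classify_variable(read_before_write, written_in_loop,
                      live_after_loop, reduction_op=None):
    """Heuristic OpenMP data-sharing classification for one scalar."""
    if reduction_op:
        return f"reduction({reduction_op})"          # e.g. sum accumulators
    if not written_in_loop:
        return "shared"                 # read-only: safe to share
    if read_before_write and live_after_loop:
        return "firstprivate + lastprivate"  # needs incoming and outgoing copies
    if read_before_write:
        return "firstprivate"           # needs the value from before the loop
    if live_after_loop:
        return "lastprivate"            # the last iteration's value escapes
    return "private"                    # purely iteration-local scratch
```

A loop temporary that is written before being read and dead after the loop comes out `private`, while a read-only lookup table comes out `shared`.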
  • Item
    Enhancing MPI Communication Efficiency for Grid-Based Stencil Computations
    (Institute of Electrical and Electronics Engineers Inc., 2024) Goudar, S.I.; Nayaka, P.S.J.; Girish, K.K.; Bhowmik, B.
    In parallel computing, where efficiency and speed are crucial, the Message Passing Interface (MPI) is a fundamental paradigm for managing large-scale distributed memory systems. MPI is critical to complex computational tasks, particularly in grid-based computations that solve intricate numerical problems by discretizing spatial domains into structured grids. However, MPI Cartesian communicators exhibit limitations in handling these computations effectively, especially when managing large-scale data exchanges and complex stencil patterns. This paper addresses these challenges by presenting an integrated approach that combines MPI collective and Cartesian communication methods. The proposed solution simplifies data distribution, eliminates redundant interfaces, and enhances communication efficiency. Experimental results show a 43% reduction in execution time and a 40% decrease in communication overhead, with scalability improvements achieving 12.5x speedup using 64 processes. These quantitative outcomes demonstrate the advantages of the proposed method over conventional MPI Cartesian approaches, establishing it as a reliable framework for advancing High-Performance Computing (HPC) capabilities in grid-based applications. © 2024 IEEE.
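The halo-exchange pattern underlying such stencil codes can be sketched without MPI: each block receives a one-cell ghost layer from its neighbours, then updates locally, and the decomposed sweep must reproduce the global one. A minimal Python simulation of a 1-D decomposition follows (illustrative only; the paper's implementation uses actual MPI Cartesian and collective calls).

```python
def stencil_step(u):
    """One 3-point averaging sweep; domain boundary values stay fixed."""
    return [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3
                     for i in range(1, len(u) - 1)] + [u[-1]]

def decomposed_step(u, n_ranks):
    """The same sweep after a block decomposition: each 'rank' gets a
    one-cell halo from each neighbour, then updates its block locally."""
    size = len(u) // n_ranks
    blocks = [u[r * size:(r + 1) * size] for r in range(n_ranks)]
    result = []
    for r, b in enumerate(blocks):
        left = blocks[r - 1][-1] if r > 0 else None            # halo exchange
        right = blocks[r + 1][0] if r < n_ranks - 1 else None
        ext = ([left] if left is not None else []) + b \
              + ([right] if right is not None else [])
        lo = 1 if left is not None else 0                      # halo offset
        for i, v in enumerate(b):
            g = r * size + i                                   # global index
            if g == 0 or g == len(u) - 1:
                result.append(v)                               # fixed boundary
            else:
                j = i + lo
                result.append((ext[j - 1] + ext[j] + ext[j + 1]) / 3)
    return result
```

Because each interior cell sees exactly the same three operands in either version, the decomposed sweep matches the global sweep bit-for-bit.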
  • Item
    Performance Analysis and Predictive Modeling of MPI Collective Algorithms in Multi-Core Clusters: A Comparative Study
    (Institute of Electrical and Electronics Engineers Inc., 2025) Reddy, M.R.V.S.R.S.; Raju, S.R.; Girish, K.K.; Bhowmik, B.
    Efficient communication is the foundation of parallel computing systems, enabling seamless coordination across multiple processors for optimal performance. At the core of this communication lies the Message Passing Interface (MPI), a crucial framework designed to facilitate data exchange between processors through collective operations. However, these MPI operations often face challenges, including fluctuating process counts, varying message sizes, and increased communication overhead. These issues can significantly impact execution times and scalability, leading to potential bottlenecks in large-scale systems. To address these concerns, this paper provides an in-depth evaluation of key MPI collective algorithms (Flat Tree, Chain, and Binary Tree) by examining their performance under varying configurations. By analyzing execution times and communication overhead, the study reveals the trade-offs inherent in each algorithm, offering insights into strategies for reducing communication costs. Through this analysis, we aim to provide valuable guidance to improve the efficiency and scalability of parallel computing, particularly in high-performance systems where communication efficiency is critical. © 2025 IEEE.
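The latency trade-off among the three algorithms can be illustrated with a toy round-count model. This sketch assumes uniform link cost and, for the binary tree, that a parent can forward to both children in the same round; it is a simplification for intuition, not the paper's measured model.

```python
import math

def bcast_rounds(p, algo):
    """Communication rounds to broadcast one message to p processes
    under an idealized latency-only model."""
    if p <= 1:
        return 0
    if algo == "flat":       # root sends to each of the other p-1 ranks in turn
        return p - 1
    if algo == "chain":      # the message hops down a pipeline, rank to rank
        return p - 1
    if algo == "binary":     # critical path = depth of a complete binary tree
        return math.ceil(math.log2(p + 1)) - 1
    raise ValueError(algo)
```

Under this model the flat tree and the chain need the same number of rounds; the chain is nevertheless preferred for large messages because it pipelines segments, an effect a latency-only count cannot capture, while the binary tree's logarithmic depth dominates at high process counts.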
  • Item
    Taskgraph Framework: A Competitive Alternative to the OpenMP Thread Model
    (Institute of Electrical and Electronics Engineers Inc., 2025) Chavan, S.; Nile, P.; Kumar, S.; Bhowmik, B.
    OpenMP is the predominant standard for shared memory systems in high-performance computing (HPC), offering a tasking paradigm for parallelism. However, existing OpenMP implementations, like GCC and LLVM, face computational limitations that hinder performance, especially for large-scale tasks. This paper presents the Taskgraph framework, a novel solution that overcomes the limitations of traditional task dependency graphs (TDGs). Unlike conventional TDGs, which require costly reconstruction for dynamic program structures, the Taskgraph framework uses a taskgraph clause with a list of variables, enabling real-time adaptation without complete reconstruction. This approach significantly reduces overhead, making the Taskgraph model highly efficient for tasks with minimal dependencies, offering a competitive alternative to the OpenMP thread model, and enhancing efficiency and adaptability in dynamic HPC environments. © 2025 IEEE.
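The dependency-ordered execution a TDG encodes can be sketched with Kahn's algorithm. This minimal Python version is illustrative only and unrelated to the framework's OpenMP implementation: it runs each task once all of its prerequisites have completed.

```python
from collections import deque

def run_taskgraph(tasks, deps):
    """Execute a static task graph in dependency order (Kahn's algorithm).
    tasks: {name: zero-arg callable}; deps: {name: set of prerequisite names}.
    Returns the execution order; raises ValueError on a cycle."""
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    succ = {t: [] for t in tasks}
    for t, pres in deps.items():
        for p in pres:
            succ[p].append(t)
    ready = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()                       # run the task body
        order.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:            # all prerequisites satisfied
                ready.append(s)
    if len(order) != len(tasks):
        raise ValueError("cycle in task graph")
    return order
```

Recording the graph once and replaying it, as sketched here, is what lets a taskgraph approach avoid re-deriving dependencies on every execution.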
  • Item
    Efficient Parallel Algorithm for Detecting Longest Flow Paths in Flow Direction Grids
    (Institute of Electrical and Electronics Engineers Inc., 2025) Jayarukshi, K.; Agarwal, S.; Girish, K.K.; Goudar, S.; Bhowmik, B.
    High-performance computing (HPC) has transformed the capacity to address complex computational tasks across various scientific fields by enabling the efficient processing of large datasets and intricate simulations. In hydrological modeling, a critical task is identifying the longest flow channel within a catchment, which is essential for understanding water flow patterns and managing resources. However, existing geographic information system (GIS) algorithms for flow path identification often suffer from inefficiencies and inaccuracies. To address these challenges, this paper introduces innovative parallel methods utilizing Open Multi-Processing (OpenMP), a widely-used API that supports multi-platform shared-memory parallel programming. This approach optimizes the analysis of flow direction data, resulting in faster and more accurate identification of flow channels. The results demonstrate that the proposed method outperforms current approaches, offering substantial improvements in both performance and precision. These advancements have the potential to significantly enhance hydrological modeling practices and water resource management. © 2025 IEEE.
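The core recurrence for the longest flow path can be sketched sequentially: in a flow-direction grid every cell has at most one downstream neighbour, so the graph is a forest and memoized DFS finds the longest path in time linear in the number of cells. The Python sketch below is illustrative (the paper parallelizes this kind of traversal with OpenMP; the grid encoding here is an assumption).

```python
def longest_flow_path(flow):
    """Longest downstream path length, in cells, of a flow-direction grid.
    flow[r][c] is a (dr, dc) step to the downstream neighbour, or None
    at an outlet cell."""
    rows, cols = len(flow), len(flow[0])
    memo = {}

    def length(r, c):
        if (r, c) in memo:
            return memo[(r, c)]
        d = flow[r][c]
        if d is None:
            memo[(r, c)] = 1                       # the outlet cell itself
        else:
            memo[(r, c)] = 1 + length(r + d[0], c + d[1])
        return memo[(r, c)]

    return max(length(r, c) for r in range(rows) for c in range(cols))
```

Because each cell's downstream length is computed once and cached, independent cells can be processed concurrently, which is what makes the problem amenable to OpenMP parallelization.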