Conference Papers

Dynamic Checkpointing: Fault Tolerance in High-Performance Computing
(Institute of Electrical and Electronics Engineers Inc., 2024) Bhowmik, B.; Verma, T.; Dineshbhai, N.D.; Reddy, M.R.V.; Girish, K.K.
Parallel computing has become a cornerstone of modern computational systems, enabling the rapid processing of complex tasks by utilizing multiple processors simultaneously. However, the efficiency and reliability of these systems can be significantly compromised by inherent challenges such as hardware failures, communication delays, and uneven workload distribution. These issues not only slow down computations but also threaten the dependability of applications reliant on parallel processing. To address these challenges, researchers have developed strategies like dynamic checkpointing and load balancing, which are crucial for enhancing fault tolerance and optimizing performance. Dynamic checkpointing periodically saves the computational state, allowing for recovery from failures without significant data loss, while load balancing ensures that tasks are evenly distributed across processors, preventing bottlenecks and underutilization of resources. By integrating these mechanisms, this paper proposes a robust framework that improves the reliability and efficiency of parallel systems, particularly in high-performance computing environments where the ability to handle large-scale data processing with minimal downtime is critical. © 2024 IEEE.
Enhancing MPI Communication Efficiency for Grid-Based Stencil Computations
(Institute of Electrical and Electronics Engineers Inc., 2024) Goudar, S.I.; Nayaka, P.S.J.; Girish, K.K.; Bhowmik, B.
In parallel computing, where efficiency and speed are crucial, the Message Passing Interface (MPI) is a fundamental paradigm for managing large-scale distributed memory systems. MPI is critical to complex computational tasks, particularly in grid-based computations that solve intricate numerical problems by discretizing spatial domains into structured grids. However, MPI Cartesian communicators exhibit limitations in handling these computations effectively, especially when managing large-scale data exchanges and complex stencil patterns. This paper addresses these challenges by presenting an integrated approach that combines MPI collective and Cartesian communication methods. The proposed solution simplifies data distribution, eliminates redundant interfaces, and enhances communication efficiency. Experimental results show a 43% reduction in execution time and a 40% decrease in communication overhead, with scalability improvements achieving 12.5x speedup using 64 processes. These quantitative outcomes demonstrate the advantages of the proposed method over conventional MPI Cartesian approaches, establishing it as a reliable framework for advancing High-Performance Computing (HPC) capabilities in grid-based applications. © 2024 IEEE.
