Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
7 results
Search Results
Item
Communication and computation optimization of concurrent kernels using kernel coalesce on a GPU (John Wiley and Sons Ltd, 2015)
Bayyapu, B.; Guddeti, R.M.R.; Raghavendra, P.S.
General purpose computation on graphics processing units (GPUs) is rapidly entering various scientific and engineering fields. Many applications are being ported onto GPUs for better performance. Various optimizations, frameworks, and tools are being developed for effective programming of GPUs. As part of communication and computation optimizations for GPUs, this paper proposes and implements an optimization method called kernel coalesce that further enhances GPU performance and also optimizes CPU to GPU communication time. With the kernel coalesce methods proposed in this paper, kernel launch overheads are reduced by coalescing the concurrent kernels, and data transfers are reduced in the case of intermediate data generated and used among kernels. Computation optimization on a device (GPU) is performed by tuning the number of blocks and threads launched to the architecture. The block-level kernel coalesce method resulted in a prominent performance improvement on a device without support for concurrent kernels. The thread-level kernel coalesce method is better than the block-level kernel coalesce method when the grid structure (i.e., the number of blocks and threads) is not optimal for the device architecture, which leads to underutilization of device resources. Both methods perform similarly when the number of threads per block is approximately the same in different kernels and the total number of threads across blocks fills the streaming multiprocessor (SM) capacity of the device. The thread multi-clock-cycle coalesce method can be chosen if the programmer wants to coalesce more than two concurrent kernels that together or individually exceed the thread capacity of the device.
If the kernels have lightweight thread computations, the multi-clock-cycle kernel coalesce method gives better performance than the thread-level and block-level kernel coalesce methods. If the kernels to be coalesced are a combination of compute-intensive and memory-intensive kernels, warp interleaving gives higher device occupancy and improves performance. The multi-clock-cycle kernel coalesce method for micro-benchmark1 considered in this paper resulted in 10-40% and 80-92% improvement compared with separate kernel launch, without and with shared input and intermediate data among the kernels, respectively, on a Fermi architecture device, that is, GTX 470. A nearest neighbor (NN) kernel from the Rodinia benchmark is coalesced with itself using the thread-level kernel coalesce method and warp interleaving, giving 131.9% and 152.3% improvement compared with separate kernel launch and 39.5% and 36.8% improvement compared with the block-level kernel coalesce method, respectively. © 2014 John Wiley & Sons, Ltd.

Item
Parallel iterative hill climbing algorithm to solve TSP on GPU (John Wiley and Sons Ltd, 2019)
Yelmewad, P.; Talawar, B.
Traveling Salesman Problem (TSP) is an NP-hard combinatorial optimization problem. Heuristic algorithms provide satisfactory solutions to large TSP instances in a reasonable amount of time. However, heuristic methods result in suboptimal solutions as they do not cover the search space adequately. Sequential heuristic approaches spend significant CPU time in neighborhood generation for large input instances. Neighborhood generation time can be reduced by generating neighborhoods in parallel. GPUs have been shown to be effective in exploiting data and memory level parallelism in large complex problems. This work presents a GPU-based Parallel Iterative Hill Climbing (PIHC) algorithm using the nearest neighborhood heuristic to arrive at near-optimal solutions of large TSPLIB instances in a reasonable amount of time.
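To make the hill climbing idea in the abstract above concrete, here is a minimal sequential sketch in Python. It is not the PIHC implementation: the abstract does not name the neighborhood move, so the 2-opt move used here is an assumption, and all function names are hypothetical. The nested loop that scores every candidate move is the neighborhood evaluation step that a GPU version could plausibly distribute across threads.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def nearest_neighbor_tour(points, start=0):
    """Construction heuristic: always move to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Ties are broken by city index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def hill_climb_2opt(points, tour):
    """Best-improvement hill climbing over the 2-opt neighborhood (assumed move)."""
    tour = list(tour)
    n = len(tour)
    while True:
        best_delta, best_move = -1e-12, None
        # Score every 2-opt move; each (i, j) pair is independent of the others.
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city, so there is no move
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < best_delta:
                    best_delta, best_move = delta, (i, j)
        if best_move is None:
            return tour  # local optimum: no move shortens the tour
        i, j = best_move
        # Applying a 2-opt move reverses the segment between the removed edges.
        tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
```

An iterative variant would restart this climb from a perturbed tour and keep the best result found; that outer loop is omitted here.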
Multiple construction heuristic approaches, thread mapping strategies, and data structures for TSPLIB instances have been evaluated. We demonstrate improved cost quality on symmetric TSPLIB instances of up to 85,900 cities. The PIHC GPU implementation gives up to 193× speedup over its sequential counterpart and up to 979.96× speedup over a state-of-the-art GPU-based TSP solver. The PIHC implementation gives a cost quality with an error rate of 0.72% in the best case and 8.06% in the worst case. © 2018 John Wiley & Sons, Ltd.

Item
GPU-aware resource management in heterogeneous cloud data centers (Springer, 2021)
Kulkarni, A.K.; Annappa, B.
The power of rapid scalability and easy maintainability of cloud services is driving many high-performance computing applications from company server racks into cloud data centers. With their evolution, Graphics Processing Units (GPUs), which comprise an extensive array of parallel single-instruction-multiple-data processors, are being considered as a platform for high-performance computing because of their high throughput. Many cloud providers have begun offering GPU-enabled services for users where GPUs are essential (for high computational power) to meet the desired quality of service. Virtual machine (VM) placement and GPU load balancing in virtualized environments such as the cloud are still an evolving area of research, and they are of prime importance for achieving higher resource efficiency and saving energy. The current VM placement techniques do not consider the impact of VM workload type and GPU memory status on the VM placement decisions. This paper discusses the current issues with the First Fit policy of virtual machine placement used in VMware Horizon and proposes a GPU-aware VM placement technique for GPU-enabled virtualized environments like cloud data centers.
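For readers unfamiliar with the First Fit policy mentioned above, the following is a generic sketch in Python. It is neither VMware Horizon's implementation nor the GPU-aware technique the paper proposes; the `Host` record and its `free_vgpu_slots` field are hypothetical names chosen for illustration. The point it shows is that First Fit looks only at host order and spare capacity, so workload type and GPU memory status play no part in the decision.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical physical host with a fixed number of free vGPU slots."""
    name: str
    free_vgpu_slots: int
    vms: list = field(default_factory=list)


def first_fit(hosts, vm_name):
    """Place the VM on the first host, in list order, that has a free slot.

    Returns the chosen host name, or None if no host has capacity.
    """
    for host in hosts:
        if host.free_vgpu_slots > 0:
            host.free_vgpu_slots -= 1
            host.vms.append(vm_name)
            return host.name
    return None
```

With two hosts of one and two free slots, three VMs land on the first host once and the second host twice, and a fourth request returns `None`, regardless of what each VM actually runs.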
Experiments conducted using synthetic workloads indicate reductions in energy consumption, in the search space of physical hosts, and in the makespan of the system. The paper also presents a summary of the current challenges for GPU resource management in virtualized environments and specific issues in developing cloud applications targeting GPUs under the virtualization layer. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item
GPGPU-based randomized visual secret sharing (GRVSS) for grayscale and colour images (Taylor and Francis Ltd., 2022)
Holla, R.; Mhala, N.C.; Pais, A.R.
Visual Secret Sharing (VSS) is a technique used for sharing secret images between users. The existing VSS schemes reconstruct the original secret image as a halftone image with only a 50% contrast. The Randomized Visual Secret Sharing (RVSS) scheme overcomes the disadvantages of existing VSS schemes. Although RVSS extracts the secret image with better contrast, it is computationally expensive. This paper proposes a General Purpose Graphics Processing Unit (GPGPU)-based Randomized Visual Secret Sharing (GRVSS) technique that leverages data parallelism in the RVSS pipeline. The performance of the GRVSS is compared with the RVSS on a generic architecture and on the PARAM Shavak supercomputer architecture. The GRVSS outperforms the RVSS on both architectures. © 2020 Informa UK Limited, trading as Taylor & Francis Group.

Item
An Effective GPGPU Visual Secret Sharing by Contrast-Adaptive ConvNet Super-Resolution (Springer, 2022)
Holla, M.R.; Pais, A.R.
In this paper, we propose an effective secret image sharing model with super-resolution utilizing a Contrast-adaptive Convolution Neural Network (CCNN or CConvNet). The two stages of this model are the share generation and secret image reconstruction. The share generation step generates information-embedded shadows (shares) equal to the number of participants.
The activities involved in the share generation are creating a halftone image, creating shadows, and transforming the image to the wavelet domain using Discrete Wavelet Transformation (DWT) to embed information into the shadows. The reconstruction stage is the inverse of the share generation, supplemented with CCNN to improve the reconstructed image's quality. This work is significant as it exploits the computational power of the General-Purpose Graphics Processing Unit (GPGPU) to perform the operations. The extensive use of memory optimization using GPGPU constant memory in all the activities brings uniqueness and efficiency to the proposed model. The contrast-adaptive normalization between the CCNN layers, which improves the quality during super-resolution, imparts novelty to our investigation. The objective quality assessment proved that the proposed model produces a high-quality reconstructed image with an SSIM of 89-99.8% for the noise-like shares and 71.6-90% for the meaningful shares. The proposed technique achieved a speedup of 800× in comparison with the sequential model. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item
High-performance medical image secret sharing using super-resolution for CAD systems (Springer, 2022)
Holla, M.R.; Pais, A.R.
Visual Secret Sharing (VSS) is a field of Visual Cryptography (VC) in which the secret image (SI) is distributed to a certain number of participants in the form of different encrypted shares. The decryption then uses authorized shares in a pre-defined manner to obtain that secret information. Medical image secret sharing (MISS) is an emerging VSS field to address the performance challenges in sharing medical images, such as efficiency and effectiveness. Here, we propose a novel MISS scheme for histopathological medical images to achieve high performance in these two parameters.
The novelty here is the use of the Graphics Processing Unit (GPU) to exploit the data parallelism in MISS during encryption and super-resolution (SR), supplementing effectiveness with efficiency. A Convolution Neural Network (CNN) for SR produces a high-contrast reconstructed image. We evaluate the presented model using standard objective assessment parameters and the Computer-Aided Diagnosis (CAD) systems. The result analysis confirmed the high performance of the proposed MISS with a 98% SSIM of the deciphered image. Compared with the state-of-the-art deep learning models designed for histopathological medical images, MISS outperformed them with 99.71% accuracy. Also, we achieved a categorization precision that fits the CAD systems. We attained an overall speedup of 800× over the sequential model. This speedup is significant compared to the speedups of the benchmark GPGPU-based medical image reconstruction models. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item
Accelerating randomized image secret sharing with GPU: contrast enhancement and secure reconstruction using progressive and convolutional approaches (Springer, 2024)
Holla, M.; Suma, D.; Pais, A.R.
Image Secret Sharing (ISS) is a cryptographic technique used to distribute secret images among multiple users. However, current Visual Secret Sharing (VSS) schemes produce a halftone image with only 50% contrast when reconstructing the original image. To overcome this limitation, the Randomized Image Secret Sharing (RISS) scheme was introduced. RISS achieves a higher contrast of 70% when extracting the secret image but comes with a high computational cost. This research paper presents a novel approach called Graphics Processing Unit (GPU)-based Randomized Image Secret Sharing (GRISS), which utilizes data parallelism within the RISS pipeline.
The proposed technique also incorporates an Autoencoder-based Single Image Super-Resolution (ASISR) to enhance the contrast of the recovered image. The performance of GRISS is evaluated against RISS, and the contrast of the ASISR images is compared to current benchmark models. The results demonstrate that GRISS outperforms state-of-the-art models in both efficiency and effectiveness. © The Author(s) 2024.
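
The 50% contrast figure quoted for conventional VSS in the abstracts above can be illustrated with the classic (2, 2) scheme of Naor and Shamir, sketched below in Python. This is not RVSS, RISS, GRVSS, or GRISS; it is a textbook baseline, and the function names are hypothetical. Each secret pixel is expanded into two subpixels per share, and stacking the shares (a pixel-wise OR) leaves white pixels half black and black pixels fully black.

```python
import random


def make_shares(secret, rng):
    """Split a binary image (0 = white, 1 = black) into two shares.

    Each secret pixel becomes two subpixels in each share.
    """
    share1, share2 = [], []
    for row in secret:
        row1, row2 = [], []
        for pixel in row:
            pattern = rng.choice([(0, 1), (1, 0)])
            complement = (1 - pattern[0], 1 - pattern[1])
            row1.extend(pattern)
            # White pixel: identical patterns. Black pixel: complementary patterns.
            row2.extend(complement if pixel else pattern)
        share1.append(row1)
        share2.append(row2)
    return share1, share2


def stack(share1, share2):
    """Overlay two transparencies: a subpixel is black if it is black in either share."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]


def black_fraction(stacked):
    """Fraction of black subpixels for each original pixel position."""
    return [[(row[k] + row[k + 1]) / 2 for k in range(0, len(row), 2)]
            for row in stacked]
```

For any random choices, a white pixel reconstructs with a black fraction of 0.5 and a black pixel with 1.0, so the difference between them, the contrast, is 50%. Every pixel is handled independently of the others, which is the kind of data parallelism a GPU implementation can exploit.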
