Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
8 results
Search Results
Item ARTSAM: Augmented Reality App for Tool Selection in Aircraft Maintenance (Springer Science and Business Media Deutschland GmbH, 2023) Satish, N.; Kumar, C.R.S.
Aircraft maintenance is an advanced task requiring highly skilled engineers. Providing the proper tools and equipment is essential to ensuring good maintenance work. Aircraft Maintenance Technicians (AMTs) require precise knowledge and customized tools to perform their duties. They are responsible for an airplane's safety and efficiency, and rely on a few basic pieces of equipment for a wide range of airplane maintenance jobs. Specific maintenance tasks require unique tools, and while AMTs could probably improvise and get the job done anyway, specialized tools exist for a reason: they help get the job done correctly, whereas improvising leads to unnecessary labor and a compromised aircraft. For example, an incorrectly sized screwdriver or screw causes wear and tear and makes the job harder. Moreover, traditional tool management requires employees to manually check each tool in and out, which is time consuming. A Tool Selector app that recognises and tags tools in real time will help AMTs determine how each tool is used in a particular task. Through this app, AMTs can be guided through animations to perform specific tasks, such as replacing the oil filter in an aircraft engine. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item New sparse matrix storage format to improve the performance of total SPMV time (2012) Bayyapu, B.; Raghavendra, S.R.; Guddeti, G.
Graphics Processing Units (GPUs) are massive data-parallel processors. High performance comes only at the cost of identifying data parallelism in the applications when using data-parallel processors like GPUs. This is easy for applications that have regular memory access and high computation intensity.
GPUs are equally attractive for sparse matrix-vector multiplication (SPMV for short), which has irregular memory access. SPMV is an important computation in most scientific and engineering applications, and scaling the performance, bandwidth utilization, and compute intensity (ratio of computation to data access) of the SPMV computation is a priority in both academia and industry. Various data structures and access patterns have been proposed for sparse matrix representation on GPUs, and optimizing and improving these data structures is a continuing effort. This paper proposes a new format for sparse matrix representation that reduces the data organization time and the CPU-to-GPU memory transfer time for the memory-bound SPMV computation. The BLSI (Bit Level Single Indexing) sparse matrix representation is up to 204% faster than the COO (Coordinate), 104% faster than the CSR (Compressed Sparse Row), and 217% faster than the HYB (Hybrid) format in memory transfer time from CPU to GPU. The proposed sparse matrix format is implemented in CUDA-C on CUDA (Compute Unified Device Architecture)-supported NVIDIA graphics cards. © 2012 SCPE.

Item Communication and computation optimization of concurrent kernels using kernel coalesce on a GPU (John Wiley and Sons Ltd, 2015) Bayyapu, B.; Guddeti, R.M.R.; Raghavendra, P.S.
General-purpose computation on graphics processing units (GPUs) is rapidly entering various scientific and engineering fields. Many applications are being ported onto GPUs for better performance, and various optimizations, frameworks, and tools are being developed for effective GPU programming. As part of communication and computation optimizations for GPUs, this paper proposes and implements an optimization method called kernel coalesce that further enhances GPU performance and also optimizes CPU-to-GPU communication time.
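The baseline sparse formats the SPMV abstract above benchmarks against, COO and CSR, can be illustrated with a minimal CPU-side sketch (illustrative only; the paper's BLSI format is its own contribution and is not reproduced here):

```python
def dense_to_coo(dense):
    """COO: one (row, col, value) triple per non-zero entry."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals

def coo_to_csr(rows, cols, vals, n_rows):
    """CSR: replace the per-entry row list with n_rows+1 cumulative offsets.
    Assumes the COO triples are sorted by row (as dense_to_coo produces)."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]
    return row_ptr, cols, vals

def csr_spmv(row_ptr, cols, vals, x):
    """y = A @ x over the CSR representation; one inner loop per matrix row."""
    return [sum(vals[k] * x[cols[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]
```

CSR's row-offset array is what makes one-thread-per-row GPU SPMV kernels natural, at the cost of an extra conversion step compared with COO.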
With the kernel coalesce methods proposed in this paper, kernel launch overheads are reduced by coalescing the concurrent kernels, and data transfers are reduced in the case of intermediate data generated and used among kernels. Computation optimization on the device (GPU) is performed by tuning the number of blocks and threads launched to the architecture. The block-level kernel coalesce method resulted in prominent performance improvement on a device without support for concurrent kernels. The thread-level kernel coalesce method is better than the block-level method when the design of the grid structure (i.e., the number of blocks and threads) is not optimal for the device architecture, leading to underutilization of device resources. Both methods perform similarly when the number of threads per block is approximately the same in the different kernels and the total number of threads across blocks fills the streaming multiprocessor (SM) capacity of the device. The thread multi-clock-cycle coalesce method can be chosen if the programmer wants to coalesce more than two concurrent kernels that together or individually exceed the thread capacity of the device. If the kernels have lightweight thread computations, the multi-clock-cycle kernel coalesce method gives better performance than the thread-level and block-level methods. If the kernels to be coalesced are a combination of compute-intensive and memory-intensive kernels, warp interleaving gives higher device occupancy and improves performance. The multi-clock-cycle kernel coalesce method for micro-benchmark 1 considered in this paper resulted in 10-40% and 80-92% improvement compared with separate kernel launches, without and with shared input and intermediate data among the kernels, respectively, on a Fermi architecture device (GTX 470).
A nearest neighbor (NN) kernel from the Rodinia benchmark suite is coalesced with itself using the thread-level kernel coalesce method and warp interleaving, giving 131.9% and 152.3% improvement compared with separate kernel launches and 39.5% and 36.8% improvement compared with the block-level kernel coalesce method, respectively. © 2014 John Wiley & Sons, Ltd.

Item l,r-Stitch Unit: Encoder-Decoder-CNN Based Image-Mosaicing Mechanism for Stitching Non-Homogeneous Image Sequences (Institute of Electrical and Electronics Engineers Inc., 2021) Chilukuri, P.K.; Padala, P.; Padala, P.; Desanamukula, V.S.; Pvgd, P.R.
Image stitching (or mosaicing) is an active research topic with numerous use cases in the computer-vision, AR/VR, and computer-graphics domains, but maintaining homogeneity among the input image sequences during the stitching/mosaicing process is a primary limitation and a major disadvantage. To tackle these limitations, this article introduces a robust and reliable image stitching methodology (the l,r-Stitch Unit), which takes multiple non-homogeneous image sequences as input to generate a reliably stitched panoramic wide view as the final output. The l,r-Stitch Unit consists of pre-processing and post-processing sub-modules and an l,r-PanoED network, where each sub-module is a robust ensemble of several deep-learning and computer-vision image-handling techniques. This article also introduces a novel convolutional encoder-decoder deep neural network (the l,r-PanoED network) with a unique split-encoding-network methodology to stitch non-coherent left/right stereo image pairs. The encoder network of the proposed l,r-PanoED extracts semantically rich deep feature maps from the input to stitch/map them into a wide panoramic domain; the feature-extraction and feature-mapping operations are performed simultaneously in the l,r-PanoED's encoder network based on the split-encoding-network methodology.
The decoder network of the l,r-PanoED adaptively reconstructs the output panoramic view from the encoder network's bottleneck feature maps. The proposed l,r-Stitch Unit has been rigorously benchmarked against alternative image-stitching methodologies on our custom-built traffic dataset and several other public datasets. Multiple evaluation metrics (SSIM, PSNR, MSE, L_{\alpha,\beta,\gamma}, FM-rate, average latency time) and wild conditions (rotational/color/intensity variances, noise, etc.) were considered during the benchmarking analysis; based on the results, the proposed method outperformed the other image-stitching methodologies and proved effective even on wild, non-homogeneous inputs. © 2013 IEEE.

Item GPU-aware resource management in heterogeneous cloud data centers (Springer, 2021) Kulkarni, A.K.; Annappa, B.
The power of rapid scalability and easy maintainability of cloud services is driving many high-performance computing applications from company server racks into cloud data centers. With their evolution, Graphics Processing Units (GPUs), comprising extensive arrays of parallel single-instruction-multiple-data processors, are being considered as a platform for high-performance computing because of their high throughput. Many cloud providers have begun offering GPU-enabled services for users where GPUs are essential (for high computational power) to meet the desired quality of service. Virtual machine (VM) placement and load balancing of GPUs in virtualized environments like the cloud is still an evolving area of research, and it is of prime importance for achieving higher resource efficiency and saving energy. Current VM placement techniques do not consider the impact of VM workload type and GPU memory status on placement decisions.
This paper discusses the current issues with the First Fit virtual machine placement policy used in VMware Horizon and proposes a GPU-aware VM placement technique for GPU-enabled virtualized environments such as cloud data centers. Experiments conducted using synthetic workloads indicate reductions in energy consumption, in the search space of physical hosts, and in the makespan of the system. The paper also summarizes the current challenges of GPU resource management in virtualized environments and specific issues in developing cloud applications targeting GPUs under the virtualization layer. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item GPGPU-based randomized visual secret sharing (GRVSS) for grayscale and colour images (Taylor and Francis Ltd., 2022) Holla, R.; Mhala, N.C.; Pais, A.R.
Visual Secret Sharing (VSS) is a technique for sharing secret images between users. Existing VSS schemes reconstruct the original secret image as a halftone image with only 50% contrast. The Randomized Visual Secret Sharing (RVSS) scheme overcomes the disadvantages of existing VSS schemes. Although RVSS extracts the secret image with better contrast, it is computationally expensive. This paper proposes a General Purpose Graphics Processing Unit (GPGPU)-based Randomized Visual Secret Sharing (GRVSS) technique that leverages the data parallelism in the RVSS pipeline. The performance of GRVSS is compared with RVSS on a generic architecture and on the PARAM Shavak supercomputer; GRVSS outperforms RVSS on both. © 2020 Informa UK Limited, trading as Taylor & Francis Group.

Item An Effective GPGPU Visual Secret Sharing by Contrast-Adaptive ConvNet Super-Resolution (Springer, 2022) Holla, M.R.; Pais, A.R.
In this paper, we propose an effective secret image sharing model with super-resolution, utilizing a Contrast-Adaptive Convolutional Neural Network (CCNN or CConvNet).
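The random-share principle underlying the secret sharing schemes described above can be illustrated with a minimal (2,2) XOR sketch on a binary secret; the actual RVSS/GRVSS pipelines (halftoning, image shares, GPGPU kernels) are considerably more involved:

```python
import random

def make_shares(secret_bits, rng=None):
    """Split a binary secret into two noise-like shares: share1 is uniformly
    random, and share2 = secret XOR share1, so neither share alone carries
    any information about the secret."""
    rng = rng or random.Random()
    share1 = [rng.randint(0, 1) for _ in secret_bits]
    share2 = [s ^ r for s, r in zip(secret_bits, share1)]
    return share1, share2

def reconstruct(share1, share2):
    """Combining the two authorized shares with XOR recovers the secret
    exactly, i.e., with full contrast (no 50% halftone loss)."""
    return [a ^ b for a, b in zip(share1, share2)]
```

Because every bit position is independent, both share generation and reconstruction are embarrassingly data-parallel, which is the property a GPGPU implementation exploits.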
The two stages of this model are share generation and secret image reconstruction. The share generation stage generates information-embedded shadows (shares) equal in number to the participants. The activities involved in share generation are creating a halftone image, creating shadows, and transforming the image to the wavelet domain using the Discrete Wavelet Transform (DWT) to embed information into the shadows. The reconstruction stage is the inverse of share generation, supplemented with the CCNN to improve the reconstructed image's quality. This work is significant in that it exploits the computational power of the General-Purpose Graphics Processing Unit (GPGPU) to perform these operations. The extensive use of memory optimization with GPGPU constant memory in all the activities brings uniqueness and efficiency to the proposed model, and the contrast-adaptive normalization between CCNN layers, which improves quality during super-resolution, imparts novelty to our investigation. Objective quality assessment showed that the proposed model produces a high-quality reconstructed image, with an SSIM of 89-99.8% for noise-like shares and 71.6-90% for meaningful shares. The proposed technique achieved a speedup of 800x over the sequential model. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item High-performance medical image secret sharing using super-resolution for CAD systems (Springer, 2022) Holla, M.R.; Pais, A.R.
Visual Secret Sharing (VSS) is a field of Visual Cryptography (VC) in which the secret image (SI) is distributed to a certain number of participants in the form of different encrypted shares. Decryption then uses authorized shares in a pre-defined manner to obtain the secret information. Medical image secret sharing (MISS) is an emerging VSS field that addresses the performance challenges in sharing medical images, namely efficiency and effectiveness.
Here, we propose a novel MISS scheme for histopathological medical images to achieve high performance on these two parameters. The novelty lies in using the Graphics Processing Unit (GPU) to exploit the data parallelism in MISS during encryption and super-resolution (SR), supplementing effectiveness with efficiency. A Convolutional Neural Network (CNN) for SR produces a high-contrast reconstructed image. We evaluate the presented model using standard objective assessment parameters and Computer-Aided Diagnosis (CAD) systems. The result analysis confirmed the high performance of the proposed MISS, with a 98% SSIM for the deciphered image. Compared with state-of-the-art deep learning models designed for histopathological medical images, MISS outperformed them with 99.71% accuracy. We also achieved a categorization precision that fits CAD systems, and an overall speedup of 800x over the sequential model; this speedup is significant compared with the speedups of benchmark GPGPU-based medical image reconstruction models. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
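Several of the abstracts above report reconstruction quality as SSIM. As a reference point, here is a single-window SSIM sketch (the standard metric averages SSIM over a sliding Gaussian window; this global variant keeps the illustration short):

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Structural similarity between two images computed over one global
    window: compares luminance (means), contrast (variances), and structure
    (covariance). Returns 1.0 for identical images."""
    c1 = (0.01 * data_range) ** 2  # stabilizers from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An SSIM of 98-99.8%, as reported above, means the reconstructed image is nearly indistinguishable from the secret image under this luminance/contrast/structure comparison.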
