Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Search Results (10)
Address generation for DSP Kernels (2011)
Ramesh Kini, M.; Sumam David, S.
The performance of signal processing algorithms implemented in hardware depends on the efficiency of the datapath, the memory speed, and address computation. The pattern of data access in signal processing applications is complex, and it is desirable to execute the innermost loop of a kernel every clock cycle. This typically demands the generation of three addresses per clock cycle: two for the data sample and coefficient, and one for storing the processed data. A set of dedicated, efficient Address Generator Units (AGUs) lets the datapath elements be used only for kernel operations, which improves their utilization and enhances performance. This paper focuses on the design and implementation of a Comprehensive Address Generator Unit (CAGU) for the complex addressing modes required by DSP kernels used in multimedia signal processing. An 8-bit CAGU has been implemented in a UMC 0.18 µm, six-metal-layer process; it occupies 21802 µm², consumes 2.95 mW, and works with a clock period of 6 ns. © 2011 IEEE.

Single depth image super-resolution via high-frequency subbands enhancement and bilateral filtering (Institute of Electrical and Electronics Engineers Inc., 2016)
Balure, C.S.; Ramesh Kini, M.; Bhavsar, A.
This paper addresses the problem of super-resolution (SR) from a single low-resolution (LR) depth image to a high-resolution (HR) depth image. A simple yet effective method is proposed that uses the Discrete Wavelet Transform (DWT), the Stationary Wavelet Transform (SWT), and the gradient information of the interpolated LR image. We propose an intermediate stage that enhances the high-frequency subbands to recover the HR image in both noiseless and noisy scenarios. The proposed method has been validated on the Middlebury dataset for upsampling factors of 2, 4, and 8, and is shown to be superior to some related DWT- and SWT-based SR methods. We also demonstrate encouraging performance of the approach on noisy depth images. © 2016 IEEE.

Depth image super-resolution with local medians and bilateral filtering (Institute of Electrical and Electronics Engineers Inc., 2016)
Balure, C.S.; Ramesh Kini, M.; Bhavsar, A.
In this paper, we propose an approach for depth image super-resolution (SR). Given a noisy low-resolution (LR) depth image and its corresponding registered high-resolution (HR) colour image, our approach improves the resolution of the LR image while suppressing noise. We use the segmentation of the HR colour image as a cue for depth image super-resolution. Our method begins with a highly over-segmented colour image, obtained using well-known segmentation approaches such as mean shift (MS) or simple linear iterative clustering (SLIC), and an interpolated LR depth image. We then combine the local medians in the depth image (corresponding to the colour segments) with bicubic interpolation, followed by bilateral filtering, to compute the SR depth image. We performed experiments for the higher magnification factors of 4 and 8 using the Middlebury depth image dataset and evaluated the SR performance using the PSNR and SSIM metrics. The experimental results show that the proposed method (including some variants), while relatively simple, achieves an average improvement of 1.2 dB and 1.7 dB on noiseless and noisy data, respectively, over the popular guided image filtering (GIF) method for an upsampling factor of 8. © 2016 IEEE.

Estimation of attack time constant for dynamic range compressors in hearing aids (Institute of Electrical and Electronics Engineers Inc., 2016)
Deepu, S.P.; Sumam David, S.; Ramesh Kini, M.
Dynamic Range Compression (DRC) is a key component in all modern hearing aids. The attack and release time constants determine how quickly the DRC responds to variations in the incoming signal amplitude, so accurate estimation of these time constants gives precise control over the DRC behavior. In this paper, we examine various errors that occur in the DRC output when conventional methods are used and that adversely affect the attack and release time constants. New methods are proposed for better estimation of the time constants in DRC. Since all the modifications are made in the estimation of the attack time, the existing DRC hardware need not be changed. The proposed algorithm gives perfect output characteristics, with zero error, for the test signals defined in the ANSI S3.22 standard for hearing aid specifications. © 2016 IEEE.

Depth image super-resolution: A review and wavelet perspective (Springer Verlag, 2017)
Balure, C.S.; Ramesh Kini, M.
We propose an algorithm that uses the Discrete Wavelet Transform (DWT) to super-resolve a low-resolution (LR) depth image to a high-resolution (HR) depth image. Commercially available depth cameras capture depth images at a much lower resolution than optical cameras, and a high-resolution depth camera is expensive because of the manufacturing cost of the depth sensor element. Applications that use depth images, such as robot navigation, human-machine interaction (HMI), surveillance, and 3D viewing, are restricted by the LR images these cameras produce, so a method is needed to produce HR depth images from the available LR depth images. This paper addresses this issue using a DWT-based method. It also compiles the existing methods for depth image super-resolution, with their advantages and disadvantages. The Haar basis is used for the DWT because of its intrinsic relationship with super-resolution (SR) in retaining edges. The proposed method has been tested on the Middlebury and Tsukuba datasets and compared with conventional interpolation methods using the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) performance metrics. © Springer Science+Business Media Singapore 2017.

Wavelet based Noise Reduction Techniques for Real Time Speech Enhancement (Institute of Electrical and Electronics Engineers Inc., 2018)
Ravi, B.R.; Deepu, S.P.; Ramesh Kini, M.; Sumam David, S.
Fixed noise suppression techniques are generally used for speech enhancement in low-power real-time systems. In this paper, we propose a modified adaptive system for speech signal classification and noise reduction based on multi-band techniques. It first identifies each incoming speech segment as clean speech, speech in noise, or pure noise. For the noisy speech segments, the background noise is classified using different wavelet-based feature sets. The noise reduction system then adaptively removes stationary and non-stationary noise according to the classified noise type. Simulation results show that the proposed system provides optimal noise reduction and better speech quality, with reduced computational complexity, in adverse noisy environments. © 2018 IEEE.

Design and Implementation of Reconfigurable Neural Network Accelerator (Institute of Electrical and Electronics Engineers Inc., 2022)
Shenoy, M.S.; Ramesh Kini, M.
General-purpose CPUs are slow and inefficient for computationally intensive applications such as neural networks. To execute such applications, it is preferable to develop specialized hardware that can perform a large number of multiply-accumulate operations rapidly and efficiently. The Reconfigurable Neural Network Accelerator (RNNA) architecture designed here is suitable for a variety of neural network applications. Because the computational resource requirements vary with the application, mapping an application onto the available set of resources requires reconfigurability. The fundamental unit of the RNNA is composed of Multiply-Accumulate (MAC) units, registers, and Address Generation Units (AGUs). Compared with computation on a single MAC array, the RNNA with four MAC arrays reduces the required time by approximately 75%. The RNNA was implemented and tested on the Nexys4 DDR Artix-7 FPGA board at clock frequencies of up to 60 MHz, with a power consumption of 0.243 W. © 2022 IEEE.

Posit Extended RISC-V Processor and Its Enhancement Using Data Type Casting (Springer Science and Business Media Deutschland GmbH, 2023)
Kurian, A.; Ramesh Kini, M.
The Posit-extended RISC-V processor has gained attention as an alternative to its floating-point counterpart. However, a Posit-compliant RISC-V processor needs further enhancements to be accepted as a standard. In this paper, the shortcomings of existing Posit integration approaches are discussed, and a novel approach is put forth in which both Posit and floating-point arithmetic are incorporated within the core. We also present a comparative study of various Posit integration approaches in terms of resource utilization and timing requirements. Furthermore, a data type casting unit is incorporated to enhance a RISC-V processor that supports the concurrent use of integer, floating-point, and Posit arithmetic. Two data type casting approaches are proposed and compared in terms of speed and area. Based on the implementation results, inferences are drawn on the appropriate choice of data type casting approach. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Implementation of Reconfigurable Deep Learning Accelerator (RDLA) on PolarFire SoC (IEEE Computer Society, 2023)
Shenoy, M.S.; Ramesh Kini, M.
General-purpose CPUs are slow and inefficient for neural networks and other computationally demanding applications. To run such applications, it is better to create specialized hardware capable of performing many multiply-accumulate operations quickly and efficiently. The Reconfigurable Deep Learning Accelerator (RDLA) architecture has been developed for a wide range of neural network applications. The fundamental unit of the RDLA is composed of Multiply-Accumulate (MAC) units, registers, and Address Generation Units (AGUs). The RDLA was implemented and tested on the PolarFire SoC at clock frequencies of up to 62.5 MHz for data processing. This paper presents test results on different images for a custom four-layer MNIST model, which achieves an accuracy of 97.49% with a power consumption of 1.85 W. © 2023 IEEE.

Designing, Implementing, and Interfacing BFloat16 Arithmetic Processing Unit to RISC-V Pipelined Processor (Springer Science and Business Media Deutschland GmbH, 2025)
Shiva Ganesh, K.; Ramesh Kini, M.
In this paper, we present the implementation of a Brain floating point (BFloat16) arithmetic unit in the execute stage of a RISC-V processor pipeline. For a certain class of applications, such as deep neural networks, this format significantly improves the power, performance, and area (PPA) metrics of the processor compared with the single-precision format. To validate our approach, we developed a dedicated BFloat16 floating-point arithmetic unit and conducted a comprehensive comparison with the conventional single-precision floating-point unit (FPU32). Compared with the IEEE-754 single-precision format, the BFloat16 format offers a well-balanced trade-off between computational advantages and storage benefits. The proposed unit extends the RISC-V processor to handle BFloat16 computations efficiently, incorporating architectural modifications and instruction set extensions to achieve enhanced performance. Implementation on an Artix-7 FPGA (XC7a35tcpg236-1) allowed us to assess resource utilization and timing delay. Additionally, the arithmetic units for both formats were implemented as ASICs with the gf180 process design kit (PDK) using the OpenROAD toolchain. The results show that the BFloat16 unit consumes fewer resources and computes faster than the existing FPU32. The proposed BFloat16-compliant RISC-V core can run at a maximum frequency of 47 MHz with a power consumption of 0.075 W. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
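The address-generation item refers to "complex addressing modes required by DSP Kernels" without listing them. Two modes that DSP address generators commonly provide are modulo (circular-buffer) addressing, used for FIR delay lines, and bit-reversed addressing, used to reorder radix-2 FFT data. The Python sketch below models both in software, assuming these two are representative of the modes meant; the function names are hypothetical and are not the CAGU's interface.

```python
def circular_addresses(base, length, step, start, count):
    """Yield `count` addresses that walk a circular buffer [base, base + length).

    Models modulo addressing: the offset wraps back to the start of the
    buffer instead of running past its end (negative steps wrap too).
    """
    offset = start % length
    for _ in range(count):
        yield base + offset
        offset = (offset + step) % length  # wrap-around (modulo) update


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_addresses(base, bits):
    """Addresses of a 2**bits-point FFT buffer, visited in bit-reversed order."""
    return [base + bit_reverse(i, bits) for i in range(1 << bits)]


# 8-entry delay line at 0x100, stepping by 3 words per access
print([hex(a) for a in circular_addresses(0x100, 8, 3, 0, 10)])
print(bit_reversed_addresses(0, 3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

A kernel that reads two operands and writes one result per cycle needs three such address streams at once; in hardware that means three address generators working in parallel, not three calls in sequence as in this software model.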
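Two of the depth super-resolution items rely on the DWT, and the review states that it uses the Haar basis. A single-level 2-D Haar decomposition splits an image into one approximation subband (LL) and three detail subbands. The sketch below uses pairwise averages and half-differences instead of the orthonormal 1/√2 scaling. This is an assumption made so that LL equals the mean of each 2×2 block. It illustrates the transform only and is not the authors' SR pipeline.

```python
def haar_1d(seq):
    """One level of a (non-normalised) 1-D Haar transform of an even-length sequence.

    Returns (approximation, detail) as pairwise averages and half-differences,
    so each pair is recovered as a = s + d, b = s - d.
    """
    approx = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    detail = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return approx, detail


def ihaar_1d(approx, detail):
    """Invert haar_1d."""
    out = []
    for s, d in zip(approx, detail):
        out.extend([s + d, s - d])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def haar_2d(img):
    """Split an image (list of rows, even dimensions) into LL, LH, HL, HH.

    First letter: horizontal filter (L = average, H = difference);
    second letter: vertical filter.
    """
    rows = [haar_1d(r) for r in img]
    low = [a for a, _ in rows]   # horizontally low-passed half
    high = [d for _, d in rows]  # horizontally high-passed half

    def split_columns(block):
        cols = [haar_1d(c) for c in transpose(block)]
        return transpose([a for a, _ in cols]), transpose([d for _, d in cols])

    ll, lh = split_columns(low)
    hl, hh = split_columns(high)
    return ll, lh, hl, hh


def ihaar_2d(ll, lh, hl, hh):
    """Inverse of haar_2d: rebuild the image from its four subbands."""

    def merge_columns(a_block, d_block):
        cols = [ihaar_1d(a, d) for a, d in zip(transpose(a_block), transpose(d_block))]
        return transpose(cols)

    low = merge_columns(ll, lh)
    high = merge_columns(hl, hh)
    return [ihaar_1d(a, d) for a, d in zip(low, high)]
```

With all three detail subbands set to zero, the inverse transform of an LR image reduces to pixel replication. For example, `ihaar_2d([[1, 2], [3, 4]], z, z, z)` with `z = [[0, 0], [0, 0]]` returns a 4×4 image made of 2×2 constant blocks. Estimating or enhancing the detail subbands, as the single-image method above does, is what lets DWT-based SR do better than this.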
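Both 2016 depth super-resolution items finish with bilateral filtering. A bilateral filter replaces each sample with a weighted mean of its neighbours. Each weight is the product of a spatial Gaussian (width `sigma_s`) and a range Gaussian on the value difference (width `sigma_r`), so samples across a large depth jump contribute almost nothing. The 1-D version below is a simplification for brevity; the function name and parameters are illustrative assumptions, and an image implementation would use a 2-D window.

```python
import math


def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Edge-preserving smoothing of a 1-D signal (e.g. one row of a depth map).

    Each output sample is a normalised weighted mean of the samples within
    `radius`, weighted by spatial closeness (sigma_s) times value
    similarity (sigma_r).
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_s = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            w_r = math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_r ** 2))
            num += w_s * w_r * signal[j]
            den += w_s * w_r  # j == i contributes weight 1, so den > 0
        out.append(num / den)
    return out
```

On a clean step from 1 to 10 with `sigma_r = 1`, the range weight across the edge is about exp(-40.5). The edge therefore survives essentially unchanged, while small fluctuations on either side are averaged out.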
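The DRC item concerns how attack and release time constants are estimated. For reference, the sketch below is a conventional attack/release level detector; it is a common design, not the method proposed in the paper. It is a one-pole smoother whose pole is exp(-1/(τ·fs)), and it switches between the attack and release coefficients depending on whether the input is rising or falling. The sample rate and time constants used in the check are arbitrary.

```python
import math


def smoothing_coeff(tau_s, fs_hz):
    """Pole of a one-pole smoother with time constant tau_s at sample rate fs_hz."""
    return math.exp(-1.0 / (tau_s * fs_hz))


def envelope(levels, fs_hz, attack_s, release_s):
    """Attack/release level detector of the kind placed ahead of a DRC gain stage.

    `levels` is a non-negative level signal (e.g. rectified amplitude).
    Uses the attack coefficient while the input is above the current
    estimate and the release coefficient while it is below.
    """
    a_att = smoothing_coeff(attack_s, fs_hz)
    a_rel = smoothing_coeff(release_s, fs_hz)
    y = 0.0
    out = []
    for x in levels:
        a = a_att if x > y else a_rel
        y = a * y + (1.0 - a) * x
        out.append(y)
    return out
```

After exactly τ seconds of a step input, this detector has covered 1 - e^-1, about 63.2%, of the step. An attack time read off from how the compressor's output settles therefore need not equal τ, which is one reason the mapping between the two needs care.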
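The RNNA item reports that four MAC arrays cut computation time by approximately 75% relative to one. Under an idealised model, that figure is simply 1 - 1/4. The sketch below assumes a dense layer whose output neurons are dealt out round-robin to identical MAC units, each performing one multiply-accumulate per cycle. It ignores memory stalls and reduction overhead. These modelling choices and the function names are assumptions, not the RNNA's actual dataflow.

```python
import math


def layer_cycles(n_out, n_in, n_units):
    """Cycles for a dense layer with output neurons spread over n_units MAC units.

    Simplified model (an assumption): one MAC per unit per cycle, no memory
    stalls or pipeline fill, and no cross-unit reduction because each output
    neuron is computed entirely on one unit.
    """
    rounds = math.ceil(n_out / n_units)  # neurons handled by the busiest unit
    return rounds * n_in


def dense_layer(weights, x, n_units):
    """Functional model: unit k computes output neurons k, k + n_units, ..."""
    y = [0.0] * len(weights)
    for unit in range(n_units):
        for row in range(unit, len(weights), n_units):
            acc = 0.0
            for w, v in zip(weights[row], x):
                acc += w * v  # one multiply-accumulate on this unit
            y[row] = acc
    return y
```

For a hypothetical 128-output, 784-input layer, `layer_cycles(128, 784, 1)` is 100352 cycles and `layer_cycles(128, 784, 4)` is 25088, exactly a 75% reduction in this overhead-free model.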
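The Posit item adds a casting unit so that integer, floating-point and Posit values can be used together. The sketch below is a minimal software decoder that shows what a Posit-to-float conversion has to unpack. It handles an n-bit posit with es exponent bits. The fields are a sign bit, a variable-length regime (a run of identical bits ended by the opposite bit), up to es exponent bits, and a fraction with a hidden leading 1, giving the value (-1)^s · 2^(k·2^es + e) · (1 + f). It is a reference model under these assumptions, not the hardware casting unit from the paper, and the defaults n = 8, es = 0 are arbitrary.

```python
def decode_posit(bits, n=8, es=0):
    """Decode an n-bit posit with es exponent bits into a Python float.

    Returns float('nan') for NaR (Not a Real). Negative posits are the
    two's complement of the corresponding positive pattern.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # undo two's complement

    # Regime: run of identical bits starting just below the sign bit.
    pos = n - 2
    first = (bits >> pos) & 1
    run = 0
    while pos >= 0 and ((bits >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    pos -= 1  # skip the terminating bit (harmless if the run reached the end)

    # Exponent: up to es bits; bits cut off by the end of the word count as 0.
    exp = 0
    for _ in range(es):
        exp <<= 1
        if pos >= 0:
            exp |= (bits >> pos) & 1
            pos -= 1

    # Fraction: whatever bits remain, with a hidden leading 1.
    frac_bits = max(pos + 1, 0)
    frac = bits & ((1 << frac_bits) - 1)
    mantissa = 1.0 + frac / (1 << frac_bits)
    value = mantissa * 2.0 ** (k * (1 << es) + exp)
    return -value if sign else value
```

For posit⟨8,0⟩, `0x40` decodes to 1.0, `0x60` to 2.0, `0x50` to 1.5 and `0xC0` to -1.0; `0x7F` gives the largest value, 64.0.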
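The BFloat16 item contrasts the format with IEEE-754 single precision. BFloat16 keeps float32's sign bit and 8-bit exponent but only the top 7 fraction bits, so converting from float32 amounts to rounding away the low 16 bits. The sketch below does this with round-to-nearest-even using Python's `struct` module. The rounding mode and the NaN handling are assumptions for illustration, since the abstract does not state which rounding the unit implements.

```python
import struct


def float_to_bits32(x):
    """IEEE-754 single-precision bit pattern of x (x is rounded to float32 first)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits32_to_float(b):
    """Float value of a 32-bit IEEE-754 single-precision pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def to_bfloat16(x):
    """Round x to BFloat16 with round-to-nearest-even; return the 16-bit pattern."""
    b = float_to_bits32(x)
    if (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF):
        return (b >> 16) | 0x0040  # NaN: force a quiet NaN so it stays NaN
    lsb = (b >> 16) & 1            # lowest bit that BFloat16 keeps
    b += 0x7FFF + lsb              # round half to even on the dropped 16 bits
    return (b >> 16) & 0xFFFF


def from_bfloat16(h):
    """Expand a BFloat16 pattern to a float by appending 16 zero fraction bits."""
    return bits32_to_float(h << 16)
```

For example, π rounds to 3.140625 in BFloat16. Because the exponent field is unchanged, the dynamic range matches float32, while the significand multiplier shrinks from 24×24 to 8×8 bits. That is a plausible source of the resource savings the abstract reports.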
