Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 6 of 6
  • Item
    Implementation of comprehensive address generator for digital signal processor
    (2013) Ramesh Kini, R.M.; Sumam David, S.
    The performance of signal-processing algorithms implemented in hardware depends on the efficiency of the datapath, memory speed and address computation. The pattern of data access in signal-processing applications is complex, and it is desirable to execute the innermost loop of a kernel in a single clock cycle. This typically requires generating three addresses per clock: two for the data sample and coefficient, and one for storing the processed data. Most reconfigurable processors designed for multimedia focus on mapping multimedia applications written in a high-level language directly onto the reconfigurable fabric, which implies using the same datapath resources for kernel processing and address generation. This results in inconsistent and non-optimal use of finite datapath resources. A set of dedicated, efficient Address Generator Units (AGUs) helps in better utilisation of the datapath elements by using them only for kernel operations, and will certainly enhance performance. This article focuses on the design and application-specific integrated circuit implementation of address generators for the complex addressing modes required by multimedia signal-processing kernels. A novel algorithm and hardware AGU are developed for accessing data and coefficients in bit-reversed order for a fast Fourier transform kernel spanning log₂ N stages, along with AGUs for zig-zag-ordered data access for entropy coding after the Discrete Cosine Transform (DCT), convolution kernels with stored or streaming data, data access for motion estimation using the block-matching technique, and other conventional addressing modes. When mapped to hardware, they scale linearly in gate complexity with increasing size. © 2013 Copyright Taylor and Francis Group, LLC.
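    To illustrate the bit-reversed access order mentioned in the abstract above, here is a minimal software sketch. It uses the textbook bit-reversal permutation, not the article's AGU algorithm or hardware, which the abstract does not describe; the function names `bit_reverse` and `bit_reversed_addresses` are hypothetical.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (textbook method)."""
    reversed_index = 0
    for _ in range(bits):
        # Shift the result left and append the current lowest bit of index.
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def bit_reversed_addresses(n: int) -> list:
    """Return addresses 0..n-1 in bit-reversed order; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1  # equals log2(n) for a power of two
    return [bit_reverse(i, bits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_addresses(8))
```

    For an 8-point transform the sequence is 0, 4, 2, 6, 1, 5, 3, 7, which is the order a radix-2 FFT reads or writes its samples.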
  • Item
    Communication and computation optimization of concurrent kernels using kernel coalesce on a GPU
    (John Wiley and Sons Ltd, 2015) Bayyapu, B.; Guddeti, R.M.R.; Raghavendra, P.S.
    General-purpose computation on graphics processing units (GPUs) is rapidly entering various scientific and engineering fields. Many applications are being ported onto GPUs for better performance. Various optimizations, frameworks, and tools are being developed for effective programming of GPUs. As part of communication and computation optimizations for GPUs, this paper proposes and implements an optimization method called kernel coalesce that further enhances GPU performance and also optimizes CPU to GPU communication time. With the kernel coalesce methods proposed in this paper, kernel launch overheads are reduced by coalescing the concurrent kernels, and data transfers are reduced in the case of intermediate data generated and used among kernels. Computation optimization on a device (GPU) is performed by optimizing the number of blocks and threads launched, tuning it to the architecture. The block level kernel coalesce method resulted in prominent performance improvement on a device without support for concurrent kernels. The thread level kernel coalesce method is better than the block level kernel coalesce method when the design of the grid structure (i.e., number of blocks and threads) is not optimal for the device architecture, which leads to underutilization of the device resources. Both methods perform similarly when the number of threads per block is approximately the same in different kernels and the total number of threads across blocks fills the streaming multiprocessor (SM) capacity of the device. The thread multi-clock-cycle coalesce method can be chosen if the programmer wants to coalesce more than two concurrent kernels that together or individually exceed the thread capacity of the device. If the kernels have lightweight thread computations, the multi-clock-cycle kernel coalesce method gives better performance than the thread and block level kernel coalesce methods. If the kernels to be coalesced are a combination of compute intensive and memory intensive kernels, warp interleaving gives higher device occupancy and improves the performance. The multi-clock-cycle kernel coalesce method for micro-benchmark1 considered in this paper resulted in 10-40% and 80-92% improvement compared with separate kernel launch, without and with shared input and intermediate data among the kernels, respectively, on a Fermi architecture device, that is, GTX 470. A nearest neighbor (NN) kernel from the Rodinia benchmark is coalesced with itself using the thread level kernel coalesce method and warp interleaving, giving 131.9% and 152.3% improvement compared with separate kernel launch and 39.5% and 36.8% improvement compared with the block level kernel coalesce method, respectively. © 2014 John Wiley & Sons, Ltd.
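    The idea of coalescing two kernels at block granularity can be pictured with a small host-side model. This is only an assumed reading of "block level kernel coalesce" (one merged launch whose block index selects which kernel body runs); the abstract does not give the paper's implementation, and `launch_coalesced`, `double_kernel` and `square_kernel` are hypothetical names. A sequential loop stands in for the blocks a GPU would run in parallel.

```python
from typing import Callable, List, Tuple


def launch_coalesced(kernel_a: Callable[[int], None], blocks_a: int,
                     kernel_b: Callable[[int], None], blocks_b: int) -> None:
    """Model one merged launch covering the blocks of two kernels."""
    for block_idx in range(blocks_a + blocks_b):
        if block_idx < blocks_a:
            kernel_a(block_idx)             # first range of blocks runs kernel A
        else:
            kernel_b(block_idx - blocks_a)  # remaining blocks run kernel B


def demo() -> Tuple[List[int], List[int]]:
    data = [1, 2, 3, 4]
    doubled = [0] * len(data)
    squared = [0] * len(data)

    def double_kernel(block: int) -> None:
        doubled[block] = 2 * data[block]    # one element per block, for brevity

    def square_kernel(block: int) -> None:
        squared[block] = data[block] ** 2

    # A single "launch" replaces two separate kernel launches.
    launch_coalesced(double_kernel, len(data), square_kernel, len(data))
    return doubled, squared


if __name__ == "__main__":
    print(demo())
```

    In this model the saving comes from issuing one launch instead of two; on a real device the two kernels could also share input data already resident in device memory.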
  • Item
    Time synchronization problem of wireless sensor network using maximum probability theory
    (Springer, 2018) Upadhyay, D.; Dubey, A.K.; Santhi Thilagam, P.S.
    Synchronizing time between the sensors of a wireless sensor network is vitally important. It helps in maintaining a consistent and reliable frame of time across the network. Two clocks are said to be synchronized when their frequency sources run at equal rates and their offsets are identical. Because of manufacturing differences, clock oscillators vary slightly, which affects the frequency and accuracy of each clock. This leads to the problem of synchronizing time between the sensor clocks. To attain time synchronization in a network, contention-based message passing techniques are typically used. In this paper a two-way message passing scheme is utilized. The paper proposes a statistical tool based on maximum probability theory for selecting the reference clock offset for time synchronization protocols. It also proposes a subset selection algorithm to support the proposed statistical tool. The results obtained consist of the selection of the most probable estimate for the clock offset. The proposed algorithm utilizes the two-way message passing scheme for the exchange of timing messages within the network. The proposed algorithm is compared with existing algorithms for estimation of clock offset. The proposed work was observed to give better results in terms of efficiency, reaching 99.8%. © 2018, The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and The Division of Operation and Maintenance, Lulea University of Technology, Sweden.
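    As background for the two-way message passing scheme named in the abstract above, the sketch below shows the classic offset and delay estimate from one exchange of four timestamps. It assumes symmetric link delay and does not reproduce the paper's maximum-probability selection or subset selection algorithm; the names `two_way_estimate`, `t1` to `t4` are hypothetical.

```python
from typing import Tuple


def two_way_estimate(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Estimate (offset, one-way delay) from a single two-way exchange.

    t1: request sent, read on node A's clock
    t2: request received, read on node B's clock
    t3: reply sent, read on node B's clock
    t4: reply received, read on node A's clock
    Assumes the forward and return delays are equal.
    """
    forward = t2 - t1   # delay + offset
    backward = t4 - t3  # delay - offset
    offset = (forward - backward) / 2.0  # how far B's clock is ahead of A's
    delay = (forward + backward) / 2.0
    return offset, delay


if __name__ == "__main__":
    # B runs 5 time units ahead of A and each hop takes 10 units.
    print(two_way_estimate(100.0, 115.0, 120.0, 125.0))
```

    Each exchange yields one offset estimate; a protocol then has to decide which of many noisy estimates to trust, which is the selection problem the abstract addresses.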
  • Item
    Application of non-linear Gaussian regression-based adaptive clock synchronization technique for wireless sensor network in agriculture
    (Institute of Electrical and Electronics Engineers Inc., 2018) Upadhyay, D.; Dubey, A.K.; Santhi Thilagam, P.S.
    Efficient, low-power clock synchronization is a challenging task for a wireless sensor network (WSN). Therefore, it is crucial to design lightweight clock synchronization protocols for these networks. An adaptive clock offset prediction model for WSN is proposed in this paper that exchanges fewer synchronization messages to improve accuracy and efficiency. The required timing information is collected by setting up a small WSN that monitors soil conditions to control irrigation in agriculture. The network measures soil moisture, temperature, humidity, and pressure along with the sensors' clock offsets. First, the prediction model perceives the existing sensor clock offset to observe the clock characteristics and delay. Then, a Gaussian function is applied to adjust the parameter weights of the observed values in the prediction model. The system results demonstrate that the proposed adaptive non-linear Gaussian regression synchronization model uses 20% less energy than the Time-Sync Protocol for Sensor Network and the Reference Broadcast Synchronization protocol. It also reduces the synchronization error with respect to root-mean-square error (RMSE) by 24.85% as compared to linear prediction synchronization with RMSE 28.72% in terms of accuracy. © 2001-2012 IEEE.
  • Item
    A statistical tool for time synchronization problem in WSN
    (Bentham Science Publishers P.O. Box 294 Bussum 1400 AG, 2019) Upadhyay, D.; Dubey, A.K.; Santhi Thilagam, P.S.
    Background: In recent research, time synchronization has been of great importance in various applications of wireless sensor networks. Localization, tracking, message passing using contention-based schemes, and communication are some of the fields where synchronization between sensor clocks is highly required. Therefore, several algorithms were designed to achieve a rational and reliable frame of time within the wireless sensor network. Patents related to time synchronization in WSN were also analyzed. Methods: This paper discusses a powerful statistical tool using maximum probability theory for synchronizing the sensor clocks. In this paper, maximum probability theory is applied to estimate the best value of the clock offset between two sensor clocks. The proposed algorithm is analyzed by exchanging timing messages between nodes using two-way message exchange schemes. Results: The proposed algorithm is also implemented along with a Time-Sync Protocol for Sensor Network. It reduces error deviation from 2.32 to 0.064 ms as compared with the Time-Sync Protocol for Sensor Network without the proposed work. Conclusion: It was observed that for a small network, the proposed work gives better and more efficient results with the Time-Sync Protocol for Sensor Network. © 2019 Bentham Science Publishers.
  • Item
    A compact 4-to-8-bit nonbinary SAR ADC based on 2 bits per cycle DAC architecture
    (Springer, 2019) Bhat, K.G.; Laxminidhi, T.; Bhat, M.S.
    A compact programmable-resolution successive approximation register (SAR) analog-to-digital converter (ADC) for biosignal acquisition systems is presented. The ADC features a programmable 4-to-8-bit DAC that makes the ADC programmable, with 2 bits evaluated in each clock cycle. At low resolution with relaxed noise and linearity requirements, use of an increased clock speed improves energy efficiency. A single DAC architecture is used to generate references for 2 bits per cycle evaluation for all resolutions. Nonbinary switched capacitor circuits, least sensitive to parasitics, are proposed for use in the DAC for reference generation. The choice of architecture and circuit design are presented with mathematical analysis. In post-layout simulation in a 90 nm CMOS process, the designed ADC has a 1.2 MS/s sampling rate in 8-bit mode with a power consumption of 185 µW, achieving an ENOB of 7.6. The active area of the designed ADC is 0.06 mm². The DAC resolution scaling and the use of a variable sampling rate maximize efficiency at lower resolutions. Therefore, the figure of merit (FOM) is degraded only by a factor of 4.7 for resolution scaling from 8 to 4 bits. This is a significant improvement over the 16× degradation expected from 8-bit to 4-bit resolution scaling by truncating the bits. © 2019, Indian Academy of Sciences.
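    The 16× figure quoted above can be checked with a few lines of arithmetic. The sketch assumes the commonly used Walden figure of merit, FOM = P / (2^ENOB × f_s); the abstract does not say which FOM definition the paper uses, and the function names are hypothetical. If bits are simply truncated while power and sampling rate stay fixed, the effective resolution drops by 4 bits and the FOM worsens by 2^4.

```python
def walden_fom(power_w: float, enob: float, sample_rate_hz: float) -> float:
    """Energy per conversion step in joules (Walden FOM, assumed definition)."""
    return power_w / (2.0 ** enob * sample_rate_hz)


def truncation_penalty(bits_full: int, bits_reduced: int) -> float:
    """FOM degradation factor when bits are dropped at fixed power and rate."""
    return 2.0 ** (bits_full - bits_reduced)


if __name__ == "__main__":
    # 8-bit mode figures quoted in the abstract: 185 uW, ENOB 7.6, 1.2 MS/s.
    fom_8bit = walden_fom(185e-6, 7.6, 1.2e6)
    print(f"8-bit FOM: {fom_8bit * 1e12:.2f} pJ per conversion step")
    print(f"Truncation penalty, 8 to 4 bits: {truncation_penalty(8, 4):.0f}x")
```

    Against that 16× baseline, the reported factor of 4.7 reflects the benefit the abstract attributes to scaling the DAC resolution and raising the sampling rate at lower resolutions.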