Browsing by Author "Talawar, B."

Now showing 1 - 20 of 77
    A Comparative Study on End-to-End Learning for Self-Driving Cars
    (Springer Science and Business Media Deutschland GmbH, 2024) Kumar, S.; Pir, M.A.; Rajan, J.; Talawar, B.
    Autonomous vehicle technology has advanced in recent years, and the self-driving car is one of the most attractive research fields, with automakers increasingly focusing on it. A number of attempts have been made in this field, such as lane recognition, detection of objects on roadways, and reconstruction of three-dimensional models; however, the focus of our study is on models that directly transform camera input images into steering angles. In this paper, we performed a comparative study of some of the popular end-to-end CNN models for autonomous vehicles. We used four different datasets for model training and validation: one gathered from the real world and three created using software simulations. To evaluate the performance of the different models, we used the mean squared error (MSE) metric. Interestingly, certain models fared better than others when applied to different datasets. On the real-world dataset, both pre-trained VGG-16 and pre-trained VGG-19 using transfer learning exhibit comparable performance, achieving an MSE of 21.4, which is better than all other considered models. On the simulated datasets, however, pre-trained VGG-19 outperforms the majority of the other models. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
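The MSE metric used for evaluation above is simple to state; as a minimal sketch (with hypothetical steering angles, not data from the paper):

```python
def mse(predicted, actual):
    """Mean squared error between predicted and ground-truth values
    (here: steering angles)."""
    assert len(predicted) == len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical steering angles in degrees; purely illustrative.
pred = [1.0, -2.0, 0.5]
true = [0.0, -1.0, 1.5]
print(mse(pred, true))  # (1 + 1 + 1) / 3 = 1.0
```

A lower MSE means the model's predicted steering angles track the recorded ones more closely.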
    A Crossbar Interconnection Network in DNA
    (Institute of Electrical and Electronics Engineers Inc., 2015) Talawar, B.
    DNA computers provide exciting challenges and opportunities in the fields of computer architecture, neural networks, autonomous micromechanical devices, and chemical reaction networks. The advent of digital abstractions such as seesaw gates holds many opportunities for computer architects to realize complex digital circuits using DNA strand displacement principles. This paper presents a realization of a single-bit, 2×2 crossbar interconnection network built using seesaw gates. The functional correctness of the implemented crossbar was verified using a chemical reaction simulator. © 2015 IEEE.
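The behaviour a 2×2 crossbar must realize can be sketched abstractly; the toy model below shows the two switch states (a behavioural assumption only, not the paper's DNA seesaw-gate implementation):

```python
def crossbar_2x2(in0, in1, cross):
    """Single-bit 2x2 crossbar: pass-through ("bar") when cross is False,
    swapped outputs ("cross") when cross is True. Behavioural model only;
    the paper builds this from DNA strand-displacement seesaw gates."""
    return (in1, in0) if cross else (in0, in1)

print(crossbar_2x2(1, 0, cross=False))  # (1, 0): bar state
print(crossbar_2x2(1, 0, cross=True))   # (0, 1): cross state
```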
    A Detailed Study of SOT-MRAM as an Alternative to DRAM Primary Memory in Multi-Core Environment
    (Institute of Electrical and Electronics Engineers Inc., 2024) Kallinatha, H.D.; Rai, S.; Talawar, B.
    As current primary memory technology reaches its limits, it is essential to explore alternative memory technologies to accommodate modern applications and use cases. However, adopting a new memory technology poses the challenge of deriving accurately estimated parameters for integration and performing reliable simulations. This study proposes a new approach incorporating Spin-Orbit-Torque Magnetic RAM (SOT-MRAM) into hybrid and full main memory architectures within a multi-core system, encompassing various memory configurations and capacities. The study addresses the challenge of evaluating SOT-MRAM-based memory systems when specific SOT-MRAM memory parameters are not publicly available. The research methodology includes micro-architectural (circuit-level) design space exploration and comprehensive full-system simulations, which evaluate benchmark programs representing diverse application domains. The evaluation includes three memory structures with varying memory organizations and capacities. The results show that SOT-MRAM is a robust replacement for DRAM or hybrid memory, offering compelling advantages: a 74.05% reduction in power consumption, a 40.10% increase in bandwidth utilization, and a 72.85% reduction in Energy-Delay Product (EDP). The maximum latency penalties are minimal: a 3.71% increase for hybrid structures and a mere 0.07% for standalone SOT-MRAM memory structures. © 2013 IEEE.
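The Energy-Delay Product reported above is simply energy multiplied by delay, so a technology can trade a small latency penalty for a large energy saving and still win on EDP. A minimal sketch with illustrative numbers (not the paper's measurements):

```python
def energy_delay_product(energy_j, delay_s):
    """EDP = energy x delay; lower is better. Rewards technologies that
    save energy without giving up too much latency."""
    return energy_j * delay_s

# Hypothetical baseline vs. alternative memory; values are invented.
baseline = energy_delay_product(2.0, 1.0)   # 2.0 J*s
alt      = energy_delay_product(0.5, 1.2)   # slower, but far less energy
reduction = (baseline - alt) / baseline * 100
print(f"EDP reduction: {reduction:.1f}%")   # EDP reduction: 70.0%
```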
    A Framework for SOT-MRAM Scaling Road-Map with Density and Application Evaluation
    (Institute of Electrical and Electronics Engineers Inc., 2024) Kallinatha, D.H.; Talawar, B.
    The increasing gap between CPU speeds and memory access times, known as the 'Memory Wall' problem, poses considerable challenges in modern computing. This study introduces a scaling-factor framework to integrate Spin-Orbit-Torque Magnetic RAM (SOT-MRAM) into cache architectures as a potential replacement for Static Random Access Memory (SRAM). The research primarily targets applications in artificial intelligence (AI), natural language processing (NLP), and broad computing tasks. It presents a method to evaluate the effectiveness of the scaling-factor framework and density enhancement in cache memory through the proposed framework's extensive Design Space Exploration (DSE). This exploration includes a detailed comparative analysis of SRAM and SOT-MRAM under various scaling conditions within the L2 and Last-Level Cache (LLC) segments. The outcomes indicate that SOT-MRAM significantly improves energy efficiency and reduces latency, achieving a 60% decrease in power usage and a 75% improvement in response times compared to conventional SRAM caches. These advancements suggest that SOT-MRAM could effectively mitigate the Memory Wall challenges and enhance overall computational performance. © 2024 IEEE.
    A Support Vector Regression-Based Approach to Predict the Performance of 2D 3D On-Chip Communication Architectures
    (Institute of Electrical and Electronics Engineers Inc., 2019) Nirmal Kumar, A.; Talawar, B.
    Recently, Networks-on-Chip (NoCs) have evolved as a scalable alternative to traditional bus and point-to-point architectures. NoC design performance evaluation is largely based on simulation, which becomes extremely slow as the architecture size increases and gives little insight into how distinct design parameters impact the actual performance of the network. Simulation is therefore very difficult to use for optimization purposes. In this paper, we propose a Support Vector Regression (SVR)-based framework that can be used to analyze the performance of 2D and 3D NoC architectures. Experiments were conducted by varying architecture sizes with different virtual channel counts and injection rates. The proposed framework can be used to obtain fast and accurate NoC performance estimates with a prediction error of 2% to 4% and a minimum speedup of 3000× to 3500×. © 2019 IEEE.
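As a rough illustration of learning-based performance estimation, the sketch below fits an ordinary least-squares line to hypothetical (injection rate, latency) points. It stands in for the paper's SVR model, and all data values are invented:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b. A deliberately simple
    stand-in for the SVR model in the paper: learn a latency predictor
    from a few simulated samples, then query it cheaply."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical (injection rate, average latency in cycles) samples.
rates     = [0.05, 0.10, 0.15, 0.20]
latencies = [20.0, 25.0, 30.0, 35.0]
a, b = fit_line(rates, latencies)
print(round(a * 0.12 + b, 1))  # predicted latency at an unseen rate
```

A real surrogate model would use many features (topology size, virtual channels, traffic pattern) and a nonlinear regressor, but the workflow is the same: train on a few simulations, then predict instead of simulating.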
    Accurate Performance Analysis of 3D Mesh Network on Chip Architectures
    (Institute of Electrical and Electronics Engineers Inc., 2018) Halavar, B.; Talawar, B.
    With the increase in the number and complexity of cores and components in CMPs and SoCs, a highly structured and efficient on-chip communication network is required to achieve high performance and scalability. Networks-on-Chip (NoCs) emerged as the reliable communication framework in CMPs and SoCs. Many 2D NoC architectures have been proposed for efficient on-chip communication. In this paper, we explore the design space of 3D NoCs using floorplan-driven wire lengths and link delay estimation. We analyse the performance and cost of 2D and two 3D variants of the Mesh topology by injecting two synthetic traffic patterns for varying buffer spaces; floorplan-based delays were considered in the experiments. Results of our experiments show that for injection rates from 0.02 to 0.2, the average network latency of a 4-layer 3D Mesh is reduced by up to 54% compared to its 2D counterpart. The on-chip communication performance improved by up to 2.2× and 3.1× in the 4-layer 3D Mesh compared to the 2D Mesh with uniform and transpose traffic patterns, respectively. © 2018 IEEE.
    Accurate Power and Latency Analysis of a Through-Silicon Via (TSV)
    (Institute of Electrical and Electronics Engineers Inc., 2018) Pasupulety, U.; Halavar, B.; Talawar, B.
    A Through-Silicon Via (TSV) interconnects vertically stacked layers of circuit elements in a 3D IC. This leads to reduced distance and increased communication bandwidth between any two circuit elements located on different layers of the chip compared to 2D NoCs. TSVs have different physical characteristics, and thus different latency and power consumption, compared to horizontal chip interconnects. The need of the hour is to accurately estimate the power consumption and latency of TSVs separately from horizontal interconnects through simulation. Accurate power and latency models of TSVs enable architects and researchers to arrive at the optimal design space by performing quick trade-off studies. We propose an extension to the BookSim simulator that treats TSVs as a separate type of on-chip interconnect. The associated latency and dynamic power consumption are calculated based on delay and power models involving various physical parameters of the TSV. Applying these models in a 3D 4×4×4 mesh topology simulation, we observe that the total average link power consumed is 13% lower than in a 2D mesh when the vertical links (containing TSVs) are treated separately from the horizontal links. Additionally, the average network latency in the 3D mesh topology is roughly 60-82% lower than in the 2D case. © 2018 IEEE.
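The dynamic power term in such link models typically follows the standard switching-power formula P = α·C·V²·f, with the TSV's own capacitance substituted for the wire capacitance. A sketch with illustrative parameter values (not the paper's TSV parameters):

```python
def dynamic_power(alpha, capacitance_f, vdd, freq_hz):
    """Switching (dynamic) power P = alpha * C * Vdd^2 * f.
    alpha: switching activity factor, C: load capacitance in farads,
    Vdd: supply voltage, f: clock frequency in Hz.
    Parameter values below are illustrative only."""
    return alpha * capacitance_f * vdd ** 2 * freq_hz

# e.g. activity 0.5, 50 fF of TSV capacitance, 1.0 V supply, 1 GHz clock
p = dynamic_power(0.5, 50e-15, 1.0, 1e9)
print(f"{p * 1e6:.1f} uW")  # 25.0 uW
```

Because a TSV's capacitance differs from that of a horizontal wire of comparable reach, modelling the two link types separately changes the per-link power totals, which is the effect the paper quantifies.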
    Accurate Router Level Estimation of Network-on-Chip Architectures using Learning Algorithms
    (Institute of Electrical and Electronics Engineers Inc., 2019) Kumar, A.; Talawar, B.
    The rise in the number of cores on single chips in System-on-Chips (SoCs) creates the problem of intra-chip communication between Intellectual Properties (IPs). Networks-on-Chip (NoCs) have emerged as a reliable on-chip communication framework for Chip Multiprocessors and SoCs. Estimating NoC power and performance in the early design stages has become crucial. We employ Machine Learning (ML) approaches to estimate architecture-level on-chip router models and performance. Experiments were carried out with distinct topology sizes and various virtual channels, injection rates, and traffic patterns. The Booksim and Orion simulators are used to validate the results. The framework showed a prediction error of approximately 6% to 8% and a minimum speedup of 1500× to 2000×. © 2019 IEEE.
    An Efficient FPGA-Based Network-on-Chip Simulation Framework Utilizing the Hard Blocks
    (Birkhauser, 2020) Prabhu Prasad, B.M.; Parane, K.; Talawar, B.
    In multi-processor system-on-chips, the on-chip interconnect plays a significant role: the type of on-chip architecture used by an application decides that application's performance. Hence, a quick and versatile Network-on-Chip (NoC) simulator, particularly for larger designs, is essential to explore and find the most suitable NoC configuration for individual applications. An FPGA-based NoC simulation framework has been proposed in this work. The crossbar switch of the NoC router, with buffers and five ports, has been embedded in the wide multiplexers of the DSP48E1 slices. The ability of the DSP48E1 slices to change functionality dynamically every clock cycle, depending on the multiplexer control signals, plays a crucial role in incorporating the crossbar functionality. A substantial decrease in configurable logic block (CLB) utilization of NoC topologies on the FPGA has been observed by embedding the crossbar functionality in the DSP48E1 slices. Since the DSP48E1-based crossbar reduces the use of CLB resources, topologies of larger sizes can be simulated. A 6 × 6 Mesh topology with the DSP crossbar implementation consumes 36% fewer lookup tables (LUTs) and 40% fewer flip-flops than the Mesh topology with a CLB-based crossbar implementation. The proposed work consumes 41% fewer LUTs and 23% fewer slices than the state-of-the-art CONNECT NoC generation tool. Compared to DART, the proposed work shows a reduction of 86% in LUTs and 80% in slices. Hoplite-DSP implements a unidirectional, bufferless Torus topology with deflection routing, whereas the proposed work targets Mesh-based topologies with buffers and bidirectional ports using XY and look-ahead routing algorithms. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
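The XY routing algorithm named above is dimension-ordered: a packet first travels fully along the X dimension, then along Y, which makes it deadlock-free on a mesh. A minimal behavioural sketch (coordinates and function shape are illustrative, not the framework's hardware implementation):

```python
def xy_route(src, dst):
    """Dimension-ordered XY routing on a 2D mesh: exhaust the X offset
    first, then the Y offset. Returns the list of hops visited."""
    x, y = src
    dx, dy = dst
    hops = []
    while x != dx:                      # phase 1: move along X only
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:                      # phase 2: move along Y only
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops

print(xy_route((0, 0), (2, 1)))  # [(1, 0), (2, 0), (2, 1)]
```

Look-ahead routing computes the next hop's output port one stage early to shorten the router pipeline; the route taken is the same.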
    Analysis of cache behaviour and software optimizations for faster on-chip network simulations
    (Springer, 2019) Prasad, B.M.P.; Parane, K.; Talawar, B.
    Fast simulations are critical in reducing time to market in chip multiprocessors and system-on-chips. Several simulators have been used to evaluate the performance and power consumed by networks-on-chip (NoCs). To speed up the simulations, it is necessary to investigate and optimize the hotspots in the simulator source code. Among the several simulators available, Booksim2.0 has been chosen for the experimentation as it is extensively used in the NoC community. In this paper, the cache and memory system behaviour of Booksim2.0 has been analyzed to accurately monitor input-dependent performance bottlenecks. The measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration having the fewest misses has been identified. To further reduce cache misses, software optimization techniques such as removal of unused functions, loop interchange, and replacing the post-increment operator with the pre-increment operator for non-primitive data types have been employed; these techniques reduced cache misses by 18.52%, 5.34% and 3.91%, respectively. Thread parallelization and vectorization have been employed to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93× and 3.97× were observed for the Mesh topology with a 30 × 30 network size by employing thread parallelization and vectorization, respectively. © 2019, The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and The Division of Operation and Maintenance, Lulea University of Technology, Sweden.
    Analysis of power-performance trade-offs in DRAM-NVM based hybrid main memory
    (American Institute of Physics Inc., 2023) Rai, S.; Talawar, B.
    Conventional DRAM-based memory systems face issues such as limited scalability and high static energy consumption. Non-Volatile Memory (NVM) devices are byte-addressable, retain data even in the absence of power, and provide good density; these features make them attractive at various levels of the memory hierarchy. However, a few drawbacks prevent them from replacing existing memory devices outright. Hybrid memory architectures combine two or more memory technologies to exploit the benefits of each. In this work, we analyze the impact of varying NVM size on power and performance in hybrid DRAM-NVM devices. The experimental results show average power reductions of up to 46.2% and 63.7% when PCM was increased from 1 GB to 2 GB and 3 GB, respectively. © 2023 Author(s).
    Analysis of ring topology for NoC architecture
    (Institute of Electrical and Electronics Engineers Inc., 2016) Kamath, A.; Saxena, G.; Talawar, B.
    In recent years, Networks-on-Chip (NoCs) have provided an efficient solution for interconnecting various heterogeneous intellectual properties (IPs) on a System-on-Chip (SoC) in a flexible and scalable manner. Virtual channels in the buffers associated with each core help introduce parallelism between packets and improve the performance of the network. However, allocating a uniform buffer size to these channels is not always suitable: network efficiency can be improved by allocating buffers variably based on traffic patterns and node requirements. In this paper, we use the ring topology as the underlying NoC architecture. The percentage of packet drops is used as the parameter for comparing the performance of different architectures. Through simulations carried out in SystemC, we illustrate the impact of virtual channels and variable buffers on network performance. Our results show that variable buffer allocation led to better performance and fairness in the network compared to uniform allocation. © 2015 IEEE.
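The variable-allocation idea can be sketched as a demand-proportional split of a fixed buffer budget across virtual channels. This is a generic illustration under invented numbers, not the paper's exact allocation scheme:

```python
def allocate_buffers(total_slots, demand):
    """Distribute buffer slots across virtual channels in proportion to
    observed traffic demand, guaranteeing each channel at least one slot.
    Generic policy sketch; demand values are illustrative."""
    total_demand = sum(demand)
    alloc = [max(1, total_slots * d // total_demand) for d in demand]
    # Hand out any slots left over from integer division, busiest first.
    leftover = total_slots - sum(alloc)
    for i in sorted(range(len(demand)), key=lambda i: -demand[i]):
        if leftover <= 0:
            break
        alloc[i] += 1
        leftover -= 1
    return alloc

# 16 slots split over four channels with unequal traffic demand.
print(allocate_buffers(16, [8, 4, 2, 2]))  # [8, 4, 2, 2]
```

Under uniform allocation each channel would get 4 slots regardless of load; the proportional split gives busy channels headroom, which is the intuition behind the reduced packet drops reported above.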
    Cache analysis and software optimizations for faster on-chip network simulations
    (Institute of Electrical and Electronics Engineers Inc., 2016) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
    Fast simulations are critical in reducing time to market in CMPs and SoCs. Several simulators have been used to evaluate the performance and power consumed by Networks-on-Chip. Researchers and designers rely upon these simulators for design space exploration of NoC architectures. Our experiments show that simulating large NoC topologies takes hours to several days to complete. To speed up the simulations, it is necessary to investigate and optimize the hotspots in the simulator source code. Among the several simulators available, we choose Booksim2.0, as it is extensively used in the NoC community. In this paper, we analyze the cache and memory system behaviour of Booksim2.0 to accurately monitor input-dependent performance bottlenecks. Our measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration having the fewest misses has been identified. We also employ thread parallelization and vectorization to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93× and 3.97× were observed for the Mesh topology with a 30 × 30 network size by employing thread parallelization and vectorization, respectively. © 2016 IEEE.
    Challenges in Design, Data Placement, Migration and Power-Performance Trade-offs in DRAM-NVM-based Hybrid Memory Systems
    (Taylor and Francis Ltd., 2023) Rai, S.; Talawar, B.
    DRAM-NVM-based hybrid memory opens up a varied range of power-performance-area operating points through page migration between the high-performance DRAM and the reliable NVM. The amalgamation of the two technologies requires various modifications to existing monolithic DRAM-based systems. This paper summarizes current research in the areas of data placement and page migration in hybrid memories. Challenges and design solutions for a range of NVMs (PCM, STT-RAM, ReRAM) are presented. The paper also identifies several open research challenges in these areas. © 2023 IETE.
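A common building block in the migration schemes such surveys cover is hotness-threshold promotion: pages on the slower NVM that are accessed often enough get moved to DRAM. The sketch below is a generic policy illustration with invented page names and counts, not any specific scheme from the paper:

```python
def pick_pages_to_migrate(access_counts, threshold):
    """Select NVM-resident pages whose access count exceeds a hotness
    threshold, marking them for promotion to DRAM. Generic sketch of a
    threshold-based migration policy; inputs are hypothetical."""
    return [page for page, count in access_counts.items()
            if count > threshold]

# Hypothetical per-page access counters sampled over an epoch.
counts = {"p0": 120, "p1": 3, "p2": 47}
print(sorted(pick_pages_to_migrate(counts, threshold=40)))  # ['p0', 'p2']
```

Real systems must also weigh migration cost (copying a page consumes bandwidth and energy) against the expected benefit, which is where the power-performance trade-offs discussed above arise.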
    Comparative analysis of non-volatile memory on-chip caches
    (American Institute of Physics Inc., 2023) Kallinatha, H.D.; Talawar, B.
    SRAM on-chip caches occupy a significant chip area and consume substantial power in modern processors. This paper studies the suitability of emerging Non-Volatile Memory (NVM) systems for the memory hierarchy beyond CMOS. NVMs have ultra-low leakage, better scalability, and consume less energy. However, NVM technologies such as STT-MRAM, ReRAM and PCM suffer from write endurance and read disturbance problems. A newer spintronic technology, spin-orbit-torque (SOT) switching based magnetoresistive RAM (SOT-MRAM), can overcome the issues in STT-MRAM. This paper therefore studies the impact of small to sizable on-chip memories in the full exploration mode of the simulator to estimate the energy, area, leakage power and per-access latency of the memory technologies. We present a detailed comparative analysis of NVMs and SRAM at 45 nm. The study concludes that, compared to SRAM, SOT-MRAM has a smaller area for cache sizes above 64 KB, is faster above 32 KB, consumes less energy above 128 KB, and has lower leakage power above 16 KB. Overall, SOT-MRAM provides 57.29% area efficiency, a 3.27× speedup and 94.53% less leakage power compared to SRAM. © 2023 Author(s).

Maintained by Central Library NITK | DSpace software copyright © 2002-2026 LYRASIS
