Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 9 of 9
  • Item
    Cache analysis and software optimizations for faster on-chip network simulations
    (Institute of Electrical and Electronics Engineers Inc., 2016) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
    Fast simulations are critical in reducing time to market in CMPs and SoCs. Several simulators have been used to evaluate the performance and power consumption of Networks-on-Chip (NoCs). Researchers and designers rely upon these simulators for design space exploration of NoC architectures. Our experiments show that simulating large NoC topologies takes hours to several days to complete. To speed up the simulations, it is necessary to investigate and optimize the hotspots in the simulator source code. Among the several simulators available, we choose Booksim2.0, as it is extensively used in the NoC community. In this paper, we analyze the cache and memory system behaviour of Booksim2.0 to accurately monitor input-dependent performance bottlenecks. Our measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration with the fewest misses has been identified. We also employ thread parallelization and vectorization to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93× and 3.97× were observed for the Mesh topology with a 30 × 30 network size by employing thread parallelization and vectorization, respectively. © 2016 IEEE.
  • Item
    FPGA based NoC Simulation Acceleration Framework Supporting Adaptive Routing
    (Institute of Electrical and Electronics Engineers Inc., 2018) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
    In this paper, we present a fast and parameterized FPGA based Network-on-Chip (NoC) simulation acceleration framework with an automated HDL generation engine. The framework supports NoC architecture design parameters such as topology, routing algorithms, link width, buffer size, flow control and traffic patterns. The parameterized, high performance and lightweight nature of the proposed framework makes it an ideal choice for NoC research studies. Mesh based topologies have been considered for the experiments. A congestion aware adaptive routing algorithm has been proposed along with the conventional XY routing. Also, parameters such as buffer depth, traffic pattern and flit width have been varied to observe the effect on the NoC behavior. The adaptive routing algorithm for Mesh based topologies has negligible FPGA area overhead compared to the conventional XY routing. Employing the adaptive routing algorithm, the average packet latency is reduced by 55% under the transpose traffic pattern when compared to the XY routing algorithm. A speedup of 2548x has been observed compared to the Booksim software simulator. The proposed framework is 2.54x and 25x faster than the CONNECT and DART FPGA based simulators, respectively. © 2018 IEEE.
  • Item
    High-performance NoC simulation acceleration framework employing the xilinx DSP48E1 blocks
    (Institute of Electrical and Electronics Engineers Inc., 2019) Prabhu Prasad, B.M.; Parane, K.; Talawar, B.
    An FPGA based Network on Chip (NoC) simulation acceleration framework is presented in this paper. The functionality of the crossbar switch of the NoC router is embedded in the hard multiplexers of the Xilinx DSP48E1 slices. A significant reduction in the soft logic (LUT+FF) utilization of the FPGA implementation of the 6 × 6 Torus topology has been observed by employing the hard multiplexers of the DSP48E1 slices in the proposed work. The DSP based crossbar implementation of the 6 × 6 Torus topology consumes 38% fewer LUTs and 45% fewer FFs than the LUT based crossbar implementation. 35% less power consumption has been observed in the DSP based implementation. The proposed work utilizes 76% fewer LUTs compared to the state-of-the-art CONNECT NoC generation tool. The proposed DSP based implementation considers a buffered, bi-directional Torus topology with XY routing, whereas Hoplite-DSP implements a bufferless, unidirectional Torus topology with a deflective routing algorithm. The proposed framework achieves speedups of 2.02× and 2.9× with respect to the LUT-only and CONNECT NoCs, respectively. © 2019 IEEE.
  • Item
    Design of an adaptive and reliable network on chip router architecture using FPGA
    (Institute of Electrical and Electronics Engineers Inc., 2019) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
    We propose an adaptive, low cost, reliable and high performance router based on a conventional two stage pipeline. The proposed adaptive routing switches to adaptive mode as soon as congestion is detected in the network. We employ fault tolerant strategies for different components of the router such as the input buffer, route compute unit, virtual channel allocation, switch allocation, and crossbar unit. The proposed router architecture differs from existing reliable routers: our implementation maintains the performance of a fault tolerant router under massive network workloads by influencing the features of the crossbar, the routing algorithm and router pipeline optimization. Our router is more reliable than current fault receptive routers such as Wang[1], Vicis[2], BulletProof[3], RoCo[4] and Poluri[5]. The average latency is reduced by 0.69% compared to the fault tolerant router and increased by 2.0% compared to the conventional router. © 2019 IEEE.
  • Item
    Hy-BTree: An efficient Tree based topology for FPGA based NoC implementation
    (Institute of Electrical and Electronics Engineers Inc., 2021) Prabhu Prasad, B.M.; Parane, K.; Talawar, B.
    Due to its hierarchical structure, the Binary Tree (BTree) topology can be employed in Network-on-Chip (NoC) applications. Because of its lower bisection bandwidth, performance degradation is observed in communication intensive applications. The Fat Tree topology has been proposed to overcome the disadvantages of the BTree topology. However, the routers of the Fat Tree topology become more complex towards the root node of the tree and occupy a huge amount of hardware resources compared to the BTree variant. Instead of adopting the Fat Tree topology, the number of hops taken by a packet in the BTree topology can be reduced by introducing new links in the network, which also increases the bisection bandwidth. In this work, we propose a variant of the BTree topology called Hy-BTree, which introduces additional links at the intermediate levels of the network to reduce the number of hops taken for communication. The proposed design is implemented on the FPGA and compared with other topologies from state-of-the-art FPGA based NoC architectures. A reduction in average latency and an improvement in throughput have been observed in Hy-BTree with respect to the BTree network with negligible overhead. © 2021 IEEE.
  • Item
    YaNoC: Yet Another Network-on-Chip Simulation Acceleration Engine Supporting Congestion-Aware Adaptive Routing Using FPGAs
    (World Scientific Publishing Co. Pte Ltd, 2019) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
    Many-core systems employ the Network on Chip (NoC) as the underlying communication architecture. To achieve an optimized design for an application under consideration, there is a need for a fast and flexible NoC simulator. This paper presents an FPGA-based NoC simulation acceleration framework supporting design space exploration of standard and custom NoC topologies considering a full set of microarchitectural parameters. The framework supports the design of custom routing algorithms as well as various traffic patterns such as uniform random, transpose, bit complement and random permutation. For conventional NoCs, the standard minimal routing algorithms are supported. For designing custom topologies, table-based routing has been implemented. A custom topology called diagonal mesh has been evaluated using table-based routing and a novel shortest path routing algorithm. A congestion-aware adaptive routing algorithm has been proposed to route the packets along the minimally congested path. The congestion-aware adaptive routing algorithm has negligible FPGA area overhead compared to the conventional XY routing. Employing the congestion-aware adaptive routing, network latency is reduced by 55% compared to the XY routing algorithm. Microarchitectural parameters such as buffer depth, traffic pattern and flit width have been varied to observe the effect on NoC behavior. For the 6×6 mesh topology with a buffer depth of 4, the LUT and FF usages increase from 32.23% to 34.45% and from 12.62% to 15%, respectively, when the flit width is increased from 16 bits to 32 bits. Similar behavior has been observed for other configurations of buffer depth and flit width. The torus topology consumes 24% more resources than the mesh topology. The 56-node fat tree topology consumes 27% and 2.2% more FPGA resources than the 6×6 mesh and torus topologies, respectively. The 56-node fat tree topology with buffer depths of 8 and 16 flits saturates at injection rates of 40% and 45%, respectively. © 2019 World Scientific Publishing Company.
  • Item
    LBNoC: Design of low-latency router architecture with lookahead bypass for network-on-chip using FPGA
    (Association for Computing Machinery, 2020) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
    An FPGA-based Network-on-Chip (NoC) using a low-latency router with a look-ahead bypass (LBNoC) is discussed in this article. The proposed design targets optimized area with improved network performance. Techniques such as single-cycle router bypass, an adaptive routing module, parallel Virtual Channel (VC) and switch allocation, and combined virtual cut-through and wormhole switching have been employed in the design of the LBNoC router. The LBNoC router is parameterizable, with the network topology, traffic patterns, routing algorithms, buffer depth, buffer width, number of VCs, and I/O ports being configurable. A table-based routing algorithm has been employed to support the design of custom topologies. The input buffer modules of the NoC router have been mapped onto the FPGA Block RAM hard blocks to utilize resources efficiently. The LBNoC architecture consumes 4.5% and 27.1% fewer hardware resources than the ProNoC and CONNECT NoC architectures, respectively. The average packet latency of the LBNoC architecture is 30% and 15% lower than that of the CONNECT and ProNoC architectures, respectively. The LBNoC architecture is 1.15× and 1.18× faster than the ProNoC and CONNECT NoC frameworks, respectively. © 2020 Association for Computing Machinery.
  • Item
    P-NoC: Performance Evaluation and Design Space Exploration of NoCs for Chip Multiprocessor Architecture Using FPGA
    (Springer, 2020) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
    The network-on-chip (NoC) has emerged as an efficient and scalable communication fabric for chip multiprocessors (CMPs) and multiprocessor system on chips (MPSoCs). The NoC architecture, the router micro-architecture and the links influence the overall performance of CMPs and MPSoCs significantly. In this paper, we propose P-NoC: an FPGA-based parameterized framework for analyzing the performance of NoC architectures based on various design decision parameters. The mesh and multi-local port mesh (ML-mesh) topologies have been considered for the study. By fine-tuning various NoC parameters and synthesizing on the FPGA, we identify that the performance of NoC architectures is influenced by the configuration of router parameters and the interconnect. Experiments show that the flit width, buffer depth, and virtual channel parameters have a significant impact on the FPGA resources. We analyze the performance of the NoCs on six traffic patterns, viz., uniform, bit shuffle, random permutation, transpose, bit complement and nearest neighbor. Configuring the router and the interconnect parameters, the ML-mesh topology yields 75% lower utilization of FPGA resources compared to the mesh. The ML-mesh topology shows an improvement of 33.2% in network latency under the localized traffic pattern. The mesh and ML-mesh topologies have 0.53× and 0.1× higher saturation throughput, respectively, under nearest neighbor traffic compared to uniform random traffic. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    An Efficient FPGA-Based Network-on-Chip Simulation Framework Utilizing the Hard Blocks
    (Birkhauser, 2020) Prabhu Prasad, B.M.; Parane, K.; Talawar, B.
    In multi-processor system-on-chips, on-chip interconnection plays a significant role. The type of on-chip architecture used in an application decides the performance of that application. Hence, a quick and versatile network-on-chip (NoC) simulator, particularly for larger designs, is essential to explore and find the most suitable NoC configuration for individual applications. An FPGA-based NoC simulation framework has been proposed in this work. The crossbar switch of the NoC router with buffers and five ports has been embedded in the wide multiplexers of the DSP48E1 slices. The distinctive ability of the DSP48E1 slices to change their mode of operation dynamically every clock cycle, depending on the multiplexer control signals, plays a crucial role in incorporating the crossbar functionality. A substantial decrease in the configurable logic block (CLB) utilization of NoC topologies on the FPGA has been observed by embedding the functionality of the crossbar in the DSP48E1 slices. Since the DSP48E1-based crossbar reduces the use of CLB resources, topologies of larger sizes can be simulated. The 6 × 6 Mesh topology with the DSP crossbar implementation consumes 36% fewer lookup tables (LUTs) and 40% fewer flip flops than the Mesh topology with the CLB-based crossbar implementation. The proposed work consumes 41% fewer LUTs and 23% fewer slices than the state-of-the-art CONNECT NoC generation tool. Compared to DART, the proposed work shows a reduction of 86% in LUTs and 80% in slices. Hoplite-DSP implements the unidirectional Torus topology with no buffers, using the deflective routing algorithm. The proposed work targets Mesh-based topologies with buffers and bidirectional ports, using XY and look-ahead routing algorithms. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.