Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 10 of 16

Performance Evaluation in 2D NoCs Using ANN
(Springer Science and Business Media Deutschland GmbH, 2022) Kale, P.; Hazarika, P.; Jain, S.; Bhowmik, B.
The performance of a network-on-chip (NoC) is traditionally evaluated using a cycle-accurate simulator. However, as the NoC size increases, the time required to produce simulation results rises significantly, so an alternative approach is needed. This paper proposes an artificial neural network (ANN)-based framework to predict the performance parameters of NoCs. The proposed framework is trained on a dataset supplied by the BookSim simulator. Rigorous experiments are performed to measure multiple performance metrics under varying experimental setups. The results show that network latency is in the range of 31.74–80.70 cycles. Further, the switch power consumption is in the range of 0.05–12.41 μW. Above all, the proposed performance evaluation scheme achieves a speedup of 277–2304× with an accuracy of up to 93%. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Analysis of cache behaviour and software optimizations for faster on-chip network simulations
(Springer, 2019) Prasad, B.M.P.; Parane, K.; Talawar, B.
Fast simulations are critical in reducing time to market in chip multiprocessors and system-on-chips. Several simulators have been used to evaluate the performance and power consumed by network-on-chips (NoCs). To speed up the simulations, it is necessary to investigate and optimize the hotspots in the simulator source code. Among the several simulators available, Booksim2.0 has been chosen for the experimentation as it is extensively used in the NoC community. In this paper, the cache and memory system behavior of Booksim2.0 has been analyzed to accurately monitor input-dependent performance bottlenecks. The measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration having the fewest misses has been identified. To further reduce the cache misses, software optimization techniques such as removal of unused functions, loop interchanging and replacing the post-increment operator with the pre-increment operator for non-primitive data types have been employed. The cache misses were reduced by 18.52%, 5.34% and 3.91% by these three techniques, respectively. Thread parallelization and vectorization have been employed to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93× and 3.97× were observed for the Mesh topology with 30 × 30 network size by employing thread parallelization and vectorization, respectively. © 2019, The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and The Division of Operation and Maintenance, Lulea University of Technology, Sweden.
YaNoC: Yet Another Network-on-Chip Simulation Acceleration Engine Supporting Congestion-Aware Adaptive Routing Using FPGAs
(World Scientific Publishing Co. Pte Ltd, 2019) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
Many-core systems employ the Network on Chip (NoC) as the underlying communication architecture. To achieve an optimized design for an application under consideration, there is a need for a fast and flexible NoC simulator. This paper presents an FPGA-based NoC simulation acceleration framework supporting design space exploration of standard and custom NoC topologies considering a full set of microarchitectural parameters. The framework supports the design of custom routing algorithms, and various traffic patterns such as uniform random, transpose, bit complement and random permutation are supported. For conventional NoCs, the standard minimal routing algorithms are supported. For designing custom topologies, table-based routing has been implemented. A custom topology called diagonal mesh has been evaluated using table-based routing and a novel shortest-path routing algorithm. A congestion-aware adaptive routing algorithm has been proposed to route the packets along the minimally congested path. The congestion-aware adaptive routing algorithm has negligible FPGA area overhead compared to conventional XY routing. Employing the congestion-aware adaptive routing, network latency is reduced by 55% compared to the XY routing algorithm. Microarchitectural parameters such as buffer depth, traffic pattern and flit width have been varied to observe the effect on NoC behavior. For the 6×6 mesh topology with a buffer depth of 4, the LUT and FF usages increase from 32.23% to 34.45% and from 12.62% to 15%, respectively, when the flit width is raised from 16 bits to 32 bits. Similar behavior has been observed for other configurations of buffer depth and flit width. The torus topology consumes 24% more resources than the mesh topology. The 56-node fat tree topology consumes 27% and 2.2% more FPGA resources than the 6×6 mesh and torus topologies, respectively. The 56-node fat tree topology with buffer depths of 8 and 16 flits saturates at injection rates of 40% and 45%, respectively. © 2019 World Scientific Publishing Company.
Extending BookSim2.0 and HotSpot6.0 for power, performance and thermal evaluation of 3D NoC architectures
(Elsevier B.V., 2019) Halavar, B.; Pasupulety, U.; Talawar, B.
With the increase in number and complexity of cores and components in Chip-Multiprocessors (CMPs) and Systems-on-Chip (SoCs), a highly structured and efficient on-chip communication network is required to achieve high performance and scalability. Network-on-Chip (NoC) has emerged as a reliable communication framework in CMPs and SoCs. Many 2D NoC architectures have been proposed for efficient on-chip communication. Cycle-accurate simulators model the functionality and behaviour of NoCs by considering micro-architectural parameters of the underlying components to estimate performance, power and energy characteristics. Employing NoCs in three-dimensional integrated circuits (3D-ICs) can further improve the performance, energy efficiency, and scalability characteristics of 3D SoCs and CMPs. Minimal-error estimation of the energy and performance of NoC components is crucial in architecture trade-off studies. Accurate modeling of horizontal and vertical links by considering micro-architectural and physical characteristics reduces the error in power and performance estimation of 3D NoCs. Additionally, mapping the temperature distribution in a 3D NoC reduces estimation error. This paper presents the 3D NoC modelling capabilities extended in two existing state-of-the-art simulators, viz., the 2D NoC simulator BookSim2.0 and the thermal behaviour simulator HotSpot6.0. With the extended 3D NoC modules, the simulators can be used for power, performance and thermal measurements through micro-architectural and physical parameters. The major extensions incorporated in BookSim2.0 are: Through Silicon Via (TSV) power and performance models, 3D topology construction modules, 3D Mesh topology construction using variable X, Y, Z radix, and tailored routing modules for 3D NoCs. The major extensions incorporated in HotSpot6.0 are: a parameterized 2D router floorplan, a 3D router floorplan including TSVs, and power and thermal distribution models of 2D and 3D routers.
Using the extended 3D modules, performance (average network latency) and energy efficiency (Energy-Delay Product) metrics of variants of 3D Mesh and 3D Butterfly Fat Tree (BFT) topologies have been evaluated using synthetic traffic patterns. Results show that the 4-layer 3D Mesh is 2.2× better than the 2-layer 3D Mesh and 4.5× better than the 3D BFT variants in terms of network latency. 3D Mesh variants have a lower Energy-Delay Product (EDP) than 3D BFT variants as there is an 80% reduction in link lengths and up to 3× more TSVs. Another observation is that the EDP of the 4-layer 3D BFT (with transpose traffic) is 1.5× the EDP of the 4-layer 3D Mesh (with transpose traffic). Further optimizations towards a tailored 3D BFT for transpose traffic could reduce this EDP gap with the 4-layer 3D Mesh. From the 3D NoC heat maps, it was found that the edge routers in the floorplan of the tested 3D Mesh and 3D BFT topologies have the lowest ambient temperature. © 2019
LBNoc: Design of low-latency router architecture with lookahead bypass for network-on-chip using FPGA
(Association for Computing Machinery, 2020) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
An FPGA-based Network-on-Chip (NoC) using a low-latency router with a look-ahead bypass (LBNoC) is discussed in this article. The proposed design targets optimized area with improved network performance. Techniques such as single-cycle router bypass, an adaptive routing module, parallel Virtual Channel (VC) and switch allocation, and combined virtual cut-through and wormhole switching have been employed in the design of the LBNoC router. The LBNoC router is parameterizable, with the network topology, traffic patterns, routing algorithms, buffer depth, buffer width, number of VCs, and I/O ports being configurable. A table-based routing algorithm has been employed to support the design of custom topologies. The input buffer modules of the NoC router have been mapped onto the FPGA Block RAM hard blocks to utilize resources efficiently. The LBNoC architecture consumes 4.5% and 27.1% fewer hardware resources than the ProNoC and CONNECT NoC architectures, respectively. The average packet latency of the LBNoC architecture is 30% and 15% lower than that of the CONNECT and ProNoC architectures, respectively. The LBNoC architecture is 1.15× and 1.18× faster than the ProNoC and CONNECT NoC frameworks, respectively. © 2020 Association for Computing Machinery.
ELBA-NoC: Ensemble learning-based accelerator for 2D and 3D network-on-chip architectures
(Inderscience Publishers, 2020) Kumar, A.; Talawar, B.
Network-on-chips (NoCs) have emerged as a scalable alternative to traditional bus and point-to-point architectures, but they become highly sensitive as the number of cores increases. Simulation is one of the main tools used for analysing and testing new NoC architectures, and simulators have become essential for achieving the best performance vs. cost trade-off. Software simulators, however, are too slow for evaluating large-scale NoCs. This paper presents a fast and accurate framework for analysing the overall performance of 2D and 3D NoC architectures. The framework, named ensemble learning-based accelerator (ELBA-NoC), is built using the random forest regression algorithm to predict parameters of NoCs. ELBA-NoC was tested on 2D and 3D NoC architectures, and the results obtained were compared with the extensively used Booksim NoC simulator. The framework showed an error rate of less than 5% and an overall speedup of up to 16K×. © 2020 Inderscience Enterprises Ltd.
Power and performance analysis of 3D network-on-chip architectures
(Elsevier Ltd, 2020) Halavar, B.; Talawar, B.
Emerging 3D integrated circuits (ICs) employ 3D network-on-chip (NoC) to improve power, performance, and scalability. The NoC simulator uses microarchitecture parameters to estimate the power and performance of the NoC. We explore the design space for 3D Mesh and Butterfly Fat Tree (BFT) NoC architectures using floorplan-driven wire length and link delay estimation. The delay and power models are extended using Through Silicon Via (TSV) power and delay models. Serialization is employed to reduce the TSV area cost. Buffer space is equalised for a fair comparison between topologies. The performance, Flits per Joule (FpJ) and Energy Delay Product (EDP) of six 2D and 3D variants of Mesh and BFT topologies (two and four layers) are analyzed by injecting synthetic traffic patterns. The 3D-4L Mesh exhibits better performance, energy efficiency (up to 4.5×), and EDP (up to 98%) compared to other variants. This is because the overall length of the horizontal links is short and the number of TSVs is large (3×). © 2020 Elsevier Ltd
Maximal Connectivity Test with Channel-Open Faults in On-Chip Communication Networks
(Springer, 2020) Bhowmik, B.
The networks-on-chip (NoCs) as the prevalent interconnection infrastructure have been continuously replacing the contemporary chip microprocessors (CMPs) while high performance computing is the dominant consideration. Aggressive technology scaling progressively reduces the feature size of chips, resulting in increasing susceptibility to failures and breakdowns due to open faults on communication channels. Reliability and performance are therefore becoming more critical requirements in both current and future NoC-based CMPs. This paper first presents an on-line, distributed built-in-self-test (BIST) oriented test mechanism that detects open faults on communication channels and identifies the faulty wires in those channels in NoCs. Next, a suitable test scheduling scheme is presented in order to reduce the overall test time and the related performance overhead due to the fault. This scheduling scheme makes the present test solution scalable to large-scale NoC architectures in general. Implementation of the test mechanism takes little hardware area and few clock cycles to detect the fault in channels. The on-line evaluation of the proposed test solution demonstrates the effect of channel-open faults on the NoC performance characteristics under heavy, real-like synthetic traffic. In comparison to a wide range of prior works on 16-bit networks, the present scheme provides many advantages, e.g., it improves hardware area overhead by 35.36–67.73% and saves the test time by 96.43%. With packet latency and energy consumption improved by 5.83–42.79% and 6.24–46.38%, respectively, on the networks, the proposed scheme becomes competitive with the existing works. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
P-NoC: Performance Evaluation and Design Space Exploration of NoCs for Chip Multiprocessor Architecture Using FPGA
(Springer, 2020) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
The network-on-chip (NoC) has emerged as an efficient and scalable communication fabric for chip multiprocessors (CMPs) and multiprocessor system on chips (MPSoCs). The NoC architecture, the router micro-architecture and the links influence the overall performance of CMPs and MPSoCs significantly. In this paper, we propose P-NoC: an FPGA-based parameterized framework for analyzing the performance of NoC architectures based on various design decision parameters. The mesh and multi-local port mesh (ML-mesh) topologies have been considered for the study. By fine-tuning various NoC parameters and synthesizing on the FPGA, we identify that the performance of NoC architectures is influenced by the configuration of the router parameters and the interconnect. Experiments show that the flit width, buffer depth, and virtual channel parameters have a significant impact on the FPGA resources. We analyze the performance of the NoCs on six traffic patterns, viz., uniform, bit shuffle, random permutation, transpose, bit complement and nearest neighbor. Configuring the router and the interconnect parameters, the ML-mesh topology yields 75% lower utilization of FPGA resources compared to the mesh. The ML-mesh topology shows an improvement of 33.2% in network latency under the localized traffic pattern. The mesh and ML-mesh topologies have 0.53× and 0.1× higher saturation throughput, respectively, under nearest neighbor traffic compared to uniform random traffic. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
An Efficient FPGA-Based Network-on-Chip Simulation Framework Utilizing the Hard Blocks
(Birkhauser, 2020) Prabhu Prasad, B.M.; Parane, K.; Talawar, B.
In multi-processor system-on-chips, on-chip interconnection plays a significant role. The type of on-chip architecture used in an application decides the performance of that application. Hence, a quick and versatile network-on-chip (NoC) simulator, particularly for larger designs, is essential to explore and find the most suitable NoC configuration for individual applications. An FPGA-based NoC simulation framework has been proposed in this work. The crossbar switch of the NoC router with buffers and five ports has been embedded in the wide multiplexers of the DSP48E1 slices. The distinctive ability of the DSP48E1 slices to change mode dynamically every clock cycle, depending on the multiplexer control signals, plays a crucial role in incorporating the crossbar functionality. A substantial decrease in the configurable logic block (CLB) utilization of NoC topologies on the FPGA has been observed by embedding the functionality of the crossbar in the DSP48E1 slices. Since the DSP48E1-based crossbar reduces the use of CLB resources, topologies of larger sizes can be simulated. The 6 × 6 Mesh topology with the DSP crossbar implementation consumes 36% fewer lookup tables (LUTs) and 40% fewer flip flops than the Mesh topology with the CLB-based crossbar implementation. The proposed work consumes 41% fewer LUTs and 23% fewer slices than the state-of-the-art CONNECT NoC generation tool. Compared to DART, the proposed work shows a reduction of 86% and 80% in LUTs and slices, respectively. Hoplite-DSP implements the unidirectional Torus topology with no buffers considering the deflective routing algorithm. The proposed work targets Mesh-based topologies with buffers and bidirectional ports with XY and look-ahead routing algorithms. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.