Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Cache analysis and software optimizations for faster on-chip network simulations
(Institute of Electrical and Electronics Engineers Inc., 2016) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
Fast simulations are critical in reducing time to market in CMPs and SoCs. Several simulators have been used to evaluate the performance and power consumed by Network-on-Chips. Researchers and designers rely upon these simulators for design space exploration of NoC architectures. Our experiments show that simulating large NoC topologies takes hours to several days to complete. To speed up the simulations, it is necessary to investigate and optimize the hotspots in the simulator source code. Among the several simulators available, we choose Booksim2.0, as it is extensively used in the NoC community. In this paper, we analyze the cache and memory system behaviour of Booksim2.0 to accurately monitor input-dependent performance bottlenecks. Our measurements show that cache and memory usage patterns vary widely based on the input parameters given to Booksim2.0. Based on these measurements, the cache configuration with the fewest misses has been identified. We also employ thread parallelization and vectorization to improve the overall performance of Booksim2.0. The OpenMP programming model and SIMD are used for parallelizing and vectorizing the more time-consuming portions of Booksim2.0. Speedups of 2.93× and 3.97× were observed for the Mesh topology with 30 × 30 network size by employing thread parallelization and vectorization respectively. © 2016 IEEE.
YaNoC: Yet another network-on-chip simulation acceleration engine using FPGAs
(IEEE Computer Society, 2018) Parane, K.; Talawar, B.; Prabhu Prasad, P.
In this paper, we present an FPGA based NoC simulation framework, YaNoC, that supports the creation of standard and custom topologies, design of routing algorithms, generation of various synthetic traffic patterns, and exploration of a full set of microarchitectural parameters. The framework supports all standard minimal routing algorithms for conventional NoCs and implements table based routing to support the creation of new routing algorithms. A custom topology called Diagonal Mesh (DMesh) has been evaluated using table based routing and a modified version of the XY routing algorithm. Mesh and DMesh topologies saturate at injection rates of 45% and 55% respectively. We find that the table based routing implementation consumes 0.98× fewer hardware resources than the conventional XY routing. We observed a speedup of 2548× compared to the Booksim software simulator. YaNoC achieves speedups of 2.54× and 25× with respect to the CONNECT and DART FPGA based NoC simulators respectively. © 2018 IEEE.
FPGA based NoC Simulation Acceleration Framework Supporting Adaptive Routing
(Institute of Electrical and Electronics Engineers Inc., 2018) Parane, K.; Prabhu Prasad, B.M.; Talawar, B.
In this paper, we present a fast and parameterized FPGA based Network-on-Chip (NoC) simulation acceleration framework with an automated HDL generation engine. The framework supports NoC architecture design parameters such as topology, routing algorithms, link width, buffer size, flow control and traffic patterns. The parameterized, high performance and lightweight nature of the proposed framework makes it an ideal choice for NoC research studies. Mesh based topologies have been considered for the experiments. A congestion aware adaptive routing algorithm has been proposed along with the conventional XY routing. Also, parameters such as buffer depth, traffic pattern and flit width have been varied to observe the effect on the NoC behavior. The adaptive routing algorithm for Mesh based topologies has negligible FPGA area overhead compared to the conventional XY routing. Employing the adaptive routing algorithm, the average packet latency is reduced by 55% under the transpose traffic pattern when compared to the XY routing algorithm. A speedup of 2548× has been observed compared to the Booksim software simulator. The proposed framework is 2.54× and 25× faster than the CONNECT and DART FPGA based simulators respectively. © 2018 IEEE.
Machine Learning Based Framework to Predict Performance Evaluation of On-Chip Networks
(Institute of Electrical and Electronics Engineers Inc., 2018) Kumar, A.; Talawar, B.
Chip Multiprocessors (CMPs) and Multiprocessor System-on-Chips (MPSoCs) are meeting the ever increasing demand for high performance in processing large scale data and applications. There is a corresponding increase in the volume and frequency of traffic in the Network-on-Chip (NoC) architectures like CMPs and SoCs. NoC performance parameters like network latency, flit latency and hop count are critical measures which directly influence the overall performance of the architecture and execution time of the application. Unfortunately, cycle-accurate software simulators become too slow for interactive use as the architectural size of the NoC increases. In order to provide the chip designer with an efficient framework for accurate measurements of NoC performance parameters, we propose a Machine Learning (ML) framework, which is designed using different ML regression algorithms like Support Vector Regression (SVR) with different kernels and Artificial Neural Networks (ANN) with different activation functions. The proposed learning framework can be used to analyze the performance parameters of Mesh and Torus based NoC architectures. Results obtained are compared against the widely used cycle-accurate Booksim simulator. Experiments were conducted by varying parameters such as topology size from 2 × 2 to 30 × 30 with different virtual channels, traffic patterns and injection rates. The framework showed an approximate prediction error of 5% to 8% and an overall minimum speedup of 1500× to 2000×. © 2018 IEEE.
High-performance NoC simulation acceleration framework employing the Xilinx DSP48E1 blocks
(Institute of Electrical and Electronics Engineers Inc., 2019) Prabhu Prasad, B.M.; Parane, K.; Talawar, B.
An FPGA based Network on Chip (NoC) simulation acceleration framework is presented in this paper. The functionality of the crossbar switch of the NoC router is embedded in the hard multiplexers of the Xilinx DSP48E1 slices. A significant reduction in the soft logic (LUT+FF) utilization of the FPGA implementation of the 6 × 6 Torus topology has been observed by employing the hard multiplexers of the DSP48E1 slices in the proposed work. The DSP based crossbar implementation of the 6 × 6 Torus topology consumes 38% fewer LUTs and 45% fewer FFs than the LUT based crossbar implementation. 35% less power consumption has been observed in the DSP based implementation. The proposed work utilizes 76% fewer LUTs compared to the state-of-the-art CONNECT NoC generation tool. A buffered, bi-directional Torus topology with XY routing has been considered in the proposed DSP based implementation, in contrast to Hoplite-DSP, which implements a bufferless, unidirectional Torus topology with a deflective routing algorithm. The proposed framework achieves speedups of 2.02× and 2.9× with respect to the LUT only and the CONNECT NoCs respectively. © 2019 IEEE.
Floorplan Based Performance Estimation of Network-on-Chips using Regression Techniques
(Institute of Electrical and Electronics Engineers Inc., 2019) Kumar, A.; Talawar, B.
An intra-communication problem between the Intellectual Properties (IPs), caused by the growing number of cores on a single chip in System-on-Chip (SoC), gave rise to a new system architecture called Network-on-Chip (NoC). The early stages of designing a NoC can be done using cycle-accurate NoC simulators, but they become slow as the architecture size of the NoC increases. Hence a machine learning framework is proposed by considering two scenarios, i.e., a fixed delay between the components and a floorplan based delay among the components of the NoC. This framework is modeled using distinct Machine Learning (ML) regression algorithms to predict performance parameters of NoCs considering uniform random and transpose traffic patterns. Complete performance analysis of the Mesh NoC architecture can be done by using the proposed ML framework. Booksim simulator results are used to verify the effectiveness of the proposed framework, which showed an overall speedup of 2000× to 2500×. © 2019 IEEE.
High-Performance NoCs Employing the DSP48E1 Blocks of the Xilinx FPGAs
(IEEE Computer Society, 2019) Prabhu, P.B.M.; Parane, K.; Talawar, B.
The hard multiplexers of the Xilinx DSP48E1 slices have been employed to support the functionality of the crossbar switch of the buffered five port Network-on-Chip (NoC) routers. This is possible due to the dynamic mode operation of the DSP48E1 slices per clock cycle based on the multiplexer control signals. As a result of this, a significant reduction in the soft logic (LUT+FF) utilization of the FPGA implementation of the 6 × 6 Mesh topology has been observed. The DSP based crossbar implementation of the 6 × 6 Mesh topology consumes 36% fewer LUTs and 40% fewer FFs than the LUT based crossbar implementation. 38% less power consumption has been observed in the DSP based implementation. The proposed work utilizes 41% fewer LUTs compared to the state-of-the-art CONNECT NoC generation tool. Latency reductions of 31% and 38% have been achieved by the proposed DSP48E1 based crossbar implementation over the LUT crossbar implementation of the 8 × 8 Mesh topology under the Uniform and Transpose traffic patterns respectively. Also, the proposed DSP48E1 based implementation achieves saturation throughput improvements of 1.4× and 1.6× over the LUT based implementation under Uniform and Transpose traffic patterns respectively. © 2019 IEEE.
Accurate Router Level Estimation of Network-on-Chip Architectures using Learning Algorithms
(Institute of Electrical and Electronics Engineers Inc., 2019) Kumar, A.; Talawar, B.
Intra-communication between the Intellectual Properties (IPs) has become a problem due to the rise in the number of cores on a single chip in System-on-Chip (SoC). Network-on-Chips (NoCs) have emerged as a reliable on-chip communication framework for Chip Multiprocessors and SoCs. Estimating NoC power and performance in the early stages has become crucial. We employ Machine Learning (ML) approaches to estimate architecture-level on-chip router models and performance. Experiments were carried out with distinct topology sizes with various virtual channels, injection rates, and traffic patterns. Booksim and Orion simulators are used to validate the results. The framework showed a prediction error of approximately 6% to 8% and a minimum speedup of 1500× to 2000×. © 2019 IEEE.
A Support Vector Regression-Based Approach to Predict the Performance of 2D 3D On-Chip Communication Architectures
(Institute of Electrical and Electronics Engineers Inc., 2019) Nirmal Kumar, A.; Talawar, B.
Recently, Networks-on-Chips (NoCs) have evolved as a scalable alternative to traditional bus and point-to-point architectures. NoC design performance evaluation is largely based on simulation, which becomes extremely slow as the architecture size increases and gives little insight into how distinct design parameters impact the actual performance of the network. Simulation is therefore very difficult to use for optimization purposes. In this paper, we propose a Support Vector Regression (SVR)-based framework, which can be used to analyze the performance of 2D and 3D NoC architectures. Experiments were conducted by varying architecture sizes with different virtual channels and injection rates. The proposed framework can be used to obtain fast and accurate NoC performance estimates with a prediction error of 2% to 4% and a minimum speedup of 3000× to 3500×. © 2019 IEEE.
UPM-NoC: Learning based framework to predict performance parameters of mesh architecture in on-chip networks
(Springer, 2020) Kumar, A.; Talawar, B.
Conventional bus-based on-chip interconnects are being replaced by packet-switched Network-on-Chip (NoC) as a large number of cores are contained on a single chip. Cycle-accurate NoC simulators are essential tools in the early stages of design. Cycle-accurate simulators slow down as the architecture size of the NoC increases. NoC architectures need to be validated against distinct synthetic traffic patterns. The overall performance of a NoC architecture depends on performance parameters like network latency, packet latency, flit latency, and hop count. Hence we propose a Unified Performance Model (UPM) to deliver precise measurements of NoC performance parameters. This framework is modeled using distinct Machine Learning (ML) regression algorithms to predict performance parameters of NoCs considering different synthetic traffic patterns. The UPM framework can be used to analyze the performance parameters of the Mesh NoC architecture. Results obtained were compared against the widely used cycle-accurate Booksim simulator. Experiments were conducted by varying topology size from 2×2 to 50×50 with different virtual channels, traffic patterns, and injection rates. The framework showed an approximate prediction error of 5% to 6% and an overall minimum speedup of 3000× to 3500×. © Springer Nature Singapore Pte Ltd 2020.
