Hardware-based Acceleration of Network-on-Chip Simulation using FPGAs
Date
2021
Authors
M, Prabhu Prasad B.
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
Replacing conventional bus-based architectures, the Network-on-Chip (NoC) has become
a practical on-chip communication framework in many-core processors, Chip
Multi-Processors (CMPs), and Multi-Processor Systems-on-Chip (MPSoCs). NoCs have
also become an integral part of heterogeneous systems with application-specific
accelerators for workloads such as databases, graph processing, and deep neural networks.
In these heterogeneous systems, the NoC is responsible for interconnecting the various
components. A growing number of cores is being incorporated in state-of-the-art homogeneous
and heterogeneous multi-core processors to achieve high performance and better
power efficiency. Likewise, to achieve high performance in the target applications, the
number of components integrated on heterogeneous systems, such as processing cores,
input/output peripherals, and memory components, is also increasing. As the number of
interconnected components grows, the performance of the target
application becomes highly dependent on the performance of the NoC. Hence, there is
a need to model and evaluate large NoC designs quickly and accurately, as thousands
of cores are targeted in near-future multi-core architectures owing to advances
in CMOS technology. NoC modeling helps in understanding the impact of various design
parameters on the overall system and its performance characteristics.
A crucial hurdle in the design and evaluation of large-scale NoCs is the lack of rapid
modeling methodologies that also deliver a high level of accuracy. Analytical
models compromise accuracy to produce results quickly. Hence, to perform
design space exploration of NoCs, designers frequently employ software
simulators, which provide better accuracy than analytical modeling.
However, when a large-scale NoC with a huge number of nodes is simulated, software
simulators become too slow. To address the issue of simulation speed, this thesis
proposes YaNoC, a fully parameterized Field-Programmable Gate Array (FPGA) based
NoC simulation acceleration framework. YaNoC supports the design space exploration
of various NoC topologies considering a rich set of router micro-architectural parameters.
To simulate larger topologies, the hard blocks of the FPGA, namely Block
RAMs (BRAMs) and DSP blocks, have been employed to map the NoC router's
FIFO buffers and crossbar, respectively. Further, a lightweight NoC
router architecture has been proposed to reduce the area utilization and improve network
performance.
The initial work of the thesis employs profiling to analyze the performance of the
Booksim2.0 NoC software simulator under various design decision parameters and memory
configurations. Cache design parameters such as cache size, block size, and
associativity are varied while simulating the NoC topologies of Booksim2.0 to
observe the effect of cache configurations. The hotspots of the Booksim2.0 simulator
are identified, and software optimizations are applied to improve its
performance. To reduce the execution time of Booksim2.0, optimization methodologies
such as vectorization and thread parallelization are employed. The OpenMP
programming model is used for parallelizing and vectorizing the source code of Booksim2.0.
Due to the high synchronization cost, the gain achieved in simulation speed is not significant.
Higher simulation speed can be achieved only by sacrificing simulation accuracy
to reduce the complexity of synchronization. FPGA-based simulators are becoming
a promising approach for enhancing the speed of simulations. An FPGA-based NoC
simulation acceleration framework called YaNoC, supporting design space exploration
of standard and custom NoC topologies considering a full set of NoC router micro-architectural
parameters, has been proposed. YaNoC also supports custom
routing algorithms and various traffic patterns. The results obtained show that YaNoC consumes
fewer hardware resources and is faster than other FPGA-based NoC simulation
acceleration platforms.
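
To illustrate why thread-level parallelization of a cycle-accurate simulator pays a
synchronization cost every cycle, the following is a minimal sketch and not code from
Booksim2.0. It assumes a hypothetical toy model, a unidirectional ring in which each node
forwards at most one flit per cycle; the function name simulate_ring and the
double-buffered state are illustrative choices. The implicit barrier at the end of each
OpenMP parallel loop is the per-cycle synchronization point.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model (not Booksim2.0): a unidirectional ring in which
// every node forwards at most one buffered flit per cycle to its successor.
// occupancy[i] is the number of flits buffered at node i.
std::vector<int> simulate_ring(std::vector<int> occupancy, int cycles) {
    assert(cycles >= 0);
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(occupancy.size());
    std::vector<int> next(occupancy.size());

    for (int c = 0; c < cycles; ++c) {
        // Each iteration reads only the previous-cycle state and writes only
        // next[i], so the loop body is free of data races.
        #pragma omp parallel for
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            const std::ptrdiff_t prev = (i + n - 1) % n;
            const int sent = occupancy[i] > 0 ? 1 : 0;
            const int received = occupancy[prev] > 0 ? 1 : 0;
            next[i] = occupancy[i] - sent + received;
        }
        // The implicit barrier at the end of the parallel loop is the
        // per-cycle synchronization: no thread may start cycle c + 1 until
        // all nodes have finished cycle c.
        std::swap(occupancy, next);
    }
    return occupancy;
}
```

When compiled without OpenMP support the pragma is ignored and the loop runs serially
with the same result. With OpenMP enabled, the work per node in this toy model is tiny
compared with the cost of the barrier, which mirrors the limited speedup reported above.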
Most state-of-the-art FPGA-based simulators use only soft logic to model
NoCs, leaving the hard blocks unused. The FPGA soft logic resources
become a limiting factor when simulating a large NoC topology. Multiple FPGAs with
off-chip memory can be employed to overcome the limitation of the FPGA resources.
However, such approaches make the entire system more complex and slower, leading
to a reduction in the system’s performance. Instead of using a multi-FPGA setup to
simulate larger topologies, the hard blocks of an FPGA have been utilized efficiently
to map the NoC router components. The functionality of the NoC router’s buffers and
crossbar switch is embedded in the BRAMs and in the wide multiplexers of the DSP48E1
slices, respectively. Compared to other state-of-the-art works, embedding the
functionality of the buffers and crossbar in the hard blocks of the FPGA yields a
substantial decrease in the Configurable Logic Block (CLB) utilization of NoC topologies.
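
As a software analogy for a router input buffer held in a block memory, the sketch below
models a FIFO as a fixed-depth memory array addressed by separate read and write
pointers. It is an illustration under assumptions and not the thesis's hardware
description: the class name FlitFifo, the 32-bit flit width, and the power-of-two depth
are all hypothetical choices made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical software model of a router input FIFO backed by a
// fixed-depth memory, analogous to a buffer stored in a block RAM.
template <std::size_t Depth>
class FlitFifo {
    static_assert(Depth > 0 && (Depth & (Depth - 1)) == 0,
                  "Depth must be a power of two so pointers wrap with a mask");

public:
    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == Depth; }
    std::size_t size() const { return count_; }

    // Writes one flit at the write pointer; returns false if the FIFO is full.
    bool push(std::uint32_t flit) {
        if (full()) return false;
        mem_[wr_] = flit;
        wr_ = (wr_ + 1) & (Depth - 1);
        ++count_;
        return true;
    }

    // Reads the oldest flit into out; returns false if the FIFO is empty.
    bool pop(std::uint32_t& out) {
        if (empty()) return false;
        out = mem_[rd_];
        rd_ = (rd_ + 1) & (Depth - 1);
        --count_;
        return true;
    }

private:
    std::array<std::uint32_t, Depth> mem_{};  // storage, one word per flit
    std::size_t wr_ = 0;                      // write pointer
    std::size_t rd_ = 0;                      // read pointer
    std::size_t count_ = 0;                   // current occupancy
};
```

Keeping the storage in a dedicated memory and leaving only the pointers and occupancy
counter as control state is the reason such a buffer can occupy a memory block instead
of general-purpose logic.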
A lightweight, high-performance NoC architecture is well suited to
heterogeneous systems, as it reduces area and improves overall system
performance. A low-latency router with a look-ahead bypass, called LBNoC, has been
proposed. Techniques such as a single-cycle router pipeline bypass, an adaptive routing
module, parallel virtual channel and switch allocation, and a combined flow control
mechanism (virtual cut-through and wormhole switching) are employed in designing the
LBNoC router. The input buffer modules of the NoC router are mapped onto the FPGA
BRAM hard blocks to utilize resources efficiently.
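
The idea behind look-ahead routing, on which a look-ahead bypass relies, can be sketched
as follows: the current router computes the output port that the next router will need,
so the next router can skip its own route computation stage. This is a minimal sketch
under stated assumptions and not the LBNoC design. It substitutes deterministic XY
routing on a 2D mesh for the adaptive routing module described above, and the names
Port, Coord, xy_route, step, and lookahead_route, as well as the convention that North
increases y, are hypothetical.

```cpp
#include <cassert>

// Hypothetical 2D-mesh conventions: East increases x, North increases y.
enum class Port { Local, East, West, North, South };

struct Coord {
    int x;
    int y;
};

// Deterministic XY routing: correct the x offset first, then the y offset.
Port xy_route(Coord cur, Coord dst) {
    if (dst.x > cur.x) return Port::East;
    if (dst.x < cur.x) return Port::West;
    if (dst.y > cur.y) return Port::North;
    if (dst.y < cur.y) return Port::South;
    return Port::Local;
}

// Coordinates of the neighbor reached by leaving cur through port p.
Coord step(Coord cur, Port p) {
    switch (p) {
        case Port::East:  return Coord{cur.x + 1, cur.y};
        case Port::West:  return Coord{cur.x - 1, cur.y};
        case Port::North: return Coord{cur.x, cur.y + 1};
        case Port::South: return Coord{cur.x, cur.y - 1};
        case Port::Local: return cur;
    }
    return cur;
}

// Look-ahead routing: given the port this router will use, precompute the
// port the next router will use, so it can travel with the flit.
Port lookahead_route(Coord cur, Port out, Coord dst) {
    return xy_route(step(cur, out), dst);
}
```

For a flit at (0, 0) heading to (2, 1), the current router leaves through East and
attaches East as the precomputed port for the router at (1, 0).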
Keywords
Department of Computer Science & Engineering, Network-on-Chip, NoC, FPGA, Simulation acceleration, Performance analysis, DSP48E1, BRAM