FPGA based Simulation Acceleration of on-Chip Networks
Date
2021
Authors
Khyamling
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
As the number of processing cores in a System-on-Chip (SoC) increases, the traditional
bus-based interconnect becomes the major bottleneck to achieving the performance
required by modern applications. Further, bus-based communication may not
provide the bandwidth and latency required by systems with intensive parallel communication.
An efficient interconnection architecture is required to achieve high performance
and scalability in many-core SoCs. The Network-on-Chip (NoC) architecture has
emerged as the most promising interconnection architecture for modern Chip Multiprocessor
(CMP) and Multi/Many-Processor System-on-Chip (MPSoC) systems. The
components in these systems (cores, accelerators, memory blocks, and peripherals)
are interconnected using one or more NoCs composed of links and routers. The choice
of router parameters and NoC topologies can have a significant impact on the overall
performance of heterogeneous many-core systems.
The evaluation methodologies of NoCs for future computing systems with a large number
of interconnected components rely heavily on analytical models and simulations.
Analytical models allow fast modeling of large-scale NoCs, but with
significant inaccuracy. Fast and flexible NoC simulation frameworks that deliver a
high level of accuracy are needed for modeling large-scale NoC-based heterogeneous
many-core systems.
Detailed software simulators used for design space exploration of NoCs provide better
accuracy than analytical models. However, software simulators are slow when simulating
large-scale NoCs that interconnect many components.
This thesis presents the optimization of a software-based NoC simulator and a Field
Programmable Gate Array (FPGA) based NoC simulation acceleration framework to address
the issues of simulation speed, accuracy, and flexibility. Initial work in the thesis
involves profiling the Booksim2.0 software simulator, as it is used extensively for
the design and evaluation of NoC architectures. Booksim2.0 is profiled with
various NoC design parameters and memory configurations to analyze its performance.
The performance analysis of Booksim2.0 is based on cache misses, memory usage, and
hotspots. Profiling helped in applying focused software optimization techniques to the
simulator. Further, Booksim2.0 was parallelized using OpenMP and SIMD constructs
to improve its overall performance.
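
As a rough illustration of how OpenMP and SIMD constructs can be combined in a cycle-based simulator loop, the following is a minimal sketch. It is not code from Booksim2.0 or from the thesis: the `RouterState` struct, the `step_routers` function, and the `flits_per_cycle` parameter are hypothetical names, and the sketch assumes that routers can be updated independently within one simulated cycle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-router state; NOT Booksim2.0's actual data layout.
struct RouterState {
    std::vector<int> buffer_occupancy;  // flits queued at each input port
};

// Advance all routers by one simulated cycle, draining at most
// `flits_per_cycle` flits from each input port. Returns the total
// number of flits drained in this cycle.
long step_routers(std::vector<RouterState>& routers, int flits_per_cycle) {
    assert(flits_per_cycle >= 0);
    long drained = 0;
    const long n = static_cast<long>(routers.size());

    // Assumption: routers do not touch each other's state in this step,
    // so the outer loop can be split across threads.
    #pragma omp parallel for reduction(+:drained)
    for (long r = 0; r < n; ++r) {
        int* occ = routers[r].buffer_occupancy.data();
        const long ports = static_cast<long>(routers[r].buffer_occupancy.size());
        long local = 0;

        // The per-port update has no loop-carried dependence other than
        // the sum, so it is a candidate for vectorization.
        #pragma omp simd reduction(+:local)
        for (long p = 0; p < ports; ++p) {
            const int d = occ[p] < flits_per_cycle ? occ[p] : flits_per_cycle;
            occ[p] -= d;
            local += d;
        }
        drained += local;
    }
    return drained;
}
```

When compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result, which makes it easy to compare the serial and parallel versions.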
Going beyond software optimization, an FPGA-based NoC simulation acceleration
framework called YaNoC is proposed to explore the impact of microarchitectural parameters
on the performance of the NoC. YaNoC supports design space exploration
of custom topologies with custom routing algorithms, along with a standard minimal routing
algorithm for conventional NoCs. YaNoC is used to study the NoC architectures
of a CMP under various traffic patterns. The results show that YaNoC utilizes fewer
FPGA resources and is faster than other state-of-the-art FPGA-based NoC simulation
acceleration platforms.
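
To make the term "minimal routing" concrete, the sketch below shows dimension-order (XY) routing on a 2D mesh, a common minimal routing algorithm. The abstract does not state which minimal algorithm YaNoC implements, so XY routing is an assumption here, as are the names `Port`, `xy_route`, and `count_hops`, the row-major node numbering, and the convention that row indices grow towards the south.

```cpp
#include <cassert>

// Output ports of a mesh router (hypothetical naming).
enum class Port { Local, East, West, North, South };

// Dimension-order (XY) routing on a mesh that is `width` nodes wide.
// Nodes are numbered row-major; the row index grows towards the south.
// The packet first corrects its X offset, then its Y offset.
Port xy_route(int cur, int dst, int width) {
    assert(width > 0);
    const int cx = cur % width, cy = cur / width;
    const int dx = dst % width, dy = dst / width;
    if (dx > cx) return Port::East;
    if (dx < cx) return Port::West;
    if (dy > cy) return Port::South;
    if (dy < cy) return Port::North;
    return Port::Local;  // arrived at the destination
}

// Follow the routing decisions from src to dst and count the hops.
// For a minimal algorithm this equals the Manhattan distance.
int count_hops(int src, int dst, int width) {
    int hops = 0;
    int cur = src;
    for (;;) {
        const Port p = xy_route(cur, dst, width);
        if (p == Port::Local) break;
        switch (p) {
            case Port::East:  cur += 1;     break;
            case Port::West:  cur -= 1;     break;
            case Port::South: cur += width; break;
            case Port::North: cur -= width; break;
            case Port::Local: break;
        }
        ++hops;
    }
    return hops;
}
```

On a 4x4 mesh, for example, `count_hops(0, 15, 4)` follows three hops east and three hops south, which matches the Manhattan distance between the two corner nodes.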
The next challenge was to optimize the resources consumed by YaNoC. The FPGA
fabric provides hard resources such as Block RAM (BRAM) and DSP48E1 units along
with specialized interconnect. Most state-of-the-art FPGA-based simulators use
only soft logic for modeling NoCs, leaving the hard blocks unutilized. The
input buffer and crossbar functionality of the NoC routers is embedded onto the
Xilinx BRAM and DSP48E1 hard blocks, thereby reducing the dependence on soft logic. A
pure configurable logic block implementation and a hard-block-based implementation
of the NoC router exhibit identical latency and performance behaviour. Using
hard units for the design of NoCs results in a high-performance, low-cost design
compared to state-of-the-art frameworks.
Next, the design of an FPGA-based parameterized framework called P-NoC, with configurable
Topology, Router, and Traffic modules for performance evaluation and design
space exploration, is presented. P-NoC enables the designer to choose from
a variety of architectural parameters, such as input buffers, virtual channels, routing algorithms,
traffic patterns, and topology, for exploring the NoC design. P-NoC also
supports a flexible communication model and traffic generation.
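
As an illustration of what a configurable traffic pattern can look like, the sketch below computes destinations for two widely used synthetic patterns, bit-complement and transpose. The abstract does not list the patterns P-NoC provides, so these two are assumptions chosen for illustration. The function names are hypothetical, and the network size is assumed to be a power of two, with node addresses `bits` wide.

```cpp
#include <cassert>
#include <cstdint>

// Bit-complement traffic: every address bit of the source is inverted.
// Assumes src < 2^bits.
std::uint32_t bit_complement(std::uint32_t src, unsigned bits) {
    assert(bits > 0 && bits < 32);
    const std::uint32_t mask = (1u << bits) - 1u;
    return ~src & mask;
}

// Transpose traffic: the upper and lower halves of the address are
// swapped, which maps mesh node (x, y) to node (y, x).
// Assumes an even number of address bits and src < 2^bits.
std::uint32_t transpose(std::uint32_t src, unsigned bits) {
    assert(bits > 0 && bits < 32 && bits % 2 == 0);
    const unsigned half = bits / 2;
    const std::uint32_t mask = (1u << bits) - 1u;
    return ((src >> half) | (src << half)) & mask;
}
```

In a 16-node network (4 address bits), node 5 (binary 0101) sends to node 10 (binary 1010) under bit-complement, and node 6 (binary 0110) sends to node 9 (binary 1001) under transpose.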
In the last piece of work, an FPGA-based NoC using a low-latency router with look-ahead
bypass (LBNoC) is proposed. The LBNoC design targets optimized
area with improved network performance. Techniques such as single-cycle router
bypass, an adaptive routing module, parallel Virtual Channel (VC) and switch allocation,
and combined virtual cut-through and wormhole switching are employed in designing
the optimized LBNoC router. The LBNoC architecture consumes fewer hardware
resources, reduces average packet latency, and achieves a speedup over state-of-the-art
NoC architectures.
Keywords
Department of Computer Science & Engineering, Network-on-chip (NoC), Field Programmable Gate Arrays (FPGAs), Simulation framework, Simulation Acceleration, Performance Analysis, DSP48E1, Block RAM, Adaptive Routing