Design of Power and Performance Optimal 3D-NoC Architectures
Date
2020
Authors
Halavar, Bheemappa.
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
A highly structured and efficient on-chip communication network is required to achieve
high performance and scalability in current Chip Multiprocessors (CMPs) and Systems-on-Chip (SoCs). Network-on-Chip (NoC) has emerged as a reliable communication
framework in CMPs and SoCs. Many 2D NoC architectures have been proposed for
the efficient design of on-chip communication. 2D NoC architectures suffer from high
latency and high energy consumption in read/write buffers, virtual channels, switch traversal, and
links (wires) as the number of cores in SoCs and CMPs increases. 3-Dimensional Integrated Chips (3D-ICs) serve emerging applications that demand tailored accelerators
for high performance and improved energy efficiency. The component redistribution in
3D-ICs enables higher performance at competitive energy budgets by allowing greater
integration capabilities while lowering the overall wire area and providing greater communication bandwidth, flexibility, and throughput at lower overall communication
latencies.
Cycle-accurate simulators model the functionality and behaviour of NoCs by considering microarchitectural parameters of the underlying components to estimate performance, power and energy characteristics. Employing NoCs in 3D-ICs can further
improve performance, energy efficiency, and scalability characteristics of 3D SoCs
and CMPs. Minimal error in the estimation of energy and performance of NoC components is crucial in architectural trade-off studies. Exploring the design space of 3D
NoCs can lead to highly energy-efficient, reduced-area interconnect architectures for
modern SoCs. Accurate modeling of horizontal and vertical links by considering microarchitectural and physical characteristics reduces the error in power and performance
estimation of 3D NoCs. Additionally, mapping the temperature distribution in a 3D
NoC reduces estimation error. Effective extraction of the heat between layers is a
significant challenge in 3D NoCs.
In this thesis, power and performance trade-offs in two 2-layer 3D Butterfly Fat
Tree (BFT) variants are explored using a floorplan-driven approach. The first 3D
BFT variant analyzed is a standard stacked BFT (3DBFT) derived from a 2D BFT
topology. A power-performance optimal 3D BFT (OP3DBFT) is evolved from the
standard 3DBFT using overall performance, link and TSV minimization, and power-performance trade-offs. The 3D NoC modeling capabilities are extended in two
existing state-of-the-art simulators, viz., the 2D NoC simulator BookSim2.0 and
the thermal behaviour simulator HotSpot6.0. The major extensions incorporated in
BookSim2.0 are: Through Silicon Via power and performance models, 3D topology construction modules, 3D Mesh topology construction using variable X, Y, Z
radix, and tailored routing modules for 3D NoCs. The major extensions incorporated
in HotSpot6.0 are: parameterized 2D router floorplan, 3D router floorplan including
Through Silicon Vias (TSVs), power and thermal distribution models of 2D and 3D
routers. Using the extended 3D modules, performance (average network latency), and
energy efficiency metrics (Joules per Flit, Energy-Delay Product) of variants of 2D
and 3D Mesh, and Butterfly Fat Tree (BFT) topologies have been evaluated under
synthetic traffic patterns. The thermal behaviour of 3D NoC architectures has been
analyzed for the ideal arrangement, as well as a proposed thermally aware design of
the router-TSV element. Accurate power estimation models of routers and TSVs were
used for the thermal evaluation of 3D NoCs.
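The two energy-efficiency metrics named above can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the function names, the units, and the choice to define the Energy-Delay Product as total energy multiplied by average network latency are assumptions, and the thesis may normalise these quantities differently (for example per flit or per cycle). The input values in the usage lines are made up for demonstration and are not results from the thesis.

```python
def joules_per_flit(total_energy_j: float, flits_delivered: int) -> float:
    """Energy spent per delivered flit (assumed definition: total energy / flit count)."""
    if flits_delivered <= 0:
        raise ValueError("flits_delivered must be positive")
    return total_energy_j / flits_delivered


def energy_delay_product(total_energy_j: float, avg_latency_s: float) -> float:
    """Energy-Delay Product (assumed definition: total energy x average latency)."""
    if total_energy_j < 0 or avg_latency_s < 0:
        raise ValueError("energy and latency must be non-negative")
    return total_energy_j * avg_latency_s


# Hypothetical inputs, chosen only to exercise the functions.
energy_j = 2.0      # total network energy in joules
flits = 4           # flits delivered during the measurement window
latency_s = 3.0     # average network latency in seconds

print(joules_per_flit(energy_j, flits))
print(energy_delay_product(energy_j, latency_s))
```

A lower value is better for both metrics, which is why EDP is convenient for comparing topologies that trade energy against latency.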
The OP3DBFT with round-robin deflection routing delivers up to 44% higher performance and consumes up to 23% less power compared to the 3DBFT. From the
energy perspective, OP3DBFT has an average 23% decrease in Flits-per-Joule, and
up to 46% improvement in Energy-Delay Product when compared to the 3DBFT.
The 3DBFT and OP3DBFT have been synthesized on Xilinx Artix-7 FPGAs for resource comparison. OP3DBFT consumes 12% less area compared to 3DBFT. Using
the extended models in a 4x4x4 3D Mesh NoC topology, it has been observed that the
total average link power consumed is 13% lower than in a 2D Mesh. Additionally, the
average network latency in the 3D Mesh topology is roughly 60%-82% lower than in the
2D Mesh. The 4-layer 3D Mesh with uniform traffic exhibits a performance improvement
of up to 2.3× compared to other Mesh variants. The 4-layer 3D BFT with transpose traffic
shows a performance improvement of up to 1.3× over all other BFT variants. BFT
with the transpose traffic pattern has a 1.5× improvement in performance compared to
the uniform traffic pattern. The 4-layer 3D Mesh has on-chip communication performance
up to 4.5× that of the 4-layer 3D BFT. The on-chip communication performance improved
by up to 2.2× and 3.1× in the 4-layer 3D Mesh in comparison to the 2D Mesh with uniform
and transpose traffic patterns, respectively. 3D Mesh variants have a lower Energy-Delay
Product (EDP) than 3D BFT variants, as there is an 80% reduction in
link lengths and up to 3× more TSVs.
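One intuition for the latency advantage of a 3D Mesh is its shorter average path length. The sketch below compares the mean minimal hop count of a 4x4x4 3D Mesh with that of a 2D Mesh of the same node count. It is an illustrative calculation under stated assumptions, not the thesis's simulation: the 8x8 size of the 2D Mesh, the uniform choice of source-destination pairs, and the function name are assumptions, and the model ignores congestion, router pipeline depth, and TSV versus horizontal link delay, so it does not reproduce the latency figures reported above.

```python
from itertools import product


def average_hop_count(dims):
    """Mean minimal hop count over all ordered pairs of distinct nodes in a mesh.

    dims is a tuple of per-dimension radices, e.g. (8, 8) or (4, 4, 4).
    The minimal hop count between two mesh nodes is their Manhattan distance.
    """
    nodes = list(product(*(range(k) for k in dims)))
    total_hops = 0
    pair_count = 0
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue  # a node does not send to itself
            total_hops += sum(abs(a - b) for a, b in zip(src, dst))
            pair_count += 1
    return total_hops / pair_count


hops_2d = average_hop_count((8, 8))      # assumed 64-node 2D Mesh
hops_3d = average_hop_count((4, 4, 4))   # 64-node 3D Mesh named in the abstract

print(f"2D 8x8 mesh:   {hops_2d:.3f} hops on average")
print(f"3D 4x4x4 mesh: {hops_3d:.3f} hops on average")
print(f"hop reduction: {100 * (1 - hops_3d / hops_2d):.1f}%")
```

Because the radix per dimension is passed as a tuple, the same function also covers meshes with unequal X, Y, and Z radices.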
Keywords
Department of Computer Science & Engineering, 3D Network-on-Chip (NoC), BFT topology, Mesh topology, Through-silicon via (TSV), Design space exploration, Performance analysis, Energy-Delay Product