Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
3 results
Search Results
Item A Mixed Parallel and Pipelined Efficient Architecture for Intra Prediction Scheme in HEVC (Taylor and Francis Ltd., 2022) Poola, L.; Aparna., P.
The complexity of intra prediction in High-Efficiency Video Coding (HEVC) is increased significantly by the incorporation of inherent features such as variable-sized quadtree-partitioned coding units and 35 angular modes, which help in achieving better compression. This paper presents an efficient hardware architecture for intra prediction that supports all of the above features and achieves a higher throughput to support high definition (HD) videos. A compact reusable reference buffer structure is implemented to limit the buffer size to 1 KB. A dedicated arithmetic unit is incorporated to take advantage of the parallelism present in the prediction algorithm; it allows the reuse of multipliers to reduce hardware resources. The loading of reference samples to buffers for prediction causes significant delays, which are eliminated in our design. The entire architecture functions as a pipelined unit with no data dependency and generates eight samples per clock cycle in parallel. The design is implemented on a Field Programmable Gate Array (FPGA) platform operating at a frequency of 110 MHz. This makes it possible to support 4 K videos at 30 frames per second, at a resource cost of 16 K logic gates and 122 registers. © 2022 IETE.

Item An efficient parallel-pipelined intra prediction architecture to support DCT/DST engine of HEVC encoder (Springer Science and Business Media Deutschland GmbH, 2022) Poola, L.; Aparna., P.
The complexity of intra prediction in high-efficiency video coding (HEVC) is increased by the addition of five variable-sized prediction units (PUs) and 35 directional predictions. In this work, we propose an efficient parallel-pipelined architecture that can process 8 samples in parallel in every clock cycle. The functional units needed to predict the PU samples work in a pipelined fashion. With this balanced combination of parallel and pipelined structures, we achieve higher throughput than existing works in the literature while using limited hardware resources. The samples are processed row-wise so that they can be directly transform coded, thus eliminating the need for an intermediate memory buffer of 8 K between the two modules. A compact reconfigurable reference buffer of size 0.8 KB is incorporated to reduce the read-write latency associated with fetching reference samples. A dedicated module for arithmetic operations is used in the intra engine; it ensures the reuse of multipliers to increase hardware efficiency. The architecture supports all the PU sizes and directional modes. The proposed design is tested and implemented on a field-programmable gate array (FPGA) platform operating at 150 MHz to achieve a throughput of 8 samples per clock cycle, with a hardware cost of 16.2 K Look-Up Tables (LUTs) and 5.7 K registers, to support HD 4 K real-time video encoding applications. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

Item Hardware Efficient Integrated In-loop Filter for HEVC Encoder (Taylor and Francis Ltd., 2024) Poola, L.; Aparna., P.
The deblocking filter (DF) and the sample adaptive offset (SAO) filter, which aid in enhancing the subjective quality of the image, make up the in-loop filter of the high-efficiency video coding (HEVC) encoder and decoder. The in-loop filter significantly increases the computational load on the HEVC encoder. It is challenging to design an in-loop filter in hardware that can handle intensive computations while using the least amount of on-chip memory, taking external memory traffic and dependencies into account, and simultaneously delivering high throughput to support Ultra HD video applications. The proposed design employs the following strategies to address these issues. This work proposes an address generation technique for pipelined horizontal and vertical filtering in the DF that avoids the transpose buffer that would otherwise be required. This enables easy pipelining and parallelization, thus improving throughput while reducing on-chip memory utilization. A simplified SAO filter with parallel-pipelined processing is included in the design. These features enable the design to support ultra-HD 7680 × 4320 @ 40 fps video applications. The proposed hardware architecture has a total gate count of 7.73 K LUTs and 2.8 K slice registers, and it is implemented on a 28 nm field programmable gate array (FPGA) platform. © 2024 IETE.
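
The throughput figures quoted in the two intra prediction abstracts above (8 samples per clock cycle at 110 MHz and at 150 MHz, targeting 4 K video at 30 frames per second) can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope check, not the authors' analysis: it assumes that "4 K" means 3840 × 2160 and that the video uses 4:2:0 chroma subsampling (1.5 samples per pixel), neither of which is stated in the abstracts. The function names are hypothetical.

```python
def required_samples_per_second(width, height, fps, samples_per_pixel=1.5):
    """Sample rate of the video stream.

    samples_per_pixel=1.5 is an assumption (4:2:0 chroma subsampling);
    the abstracts do not state the chroma format.
    """
    return width * height * fps * samples_per_pixel


def available_samples_per_second(samples_per_cycle, clock_hz):
    """Peak rate at which the hardware can emit predicted samples."""
    return samples_per_cycle * clock_hz


def headroom(samples_per_cycle, clock_hz, width, height, fps):
    """Ratio of peak hardware rate to the stream's sample rate.

    A ratio above 1 means the raw sample rate is covered; the surplus is
    what remains for evaluating more than one candidate per sample.
    """
    available = available_samples_per_second(samples_per_cycle, clock_hz)
    required = required_samples_per_second(width, height, fps)
    return available / required


if __name__ == "__main__":
    # Assumed 4 K resolution: 3840 x 2160 at 30 frames per second.
    for clock_mhz in (110, 150):
        ratio = headroom(8, clock_mhz * 1_000_000, 3840, 2160, 30)
        print(f"{clock_mhz} MHz: headroom factor {ratio:.2f}")
```

Under these assumptions, both clock rates exceed the raw sample rate of the stream by more than a factor of two, which is consistent with the abstracts' claims of 4 K support at 30 frames per second.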
