Authors: Sunil B, Vijaya Prakash A M
High Efficiency Video Coding (HEVC) achieves improved compression efficiency, but it introduces higher computational complexity due to intricate partitioning and an increased number of angular modes in intra prediction. In this study, we propose a hardware architecture for the direct current (DC) and planar modes of intra prediction in the HEVC standard. To evaluate the proposed architecture, we performed synthesis using both TSMC 180nm and TSMC 90nm technologies, followed by physical implementation using 90nm physical cell technology. Moving from TSMC 180nm to TSMC 90nm reduces the chip area by approximately three times (from 246.41 mm² to 75.97 mm²) and the power consumption by roughly 5.5 times (from 27.86 mW to 5.1 mW). After the physical layout phase, we obtained a chip area of 112.19 mm² and a power consumption of 3.88 mW. Compared with the 90nm synthesis results, the chip area increased by around 1.5 times, while the power consumption decreased from 5.1 mW to 3.88 mW (about 24%). The proposed architecture achieves a throughput of 27 pixels per clock cycle and supports a block size of 16x16, with the potential for further extension.
In information theory, data compression, also known as source coding or bit-rate reduction, is the process of encoding information using fewer bits than its original representation. In data transmission it is called source coding because the encoding takes place at the data source, before storage or transmission. Data compression reduces the resources needed for data storage and transmission, but the compression and decompression processes themselves require computational resources. Effective data compression therefore involves a trade-off between memory usage and processing time.
Digital video data contains significant redundancy, which makes it well-suited for compression techniques. These techniques effectively address the challenges associated with large video file sizes. Lossy compression stands out among the other compression techniques because it provides higher compression ratios for video data. It is important to remember, though, that when the compression ratio rises and the file size decreases, the video quality could suffer.
HEVC is a video compression standard designed to offer improved performance compared to previous standards like H.264/AVC. The video compression process in HEVC involves employing an HEVC video encoder to encode a series of video frames, which constitute the source video. This encoding generates a compressed video bitstream. The compressed bitstream can be transmitted or stored efficiently. When the compressed video is received, a video decoder is employed to decompress the bitstream, reconstructing the original sequence of decoded frames. Through the utilization of HEVC's advanced compression techniques, video data can be effectively transmitted, stored, and decoded while maintaining a higher video quality level.
Fig. 1 shows the HEVC block diagram, which consists of the following stages. The video input stage captures or retrieves the source video frames that form the basis for compression. The encoding stage uses an HEVC video encoder to compress the frames through partitioning, intra prediction or inter prediction, transform coding, quantization, and entropy coding. Intra prediction predicts pixel values from within the same frame, inter prediction uses information from previous frames, and transform coding converts spatial information into frequency-domain coefficients. After encoding, the compressed video data is organized into a bitstream that represents the compressed frames. The decoding stage uses an HEVC video decoder to reconstruct the video frames from the compressed bitstream through the inverse processes: entropy decoding, inverse quantization, inverse transform coding, and motion compensation. The reconstructed frames are then presented as output, which can be displayed on a screen or used for further processing.
Fig. 2 depicts a partitioned picture with a highlighted slice. This slice consists of multiple Coding Tree Units (CTUs), each measuring 64x64 pixels. Each 64x64 CTU can be divided into four smaller Coding Units (CUs) of 32x32 pixels. Further subdivision is also possible: each 32x32 CU can be split into 16x16 CUs, and these in turn into 8x8 CUs. Each CU can be divided into Prediction Units (PUs) and Transform Units (TUs).
TUs carry the residual signal, representing the difference between the original pixel values and the predicted values. PUs and TUs can vary in size, ranging from 4x4 to 32x32 pixels, as shown. All the CUs, PUs, and TUs contain corresponding luma (Y-component) and chroma (Cb and Cr components) blocks, along with associated syntax elements similar to those found in the CTU.
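The recursive CTU-to-CU subdivision described above can be illustrated with a short quadtree sketch. This is only an illustration in Python: the function `split_cu`, the callback `should_split`, and the toy rule `split_top_left` are hypothetical names, and a real encoder would decide each split from a rate-distortion cost rather than from position.

```python
def split_cu(x, y, size, should_split, min_size=8):
    """Recursively split a square CU and return its leaf CUs as (x, y, size)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(split_cu(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


def split_top_left(x, y, size):
    """Toy decision rule (assumption): only the CU at the origin keeps splitting."""
    return x == 0 and y == 0


# Partition one 64x64 CTU down to a minimum CU size of 8x8.
leaves = split_cu(0, 0, 64, split_top_left)
for leaf in leaves:
    print(leaf)
```

Whatever rule is used, the leaf CUs always tile the full 64x64 CTU without overlap, which is the property the quadtree structure guarantees.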
HEVC incorporates two main categories of intra prediction methods. The first category comprises the angular prediction methods, which allow the codec to model the directional structures commonly found in images and thereby improve prediction accuracy. The second category comprises planar prediction and DC (Direct Current) prediction, which target image areas with smooth, gradually changing content. Planar prediction estimates pixel values on the assumption that the block can be approximated by a plane, while DC prediction predicts pixel values from the average value of the neighboring pixels. HEVC supports a total of 35 intra prediction modes: 33 angular modes plus the planar and DC modes. This range of modes provides diverse options for predicting the pixel values of a given block, which helps HEVC achieve improved compression efficiency while preserving image quality.
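The DC mode described above can be sketched in a few lines. The following is a minimal Python sketch, assuming a square power-of-two block and the integer rounding used in the HEVC standard; the function name `dc_predict` and the `top`/`left` arguments are hypothetical and are not taken from the proposed architecture. The boundary smoothing that HEVC applies to luma DC blocks smaller than 32x32 is omitted.

```python
def dc_predict(top, left):
    """Fill an N x N block with the rounded mean of N top and N left neighbours."""
    n = len(top)
    if len(left) != n or n <= 0 or (n & (n - 1)) != 0:
        raise ValueError("top and left must have the same power-of-two length")
    # For a power of two, n.bit_length() equals log2(n) + 1,
    # so the shift divides by 2N; adding n first rounds to nearest.
    shift = n.bit_length()
    dc = (sum(top) + sum(left) + n) >> shift
    return [[dc] * n for _ in range(n)]


# Hypothetical 4x4 example with made-up neighbour values.
block = dc_predict([10, 20, 30, 40], [50, 60, 70, 80])
print(block[0])
```

Because every predicted sample has the same value, hardware for this mode reduces to one adder tree and a shift, with no per-pixel multiplication.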
II. RELATED WORK
Lakshmi and Aparna [1] proposed two architectures, the Parallel Pipelined Architecture (PPA) and the Parallel Datapath Architecture (PDA), for the DC and planar modes of intra prediction in HEVC. For smaller block sizes such as 4x4, PDA consumed fewer resources, whereas for larger block sizes PPA consumed fewer resources. Atitallah and Kammoun [2] proposed an algorithm for HEVC designed using high-level synthesis (HLS). Shwetha et al. [3] proposed three architectures for intra prediction in HEVC: the Fully Sequential Architecture (FSA), the Semi Parallel Architecture (SPA), and the Fully Parallel Architecture (FPA). FSA used the fewest resources of the three, and FPA had a significantly shorter processing time than the other two. Pastuszak and Abramowski [4] proposed a computationally scalable algorithm and its architecture, which enable efficient intra coding of high-resolution video by scaling the computational effort to the demands of encoding at such resolutions.
Min et al. [5] proposed an architecture with three main contributions. The first is a novel buffer structure designed specifically for reference samples, which improves how reference samples are handled and managed during encoding. The second is a mode-dependent scanning order, tailored to the encoding mode in use, which optimizes the arrangement of data and improves compression efficiency. The third is an inverse method for extending reference samples, which allows accurate extension of the reference samples and enables more precise predictions.
The focus of Vanishree and Vijaya Prakash [6] is on the VLSI implementation of Discrete Cosine Transform (DCT) algorithms designed for HEVC applications. The work emphasizes the hardware realization of efficient DCT algorithms tailored to the requirements of HEVC video coding, with the aim of developing optimized DCT hardware architectures that can be integrated into HEVC systems to improve their performance and efficiency.
Huang et al. [7] thoroughly examine the data dependency in the HEVC intra-mode decision and propose a set of fast algorithms aimed at eliminating data dependency and reducing computational complexity. These algorithms include a rough mode decision based on the source signal, a coarse-to-fine rough mode search, a prediction-mode-interlaced RDO mode decision, parallelized context adaptation, and a chroma-free Coding Unit (CU)/Prediction Unit (PU) decision. To increase throughput, the work also proposes a parallelized VLSI architecture that uses scheduling for CU reordering and chroma reordering. With these architectural optimizations, the proposed solution demonstrates improved efficiency and performance in HEVC intra-mode decision making.
The architecture presented by Palomino et al. [10] covers all the features of HEVC intra prediction, including all modes and Prediction Unit (PU) sizes. Performance and memory access pose challenges in HEVC intra prediction, and dedicated hardware architectures offer promising solutions, particularly for energy-efficient implementations. To address these challenges, the architecture incorporates buffers and internal memories that reduce the reliance on external memory accesses. It also features two independent data paths, each capable of processing eight samples in parallel. This parallel processing, combined with a deep, multiplierless pipeline, significantly increases the throughput of the system.
III. PROPOSED METHODOLOGY
Fig. 3 illustrates the proposed high-performance architecture for the DC and planar modes of intra prediction in HEVC. The architecture is divided into two main parts.
The first part involves the extraction of pixel data from an input image, and this functionality is implemented using MATLAB. The second part of the architecture is dedicated to performing the DC and planar prediction using a reference buffer. This portion of the architecture is designed using Verilog, which is a hardware description language (HDL) commonly employed for the design of digital systems. By utilizing MATLAB for pixel data extraction and Verilog for DC and planar prediction, the proposed architecture leverages the strengths of both software and hardware design methodologies to achieve a high-performance solution for intra prediction in the DC and planar modes of HEVC.
A. Image Conversion into Input Data for Prediction
The architecture takes an input image as its initial source of data for predicting the DC and planar modes in HEVC. HEVC commonly uses the YCbCr color space with 4:2:0 chroma subsampling for the input image. The input image is initially in the RGB color space, so each RGB pixel is transformed to obtain its corresponding YCbCr values. This conversion yields three full-resolution components: Y (luma), Cb (blue-difference chroma), and Cr (red-difference chroma). 4:2:0 chroma subsampling is then applied: the Y component retains its full resolution, while the Cb and Cr components are subsampled by a factor of two both horizontally and vertically, so that every 2x2 group of four Y samples shares one Cb sample and one Cr sample. After subsampling, the Y, Cb, and Cr components are separated into individual image planes, each representing one color component. This color space conversion and chroma subsampling procedure prepares the input image in the format required for DC and planar prediction.
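The conversion and subsampling steps above can be sketched as follows. The paper performs this step in MATLAB and does not state which conversion matrix or subsampling filter is used, so this Python sketch makes two explicit assumptions: the full-range BT.601 (JFIF) coefficients for 8-bit samples, and chroma subsampling by averaging each 2x2 group. The names `rgb_to_ycbcr` and `to_ycbcr420` are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel using full-range BT.601 (assumed matrix)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clip(v):
        return max(0, min(255, int(round(v))))

    return clip(y), clip(cb), clip(cr)


def to_ycbcr420(rgb):
    """rgb: H x W list of (r, g, b) tuples, H and W even. Returns Y, Cb, Cr planes."""
    h, w = len(rgb), len(rgb[0])
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in rgb]
    y_plane = [[px[0] for px in row] for row in ycc]

    def subsample(c):
        # Average each 2x2 group with rounding (assumed filter).
        return [[(ycc[i][j][c] + ycc[i][j + 1][c]
                  + ycc[i + 1][j][c] + ycc[i + 1][j + 1][c] + 2) // 4
                 for j in range(0, w, 2)]
                for i in range(0, h, 2)]

    return y_plane, subsample(1), subsample(2)


# A 16x16 white test image gives a 16x16 Y plane and 8x8 Cb and Cr planes.
image = [[(255, 255, 255)] * 16 for _ in range(16)]
y_plane, cb_plane, cr_plane = to_ycbcr420(image)
print(len(y_plane), len(cb_plane), len(cr_plane))
```

The resulting plane sizes match the block sizes used later in the paper: a 16x16 luma block corresponds to 8x8 Cb and Cr blocks.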
Different prediction modes and block sizes dictate the positioning of the reference samples relative to the current pixel. Typically, the reference samples can be located above, to the left, or diagonally above-left of the current pixel, among other possible positions, depending on the specific prediction mode being employed. By incorporating the appropriate reference samples from the reference buffer, the intra prediction algorithm can make informed estimations of pixel values, enhancing the efficiency and accuracy of compression in HEVC.
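To make the positions of the reference samples concrete, the sketch below gathers the above-left corner, the row above, and the column to the left of an N x N block from a frame. It is a simplified Python illustration, not the paper's reference buffer: the function `gather_references` is a hypothetical name, and samples outside the frame are simply replaced by the mid-gray value 128, whereas the HEVC standard substitutes unavailable samples from the nearest available neighbor and also accounts for which neighbors have already been reconstructed.

```python
def gather_references(frame, x0, y0, n, default=128):
    """Return (corner, top, left) neighbours of the N x N block at (x0, y0).

    top holds 2N samples from the row above and left holds 2N samples from
    the column to the left. Samples outside the frame become `default`,
    which is a simplification of the HEVC substitution process.
    """
    h, w = len(frame), len(frame[0])

    def sample(x, y):
        return frame[y][x] if 0 <= x < w and 0 <= y < h else default

    corner = sample(x0 - 1, y0 - 1)
    top = [sample(x0 + i, y0 - 1) for i in range(2 * n)]
    left = [sample(x0 - 1, y0 + j) for j in range(2 * n)]
    return corner, top, left


# Hypothetical 8x8 frame whose sample value encodes its position.
frame = [[y * 8 + x for x in range(8)] for y in range(8)]
corner, top, left = gather_references(frame, 4, 4, 4)
print(corner, top, left)
```

DC and planar prediction use only a subset of these samples: the first N entries of `top` and `left`, plus the entries at index N for the top-right and bottom-left samples in planar mode.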
C. Planar Prediction
Planar prediction is one of the intra prediction methods in HEVC. It assumes that the pixel values within a block vary linearly in both the horizontal and vertical directions. The prediction combines a horizontal prediction and a vertical prediction by weighted averaging, with weights that depend on the position of the pixel being predicted. The planar prediction, denoted PPre[m][n], is calculated using equation (1); the horizontal prediction, HPre[m][n], is determined by equation (2); and the vertical prediction, VPre[m][n], is determined by equation (3). The block size, N, can take the values 4, 8, 16, 32, or 64.
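A minimal Python sketch of planar prediction is given below. It follows the planar formula of the HEVC standard rather than the exact form of equations (1) to (3), so the mapping of the local variables `h_pre` and `v_pre` to HPre and VPre is an assumption, as are the function name `planar_predict` and the (row, column) indexing of the output.

```python
def planar_predict(top, left, top_right, bottom_left):
    """Planar prediction of an N x N block from its neighbouring samples.

    top[x]      : sample directly above column x
    left[y]     : sample directly left of row y
    top_right   : sample above and just right of the block
    bottom_left : sample left of and just below the block
    """
    n = len(top)
    if len(left) != n or n <= 0 or (n & (n - 1)) != 0:
        raise ValueError("top and left must have the same power-of-two length")
    shift = n.bit_length()  # log2(n) + 1 for a power of two
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Horizontal interpolation between the left sample and top_right.
            h_pre = (n - 1 - x) * left[y] + (x + 1) * top_right
            # Vertical interpolation between the top sample and bottom_left.
            v_pre = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            # Average the two interpolations with rounding.
            pred[y][x] = (h_pre + v_pre + n) >> shift
    return pred


# Hypothetical 4x4 example with made-up neighbour values.
pred = planar_predict([10, 20, 30, 40], [10, 20, 30, 40], 50, 50)
print(pred[0][0], pred[3][3])
```

The weights are small integers and the final division is a shift, so each output sample needs only a few multiply-accumulate operations; since the weights are at most N, they can also be realized with shifts and additions in hardware.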
IV. RESULTS AND DISCUSSION
The proposed architecture for the DC and planar modes of intra prediction in HEVC is implemented using Verilog HDL, and its functionality is verified through simulation using the Cadence SimVision tool. The architecture is designed to handle the Y luma component with a block size of 16x16, while the Cb and Cr chroma components, which undergo 4:2:0 chroma subsampling, have a block size of 8x8. To evaluate the performance of the architecture, synthesis is conducted using the Cadence Genus Synthesis Solution, utilizing both TSMC 180nm and TSMC 90nm technologies. Additionally, the physical implementation of the architecture is carried out using the Cadence Innovus Implementation System, specifically targeting the physical cell 90nm technology.
Fig. 7, Fig. 9, and Fig. 10 show the planar prediction output for the Y luma, Cb chroma, and Cr chroma components, respectively, at the RTL design stage. Fig. 8 shows the DC prediction output for the Y luma component at the RTL design stage. After synthesis using TSMC 180nm technology, the architecture occupied a chip area of 246.41 mm² and consumed 27.86 mW of power, with a gate count of 24,692. After synthesis using TSMC 90nm technology, the architecture occupied a chip area of 75.97 mm² and consumed 5.1 mW of power, with a gate count of 20,075.
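The relative improvement between the two technology nodes follows directly from the synthesis figures above. The short Python snippet below only restates the reported numbers and computes their ratios; the dictionary layout and variable names are illustrative.

```python
# Synthesis results as reported in the text (area in mm², power in mW).
synthesis = {
    "TSMC 180nm": {"area": 246.41, "power_mw": 27.86, "gates": 24692},
    "TSMC 90nm": {"area": 75.97, "power_mw": 5.1, "gates": 20075},
}

old, new = synthesis["TSMC 180nm"], synthesis["TSMC 90nm"]
area_ratio = old["area"] / new["area"]
power_ratio = old["power_mw"] / new["power_mw"]
gate_ratio = old["gates"] / new["gates"]

print(f"area reduction:  {area_ratio:.2f}x")
print(f"power reduction: {power_ratio:.2f}x")
print(f"gate count reduction: {gate_ratio:.2f}x")
```

The area shrinks by a little over three times and the power by about five and a half times, while the gate count changes far less, which suggests that most of the gain comes from the smaller cells rather than from a simpler netlist.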
In this project, a high-performance architecture for intra prediction in HEVC was designed and synthesized using the Cadence Genus Synthesis Solution. The architecture supports both the DC and planar modes and operates at a throughput of 27 pixels per clock cycle. It takes the Y luma and the Cb and Cr chroma reference buffers as input and can predict all 256 pixel values of a 16x16 block of the Y component in 16 clock cycles. The synthesis results indicate that moving from TSMC 180nm to TSMC 90nm technology reduces the chip area by a little over three times (246.41 mm² to 75.97 mm²) and the power consumption by roughly 5.5 times (27.86 mW to 5.1 mW). After the physical design phase, the chip area increases by approximately 1.5 times relative to the 90nm synthesis result, primarily due to the net area introduced during the routing phase. The power consumption, on the other hand, decreases from 5.1 mW to 3.88 mW, a reduction of about 24%. Ultimately, the physical design of the architecture results in a chip area of 112.19 mm² and a power consumption of 3.88 mW. These values reflect the trade-off between area and power achieved during the design process.
REFERENCES
[1] Lakshmi and P. Aparna, “Efficient Architectures for Planar and DC modes of Intra Prediction in HEVC”, IEEE 7th Int. Conference on Signal Processing and Integrated Networks, Noida, India, 2020.
[2] A. B. Atitallah and M. Kammoun, “High-level design of HEVC intra prediction algorithm”, 5th International Conference on Advanced Technologies for Signal and Image Processing, Sousse, Tunisia, 2020.
[3] S. Shwetha, Lakshmi and P. Aparna, “Complexity Analysis of Hardware Architectures for Intra Prediction unit of High Efficiency Video Coding (HEVC)”, IEEE International Conference on Electronics, Computing and Communication Technologies, Bangalore, India, 2020.
[4] G. Pastuszak and A. Abramowski, “Algorithm and architecture design of the H.265/HEVC intra encoder”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 210–222, Jan 2016.
[5] B. Min, Z. Xu and R. C. C. Cheung, “A Fully Pipelined Hardware Architecture for Intra Prediction of HEVC”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 12, 2017.
[6] P. T. Vanishree and A. M. Vijaya Prakash, “VLSI Implementation of Discrete Cosine Transform and Intra prediction”, IEEE International Conference on Advances in Electronics, Computers and Communications, Bangalore, India, 2014.
[7] X. Huang, H. Jia, B. Cai, C. Zhu, J. Liu, M. Yang, D. Xie and W. Gao, “Fast algorithms and VLSI architecture design for HEVC intra-mode decision”, Springer, Journal of Real-Time Image Processing, 2016.
[8] V. Sze, M. Budagavi, and G. J. Sullivan, High efficiency video coding (HEVC). Berlin, Germany: Springer-Verlag, Jul. 2014.
[9] G. J. Sullivan, J. R. Ohm, W. J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, 2012.
[10] D. Palomino, F. Sampaio, L. Agostini, S. Bampi, and A. Susin, “A memory aware and multiplierless VLSI architecture for the complete Intra Prediction of the HEVC emerging standard”, IEEE Int. Conf. on Image Processing, pp, 2012.
[11] M. Abeydeera, M. Karunaratne, G. Karunaratne, “4K Real-Time HEVC Decoder on an FPGA”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, 2016.
[12] D. Patel, T. Lad and D. Shah, “Review on Intra-prediction in High Efficiency Video Coding (HEVC) Standard”, International Journal of Computer Applications, vol. 132, no. 13, 2015.
[13] T. Nguyen, D. Marpe, “Performance Analysis of HEVC-Based Intra Coding for Still Image Compression”, 2012 Picture Coding Symposium, Krakow, Poland, 2012.
[14] M. Perleberg, V. Borges, V. Afonso, “6WR: A Hardware Friendly 3D-HEVC DMM-1 Algorithm and Its Energy-Aware and High-Throughput Design”, IEEE Trans. on Circuits and Systems II, 2020.
[15] K. Singh, S. R. Ahamed, “Scalable VLSI Architecture for Hadamard Transforms of HEVC/H.265 Video Coding Standard”, 2020 24th International Symposium on VLSI Design and Test, Bhubaneswar, India, 2020.
Copyright © 2023 Sunil B, Vijaya Prakash A M. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.