HEVC video encoder overview

In this chapter, we review HEVC video encoding with a focus on intra coding. HEVC, like previous video coding standards, is a hybrid video coding scheme: motion estimation and compensation reduce temporal redundancy, while intra prediction and transform coding reduce spatial redundancy. The last step in the encoding process is an entropy coding module for further compression. Although the main functions in HEVC are similar to those of prior video coding standards, most of them have evolved. The improvements are mainly in the splitting of frames into prediction and transform blocks, inter and intra mode decision, in-loop filtering, and entropy coding. Before describing the structure of the HEVC encoder, we briefly review general hybrid video coding to set the context of the study.

Block-Based Hybrid Video Coding

The compression efficiency of video encoders has greatly improved during the last three decades. However, the main concepts of video coding remain the same. All major video coding standards have been based on the same concept of block-based hybrid video coding (Richardson, 2010). Hybrid video coding removes redundancy in several consecutive steps. The term hybrid refers to the combined use of predictive coding and transform coding. Each frame is partitioned into rectangular regions called blocks, which are coded by either inter or intra prediction. Inter prediction uses previously coded frames to find the best prediction for the current block, while intra prediction uses the information available in the current frame to make the prediction. The difference between the current block and its prediction forms the residual block. A transform is applied to the residual to remove spatial redundancy and to provide a representation of the residual block that is better suited to quantization. The quantization step provides a trade-off between compression ratio and visual quality. Entropy coding is then used as the final step to compress the quantized data further.
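The chain of prediction, residual, transform, and quantization described above can be sketched in a few lines. This is a minimal illustration and not the HEVC design: it works on a one-dimensional row of samples, uses a floating-point orthonormal DCT-II in place of the standard's integer transforms, and applies a uniform quantizer. The names `encode_block`, `decode_block`, and `qstep` are assumptions introduced for this example only, and entropy coding is omitted.

```python
import math


def _scale(k, n):
    """Normalization factor of the orthonormal DCT-II basis function k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(samples):
    """Orthonormal 1-D DCT-II (floating point, for illustration only)."""
    n = len(samples)
    return [
        _scale(k, n)
        * sum(samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k]
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n))
        for i in range(n)
    ]


def encode_block(current, prediction, qstep):
    """Residual -> transform -> uniform quantization.

    Returns the integer levels that an entropy coder would compress next.
    """
    residual = [c - p for c, p in zip(current, prediction)]
    return [round(c / qstep) for c in dct(residual)]


def decode_block(levels, prediction, qstep):
    """Dequantize, inverse-transform, and add the prediction back."""
    residual = idct([level * qstep for level in levels])
    return [p + r for p, r in zip(prediction, residual)]
```

A larger `qstep` yields smaller levels (fewer bits after entropy coding) at the price of a larger reconstruction error, which is the trade-off between compression ratio and visual quality mentioned above.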

HEVC Video Encoding

The architecture of HEVC is broadly similar to that of previous hybrid video coding standards such as H.264/AVC. The most important difference between HEVC and H.264/AVC is how a frame is split into blocks for prediction and transform, which contributes to improved coding efficiency. While the main coding block in H.264/AVC is a macroblock of size 16×16, HEVC uses a more adaptive quadtree structure based on a block called the coding tree unit (CTU), with a maximum size of 64×64. This quadtree structure comprises blocks and units. A block is a rectangular area of samples, and a unit is formed by a luma block and two related chroma blocks with associated syntax information. For example, a CTU consists of a luma coding tree block (CTB) and two chroma CTBs, with syntax determining the further subdivision. This subdivision generates new units called coding units (CUs). A CU is coded by either inter or intra prediction. CUs can be divided into prediction units (PUs) and transform units (TUs) for performing prediction and transform. PUs inside a CU can be predicted by different prediction modes. CUs, PUs and TUs consist of associated luma and chroma blocks called coding blocks (CBs), prediction blocks (PBs) and transform blocks (TBs), respectively. This splitting is clearly more adaptive than the approach used in H.264/AVC and is especially useful for higher-resolution videos such as 4K×2K and 8K×4K.
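The recursive subdivision of a CTU into CUs can be illustrated with a short sketch. It only shows the geometry of the quadtree: the split decision is delegated to a caller-supplied predicate `should_split`, which is a hypothetical stand-in for the encoder's real decision logic, and the minimum CU size of 8 is taken here as an assumed default.

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Return the leaf CUs of a quadtree as (x, y, size) tuples.

    should_split(x, y, size) is a hypothetical predicate supplied by the
    caller; a real encoder would base this decision on coding cost.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in z-scan order:
        # top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_ctu(x + dx, y + dy, half, should_split, min_size)
                )
        return leaves
    return [(x, y, size)]


# Example: keep splitting only the block anchored at the CTU origin.
cus = split_ctu(0, 0, 64, lambda x, y, size: x == 0 and y == 0)
```

In this example the 64×64 CTU ends up with three 32×32 CUs, three 16×16 CUs, and four 8×8 CUs, so small CUs are spent only where the predicate asks for them.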

As can be seen, there are many ways to split a picture into units and blocks, and also many ways to combine the coding tools. Although this may not have a significant impact on decoder complexity, the encoder must perform heavy computations to leverage the full capabilities of the standard. This is mostly because the encoder must choose the best coding tree structure and the best subdivision of each CU into PUs and TUs. This choice can be made by an exhaustive rate-distortion optimization (RDO), which is extremely time-consuming. The RDO process considers all the encoding possibilities and compares them with regard to bit rate and picture quality. Since the CU is the root for both the PU partitioning and the residual quadtree (RQT) configuration, it can generally be deduced that the computational complexity of RDO increases monotonically with the depth of the CU splitting (Ma et al., 2013). Limiting the CU depth reduces the complexity but also decreases the coding efficiency, because small CUs cope efficiently with picture regions of complex texture, whereas large CUs cannot represent these areas well. The RDO problem is therefore central to the complexity analysis of HEVC, and determining the best partitioning pattern of the CTUs in a picture in a reasonable time can dramatically decrease the complexity burden of the HEVC encoder.
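To make the cost of the exhaustive search concrete, the following sketch compares, at every quadtree node, the cost of coding the CU whole against the summed best costs of its four sub-CUs, and counts how many CU candidates are evaluated. The callback `rd_cost` is a hypothetical placeholder: in practice such a cost is typically a Lagrangian combination of distortion and rate, but here `toy_rd_cost` is an invented function that simply penalizes large blocks covering one "detailed" sample position. PU and TU decisions are ignored, so the real search is considerably larger.

```python
def best_partition(x, y, size, rd_cost, min_size=8):
    """Exhaustive quadtree search.

    Returns (best_cost, leaf_cus, number_of_cu_evaluations).
    rd_cost(x, y, size) is a hypothetical cost callback.
    """
    whole_cost = rd_cost(x, y, size)
    evaluations = 1
    if size <= min_size:
        return whole_cost, [(x, y, size)], evaluations

    half = size // 2
    split_cost, split_leaves = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            cost, leaves, count = best_partition(
                x + dx, y + dy, half, rd_cost, min_size
            )
            split_cost += cost
            split_leaves.extend(leaves)
            evaluations += count

    # Keep the split only if it is strictly cheaper than the whole CU.
    if split_cost < whole_cost:
        return split_cost, split_leaves, evaluations
    return whole_cost, [(x, y, size)], evaluations


def toy_rd_cost(x, y, size):
    """Invented cost: blocks larger than 8x8 that cover sample (5, 5) are expensive."""
    covers_detail = x <= 5 < x + size and y <= 5 < y + size
    return float(size) + (1000.0 if covers_detail and size > 8 else 0.0)


cost, cus, evaluations = best_partition(0, 0, 64, toy_rd_cost)
```

With CU sizes from 64×64 down to 8×8, the search visits 1 + 4 + 16 + 64 = 85 CU candidates per CTU regardless of the content, which illustrates why limiting or predicting the CU depth is an attractive way to reduce complexity.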

HEVC Intra Coding

HEVC intra coding follows the same structure as previous hybrid video codecs (Lainema et al., 2012). It is mainly based on spatial prediction and transform coding. However, it introduces some new features, such as an increased number of prediction modes and a new frame splitting approach. The encoder exploits spatial correlation within one frame to reduce the amount of data needed to transmit or store the visual information. To this end, it chooses the best coding unit depth, the best transform unit tree, and the best mode for each prediction unit. An intra mode determines how a block is predicted from its neighboring samples; for the directional modes, it specifies the prediction direction. HEVC intra coding supports 35 prediction modes for the luma component. There are 33 directional modes, allowing efficient prediction of different directional video contents, a DC mode for homogeneous regions, and a planar mode for smooth surfaces. The DC and planar modes let HEVC intra-predict areas of the image that do not follow an edge model.
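As a small illustration of the 35 luma modes, the sketch below enumerates them using the customary HEVC numbering (0 for planar, 1 for DC, 2 to 34 for the directional modes) and implements a simplified DC prediction. The function names are assumptions made for this example, and the DC predictor deliberately omits the boundary smoothing that a full implementation may apply; it only fills the block with the rounded mean of the neighboring samples.

```python
PLANAR_MODE = 0
DC_MODE = 1
ANGULAR_MODES = range(2, 35)  # 33 directional modes


def mode_kind(mode):
    """Classify a luma intra mode index as planar, dc, or angular."""
    if mode == PLANAR_MODE:
        return "planar"
    if mode == DC_MODE:
        return "dc"
    if mode in ANGULAR_MODES:
        return "angular"
    raise ValueError(f"invalid intra mode: {mode}")


def dc_predict(above, left):
    """Simplified DC prediction for an N x N block.

    above and left hold the N reconstructed neighbor samples directly
    above and to the left of the block. Every predicted sample is the
    rounded integer mean of these 2N values (no edge smoothing).
    """
    n = len(above)
    if len(left) != n:
        raise ValueError("above and left must have the same length")
    dc = (sum(above) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


block = dc_predict([100, 102, 104, 106], [98, 100, 102, 104])
```

Because every predicted sample takes the same value, the DC mode suits homogeneous regions, whereas a directional mode would propagate the neighboring samples along an edge direction.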

In the early versions of the standard, the number of modes that could be used to predict a block depended on the PU size, and only a subset of modes could be selected. Now, however, all 35 modes are examined for all PU sizes. The decision to code a block as intra is made at the CU level, but the intra mode is selected for each PU, and PUs in the same CU may have different intra modes. After the intra mode is selected, the prediction is performed for the TUs inside the PU. This means that the spatially neighboring TUs are used to determine the prediction signal. For a TU of size N×N, there are 4N+1 reference samples, taken from the above, above-right, above-left, left and below-left TUs. Samples from the below-left TU are not always available; they can be used only when they have already been processed and decoded.
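The count of 4N+1 reference samples follows from one above-left corner sample, 2N samples along the row above (above and above-right), and 2N samples along the column to the left (left and below-left). The sketch below lists these positions relative to the top-left sample of the TU; the function name and the coordinate convention are assumptions made for this illustration, and availability checks are not modeled.

```python
def intra_reference_positions(n):
    """Positions (x, y) of the reference samples of an N x N TU.

    Coordinates are relative to the top-left sample of the TU, so the
    reference row is at y = -1 and the reference column at x = -1.
    """
    corner = [(-1, -1)]                      # above-left
    above = [(x, -1) for x in range(2 * n)]  # above and above-right
    left = [(-1, y) for y in range(2 * n)]   # left and below-left
    return corner + above + left


# A 4x4 TU has 4 * 4 + 1 = 17 reference samples.
positions = intra_reference_positions(4)
```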

Table of Contents

CHAPTER 1 INTRODUCTION
1.1 Problem Statement and Motivations
1.2 Objectives
1.3 Contributions
1.4 Thesis Structure
CHAPTER 2 HEVC VIDEO ENCODER OVERVIEW
2.1 Block-Based Hybrid Video Coding
2.2 HEVC Video Encoding
2.3 HEVC Computational Complexity Analysis
2.4 HEVC Intra Coding
CHAPTER 3 LITERATURE REVIEW ON HEVC INTRA CODING
3.1 Mode Decision Complexity Reduction
3.2 Coding Unit Size Decision Complexity Reduction
CHAPTER 4 PROPOSED INTRA MODE DECISION METHODS
4.1 Mode Decision Based on RDO Cost Prediction
4.1.1 RDO Cost Modeling
4.1.2 Candidates Selection
4.1.3 Experimental Results and Discussion
4.2 Fast Chroma Mode Decision
4.2.1 Proposed Method
4.2.2 Experimental Results and Discussion
4.3 Low Complexity Edge Detection for Mode Decision
4.3.1 Proposed Method
4.3.2 Experimental Results and Discussion
4.4 Fast Mode Decision Based on SATD Cost Classification
4.4.1 Most Relevant Modes of the Neighboring Blocks
4.4.2 Mode Ordering, Binary Classification and RDO Dodging
4.4.3 Experimental Results and Discussion
4.5 Experimental Results and Discussion for Overall Mode Decision
CHAPTER 5 PROPOSED CODING UNIT SIZE DECISION METHODS
5.1 Early Splitting Termination Based on Global and Directional Gradients
5.1.1 Early Splitting Termination by Global Gradient
5.1.2 Early Splitting Termination by Directional Gradient
5.1.3 Experimental Results and Discussion
5.2 Early Splitting and Early Splitting Termination Based on Bayesian Analysis
5.2.1 Proposed Method
5.2.2 Experimental Results and Discussion
5.3 CU Size Decision Using Neural Network-Based Reinforcement Learning
5.3.1 Problem Formulation
5.3.2 State Representation and Action Space
5.3.3 Cost Function
5.3.4 Experimental Results and Discussion
CONCLUSION