Toward the High Precision Predictive Coding in Video Compression
- Author(s): Lin, Wei-Ting
- Advisor(s): Rose, Kenneth
- et al.
The main focus of this dissertation is the optimal design of motion compensation schemes for predictive coding in video compression. The “sub-optimality” of the conventional block-based motion compensation scheme, wherein the impact of a motion vector is confined to a rigid rectangular block, motivates our design of a multi-hypothesis motion compensation scheme. We explicitly treat motion vectors as pointers to observation sources. Given the high correlation between adjacent pixels in natural images, the motion vectors of neighboring blocks can also point to relevant estimates of the current target. This dissertation builds on this paradigm and demonstrates advanced techniques that incorporate information from the entire motion vector field.
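To make the idea of motion vectors as pointers to observation sources concrete, the following minimal sketch gathers one estimate of a target pixel per motion vector and combines them linearly. The helper names (`fetch`, `gather_estimates`, `predict`), the toy reference frame, the motion vectors, and the weights are all illustrative assumptions; they are not taken from the dissertation.

```python
def fetch(ref, y, x):
    """Read a reference pixel, clamping coordinates to the frame border."""
    h, w = len(ref), len(ref[0])
    return ref[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def gather_estimates(ref, y, x, motion_vectors):
    """Return one estimate of pixel (y, x) per motion vector (dy, dx)."""
    return [fetch(ref, y + dy, x + dx) for dy, dx in motion_vectors]


def predict(estimates, weights):
    """Linearly combine the estimates with the given weights."""
    return sum(w * e for w, e in zip(weights, estimates))


# Toy reference frame (assumption): a horizontal ramp, ref[y][x] = 10 * x.
ref = [[10 * x for x in range(8)] for _ in range(8)]

# Hypothetical motion vectors (dy, dx): the current block's own vector,
# followed by the vectors of its left and upper neighbors.
mvs = [(0, 1), (0, 2), (1, 1)]

# Each vector points to a different observation of the pixel at (3, 3).
estimates = gather_estimates(ref, 3, 3, mvs)

# Illustrative weights only; the dissertation derives them by training
# or from a parametric model rather than fixing them by hand.
prediction = predict(estimates, [0.5, 0.25, 0.25])
```

Conventional block-based compensation corresponds to the special case in which only the block's own vector receives a nonzero weight.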
We first formulate motion compensation with multiple estimates directly as a linear estimation problem, and design a training method that derives the optimal linear coefficients while avoiding overfitting and minimizing the ultimate reconstruction error rather than the prediction error. As a single set of coefficients cannot capture the varying statistics of video sequences, we design K sets of coefficients, which are trained off-line through a “K-mode” iterative clustering technique. By switching between the predefined sets of coefficients, the encoder can adapt to local statistics. This approach is then extended to variable block size partitioning, to exploit the substantial gain offered by the flexibility of dividing blocks to approximate object shapes. As the side information needed to indicate which set of prediction coefficients generated the final prediction is generally not negligible, a parametric framework is proposed to model the statistics of the estimates and target pixels. The model leverages the first-order Markov property of image signals and the relationships between motion vectors in the motion field. As a result, the coefficients derived from the model adapt automatically to local variations without additional side information. Moreover, with the parametric approach we can combine estimates from any number of motion vectors at any pixel location, which allows us to break free from the block structure entirely by letting a motion vector’s influence take an arbitrary shape.
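The off-line training of K coefficient sets described above can be sketched as an alternation between assigning each block to its best coefficient set and refitting each set by least squares. This is a simplified illustration under stated assumptions: it uses only two estimates per pixel, minimizes the prediction error rather than the reconstruction error, and omits any safeguard against overfitting. The function names and toy data are hypothetical.

```python
def fit_weights(samples):
    """Least-squares weights (w0, w1) for target ~ w0*e0 + w1*e1.

    samples is a list of (e0, e1, target) triples. The 2x2 normal
    equations are solved directly; None is returned if they are singular.
    """
    a = sum(e0 * e0 for e0, _, _ in samples)
    b = sum(e0 * e1 for e0, e1, _ in samples)
    c = sum(e1 * e1 for _, e1, _ in samples)
    p = sum(e0 * t for e0, _, t in samples)
    q = sum(e1 * t for _, e1, t in samples)
    det = a * c - b * b
    if abs(det) < 1e-12:
        return None
    return ((c * p - b * q) / det, (a * q - b * p) / det)


def block_error(block, weights):
    """Sum of squared prediction errors of one block under one weight set."""
    return sum((t - weights[0] * e0 - weights[1] * e1) ** 2
               for e0, e1, t in block)


def assign(blocks, weight_sets):
    """Label each block with the index of its lowest-error weight set."""
    modes = range(len(weight_sets))
    return [min(modes, key=lambda k: block_error(blk, weight_sets[k]))
            for blk in blocks]


def k_mode(blocks, weight_sets, iterations=10):
    """Alternate block assignment and per-mode least-squares refitting."""
    weight_sets = list(weight_sets)
    for _ in range(iterations):
        labels = assign(blocks, weight_sets)
        for k in range(len(weight_sets)):
            members = [s for blk, lab in zip(blocks, labels)
                       if lab == k for s in blk]
            fitted = fit_weights(members) if members else None
            if fitted is not None:  # keep the old set if refitting fails
                weight_sets[k] = fitted
    return weight_sets, assign(blocks, weight_sets)


def make_block(pairs, w):
    """Toy block whose targets follow the weights w exactly (assumption)."""
    return [(e0, e1, w[0] * e0 + w[1] * e1) for e0, e1 in pairs]


pairs_a = [(10, 20), (30, 10), (50, 70)]
pairs_b = [(20, 40), (60, 10), (15, 5)]
blocks = [make_block(pairs_a, (0.8, 0.2)), make_block(pairs_b, (0.8, 0.2)),
          make_block(pairs_a, (0.2, 0.8)), make_block(pairs_b, (0.2, 0.8))]

# K = 2 modes, initialized to "trust estimate 0" and "trust estimate 1".
weight_sets, labels = k_mode(blocks, [(1.0, 0.0), (0.0, 1.0)])
```

In this sketch the label of each block is exactly the side information the encoder would have to signal, which is the cost that the parametric framework is meant to avoid.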
The remainder of this dissertation focuses on optimizing the AV1 encoder, the open-source video encoder developed by the Alliance for Open Media (AOM). We first introduce a new coding tool that extends the number of reference frames, and then use the existing coding tools to design a coding structure in which the reference frames are allocated to cover a wider temporal range and thus offer more diversity. This new design allows the encoder to better capture temporal variations. Finally, we introduce a complementary compound prediction mode, designed and optimized for blocks where the existing compound modes fall short. Simulations provide experimental evidence of the efficiency of the proposed mode, with consistent coding gains across the entire bit-rate range.