UC San Diego
Improving end-user video quality through error concealment and packet importance modeling
- Author(s): Chang, Yueh-Lun
- et al.
During video transmission, network congestion and distortion cause packet losses that degrade video quality. Traditionally, the degradation is measured by mean-squared error or peak signal-to-noise ratio. However, these measurements do not correlate well with human perception. In this dissertation, we aim to improve the visual quality for end-users through packet importance modeling and error concealment. The visual impact on end-users differs with the type of packet loss, and we aim to predict how end-users respond to different losses. We start with an objective experiment in which Video Quality Metric scores are computed for fixed-size IP packet losses in H.264/AVC SDTV video, and then construct a network-based model to predict these scores. To further understand the real impact on perceptual quality, we conduct a human subjective experiment on whole frame losses concealed by different decoders. Whole B frame losses are introduced in H.264/AVC videos, which are then decoded by two decoders that use two common concealment methods: frame copy and frame interpolation. The videos are shown to human observers, who respond to each glitch they spot. The results show that even when a whole frame loss involves more lost bits than a slice loss, the overall perceptual quality is often better, because the concealment gives observers less spatial misalignment. We develop network-based models that predict the visibility of whole frame losses. Based on the estimated visual importance, we can prioritize packets in lossy networks. The models are deployed in intermediate routers to prioritize video packets and perform intelligent frame dropping to relieve network congestion. Dropping frames based on their visual scores proves significantly superior to random dropping of B frames.
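The score-based frame dropping described above can be illustrated with a minimal sketch. It is not the dissertation's router implementation: the `Frame` class, the `visibility` field (standing in for a model-predicted loss visibility score) and the `select_frames_to_drop` function are hypothetical names introduced only for this example, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    frame_type: str    # "I", "P" or "B"
    size_bits: int
    visibility: float  # hypothetical predicted visibility of losing this frame (0..1)


def select_frames_to_drop(frames, bits_to_free):
    """Pick B frames to drop, least visible first, until enough bits are freed."""
    # Only B frames are candidates; sort them by predicted visibility (ascending).
    candidates = sorted(
        (f for f in frames if f.frame_type == "B"),
        key=lambda f: f.visibility,
    )
    dropped, freed = [], 0
    for f in candidates:
        if freed >= bits_to_free:
            break
        dropped.append(f.index)
        freed += f.size_bits
    return dropped


# Made-up example data, for illustration only.
frames = [
    Frame(0, "I", 90000, 1.00),
    Frame(1, "B", 12000, 0.70),
    Frame(2, "B", 10000, 0.10),
    Frame(3, "P", 30000, 0.90),
    Frame(4, "B", 11000, 0.25),
]
print(select_frames_to_drop(frames, 20000))  # [2, 4]
```

A random-dropping baseline would pick B frames without looking at `visibility`; the sketch differs only in the sort key, which is where a visibility model would plug in.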
Another key way to reduce the visual impact of video packet losses is effective error concealment, which is the second focus of this dissertation. Here we work on both traditional 2D video and 3D stereo video. Among the formats that provide a stereo effect, 2D+depth encoding for stereoscopic video is one of the most compatible with current video content transmission systems. Traditionally, the 2D and depth streams are coded and transmitted independently, and concealed separately if delivered through lossy networks. We propose a new encoding scheme that offsets the I frames between the 2D and depth sequences. When a loss occurs in either sequence, it can be concealed using information from the other, exploiting the strong motion correlation between them. Besides providing error concealment by postprocessing at the end-user side, enhancing the error robustness of video at the encoder side is another approach. We propose an end-to-end distortion model for rate-distortion optimized coding mode selection of 2D+depth bitstreams. In our work, we first extend the encoding modes, adding an extra motion information sharing mode for the depth stream, and then improve the concealment methods. Based on these changes, we use the proposed end-to-end distortion model to derive a new Lagrange multiplier for rate-distortion optimized 2D+depth mode selection in packet-loss environments, taking into account the network conditions, i.e., the packet loss rate.
Beyond the stereo video format, a new video coding technology, High Efficiency Video Coding (HEVC), was standardized in 2013. While achieving a 50% bitrate reduction compared to prior standards at equal perceptual video quality, HEVC is more sensitive to packet losses, since each bit carries more information. To alleviate this problem, we propose a motion-compensated error concealment method for HEVC and implement it in the reference software HM. The motion vector from the co-located block is refined for motion compensation. Based on the reliability of these motion vectors (MVs), blocks are merged and assigned new MVs. Our experimental results show that the method not only performs well in subjective visual quality but also yields a substantial PSNR gain.
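The idea of rate-distortion optimized mode selection under packet loss, mentioned above for 2D+depth bitstreams, can be sketched generically. This is not the dissertation's end-to-end distortion model or its derived Lagrange multiplier: the sketch assumes the textbook cost J = D + λR, with the expected distortion taken as a simple weighted sum of the received and concealed cases. The mode names, the numbers and the value of `lam` are made up for illustration.

```python
def expected_distortion(d_received, d_concealed, loss_rate):
    """Expected distortion when the packet is lost with probability loss_rate."""
    return (1.0 - loss_rate) * d_received + loss_rate * d_concealed


def choose_mode(modes, loss_rate, lam):
    """Return the mode name that minimizes J = E[D] + lam * R."""
    best_name, best_cost = None, float("inf")
    for name, (rate_bits, d_received, d_concealed) in modes.items():
        cost = expected_distortion(d_received, d_concealed, loss_rate) + lam * rate_bits
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


# Hypothetical candidates: (rate in bits, distortion if received, distortion if concealed).
modes = {
    "intra": (400, 20.0, 60.0),
    "inter": (150, 25.0, 200.0),
    "mv_sharing": (60, 40.0, 120.0),
}

lam = 0.1  # assumed multiplier; the dissertation derives its own
for p in (0.0, 0.2, 0.5):
    print(p, choose_mode(modes, p, lam))
```

With these made-up numbers, the selected mode shifts from `inter` at zero loss to `mv_sharing` at 20% loss and `intra` at 50% loss, which shows why the packet loss rate has to enter the mode decision.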