Signal Coding Approaches for Spatial Audio and Unreliable Networks
- Author(s): Zamani, Sina
- Advisor(s): Rose, Kenneth
- et al.
This dissertation is divided into two parts. The first part develops algorithms for the compression of an emerging 3D audio format, while the second part investigates optimization techniques for the design of error-resilient predictive compression systems.
In the first part, advances in the development of compression algorithms for higher order ambisonics (HOA) data are presented. HOA has proven to be the method of choice in virtual reality applications, given its capability to reproduce spatial audio and its rendering flexibility. Recent standardization for HOA compression adopted a framework wherein HOA data are decomposed into principal components that are then encoded by standard audio coding, i.e., frequency domain quantization and entropy coding to exploit psychoacoustic redundancy. A noted shortcoming of this approach is the occasional mismatch in principal components across blocks, and the resulting suboptimal transitions in the data fed to the audio coder. In this dissertation, we propose a framework where singular value decomposition (SVD) is performed after transformation to the frequency domain via the modified discrete cosine transform (MDCT). This framework not only ensures smooth transitions across blocks, but also enables frequency dependent SVD for better energy compaction. Moreover, we introduce a novel noise substitution technique to compensate for suppressed ambient energy in discarded higher order ambisonics channels, which significantly enhances the perceptual quality of the reconstructed HOA signal. Next, to reduce the burden of side information, a new encoding architecture is presented, where transform matrices are estimated backward-adaptively. This framework allows more frequent use of the optimal SVD, thereby approaching the full potential of frequency domain SVD. In addition, the division of HOA data into predominant and ambient components in current schemes is difficult to optimize perceptually and ignores spatial inter-channel masking effects. To address these issues, a new encoding framework for compression of HOA data is presented, where a null-space basis vector extension technique enables all compression to be performed in the SVD domain, and a jointly computed common masking threshold accounts for the effects of spatial masking across components.
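As a rough illustration of why a frequency dependent transform can compact energy better than a single transform shared by all frequencies, the following toy sketch may help. It is not the dissertation's codec: the four "HOA channels", the two bands, the gain vectors, and the coefficient values are all made-up assumptions, and a power iteration on the channel Gram matrix stands in for a full SVD (it recovers only the leading component).

```python
import math

def gram(X):
    """Channel Gram matrix X X^T, with X given as a list of channel rows."""
    n = len(X)
    return [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]

def top_component_energy_fraction(X, iters=200):
    """Fraction of total energy captured by the leading principal component.

    Uses power iteration on X X^T as a stand-in for the SVD; the leading
    eigenvalue equals the squared leading singular value of X.
    """
    C = gram(X)
    n = len(C)
    u = [1.0 / math.sqrt(n)] * n            # arbitrary starting direction
    for _ in range(iters):
        v = [sum(C[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in v))
        u = [t / norm for t in v]
    top = sum(u[i] * C[i][j] * u[j] for i in range(n) for j in range(n))
    total = sum(C[i][i] for i in range(n))  # trace = total energy
    return top / total

def mix(gains, source):
    """Hypothetical spatial encoding: one source scaled by per-channel gains."""
    return [[g * s for s in source] for g in gains]

# Toy frequency-domain coefficients: each band holds one source arriving
# from a different direction (different gain vector). Values are invented.
low_band = mix([1.0, 0.5, 0.0, -0.5], [1.0, 2.0, 3.0])
high_band = mix([1.0, -0.5, 0.5, 0.0], [3.0, 2.0, 1.0])
# Full-band data: per channel, the low-band bins followed by the high-band bins.
full_band = [lo + hi for lo, hi in zip(low_band, high_band)]

per_band = [top_component_energy_fraction(b) for b in (low_band, high_band)]
joint = top_component_energy_fraction(full_band)

if __name__ == "__main__":
    print("per-band fractions:", per_band)
    print("single full-band fraction:", joint)
```

In this toy case each band is captured entirely by its own leading component, whereas a single transform computed over both bands captures only about 75% of the energy in its first component.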
The second part is concerned with developing optimization techniques for the design of error-resilient predictive compression systems. Prediction is used in virtually all compression systems, and when such a compressed signal is transmitted over unreliable networks, packet losses can lead to significant error propagation through the prediction loop. Despite this, the conventional design technique completely ignores the effect of packet losses: it estimates the prediction parameters to minimize the mean squared prediction error, and optimizes the quantizer to minimize the reconstruction error at the encoder. While some design techniques have been proposed to accurately estimate and minimize the end-to-end distortion (EED) at the decoder, accounting for packet losses, they operate in closed loop, which introduces a mismatch between the statistics used for design and the statistics used in operation, with a negative impact on the convergence and stability of the design procedure. The first contribution of this part of the dissertation is an effective technique for designing a compression system with a first order linear predictor, which accounts for the instability caused by error propagation due to packet losses, and enjoys stable statistics during design by employing open-loop iterations that, on convergence, mimic closed-loop operation.
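To make the error propagation through a first order prediction loop concrete, here is a minimal sketch under simplifying assumptions that are not taken from the dissertation: one residual per packet, independent losses with probability `p`, a lost residual replaced by zero at the decoder, and zero-mean residuals of variance `residual_var` that are independent of the past. Under these assumptions the encoder/decoder mismatch d_n obeys d_n = a d_{n-1} + (lost residual), so its expected energy follows E[d_n^2] = a^2 E[d_{n-1}^2] + p * residual_var. The function names are hypothetical.

```python
def expected_error_energy(a, p, residual_var, steps):
    """Iterate E[d_n^2] = a^2 * E[d_{n-1}^2] + p * residual_var from zero."""
    energy = 0.0
    history = []
    for _ in range(steps):
        energy = a * a * energy + p * residual_var
        history.append(energy)
    return history

def steady_state_error_energy(a, p, residual_var):
    """Fixed point of the recursion above; finite only when |a| < 1."""
    if abs(a) >= 1.0:
        raise ValueError("error energy does not settle for |a| >= 1")
    return p * residual_var / (1.0 - a * a)

if __name__ == "__main__":
    for a in (0.5, 0.9, 0.99):
        print(a, steady_state_error_energy(a, p=0.1, residual_var=1.0))
```

The closer the predictor coefficient is to 1, the longer each loss lingers in the loop. A coefficient chosen only to minimize prediction error can therefore be a poor choice once losses are taken into account, which is the tension the design technique above addresses.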
End-to-end distortion (EED) estimation, accounting for error propagation and concealment at the decoder, was originally developed for video coding, and enables optimal rate-distortion (RD) decisions at the encoder. However, this approach was limited to the video coder’s simple setting of a single-tap, constant-coefficient temporal predictor. This thesis considerably generalizes the framework to account for: i) high order prediction filters, and ii) filter adaptation to local signal statistics. We demonstrate how this EED estimate can be leveraged by an encoder with short and long term linear prediction to improve RD decisions and achieve major performance gains. The approach is further extended to estimate EED in speech coders. The error propagation problem is exacerbated in this case, as standard coders predict not only the signal from past frames, but also the parameters (in the line spectral frequency domain) employed for such prediction. Hence, the prediction loop propagates errors in the reconstructed signal as well as errors in the prediction parameters. A recursive algorithm is proposed to estimate, at the encoder, the overall EED by tracking decoder statistics in parallel for the prediction parameters and the signal reconstructions, in their respective domains, and then combining them to obtain the final EED estimate.
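For the simple single-tap, constant-coefficient setting mentioned above, recursive EED estimation at the encoder can be sketched as follows. This is a minimal illustration, not the generalized algorithm of the thesis, and it rests on assumptions chosen for brevity: one sample per packet, independent losses with probability `p`, a decoder that starts from zero state, and concealment by repeating the previous reconstructed sample. The encoder tracks the first and second moments of the (random) decoder reconstruction and combines them into the expected squared error. The function name is hypothetical.

```python
def estimate_eed(x, e_hat, a, p):
    """Per-sample expected decoder distortion for a single-tap predictor.

    x      original samples
    e_hat  quantized prediction residuals transmitted by the encoder
    a      predictor coefficient (decoder forms e_hat[n] + a * previous)
    p      packet loss probability (assumed independent per sample)
    """
    m1 = 0.0   # E[decoder reconstruction]
    m2 = 0.0   # E[decoder reconstruction squared]
    eed = []
    for xn, en in zip(x, e_hat):
        # Moments if the packet arrives: reconstruction = en + a * previous.
        recv_m1 = en + a * m1
        recv_m2 = en * en + 2.0 * a * en * m1 + a * a * m2
        # If the packet is lost, the previous reconstruction is repeated,
        # so the moments carry over unchanged.
        m1, m2 = (1.0 - p) * recv_m1 + p * m1, (1.0 - p) * recv_m2 + p * m2
        # E[(xn - reconstruction)^2] expanded in terms of the two moments.
        eed.append(xn * xn - 2.0 * xn * m1 + m2)
    return eed

if __name__ == "__main__":
    x = [1.0, 0.5, -0.3, 0.8]
    e_hat = [1.0, -0.4, -0.7, 1.1]
    print(estimate_eed(x, e_hat, a=0.9, p=0.2))
```

Because the recursion is exact under the stated loss model, it can be checked against a brute-force average over all loss patterns for short sequences. Higher order filters require tracking cross-correlations between reconstructed samples as well, which this sketch does not attempt.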